TorchDynamo Performance DashBoard #93794

Closed
anijain2305 opened this issue Jul 29, 2022 · 249 comments
Labels
module: dynamo, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@anijain2305
Contributor

anijain2305 commented Jul 29, 2022

Dashboard to track the performance of different backends.

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire

@anijain2305 anijain2305 changed the title [WIP/TRIAL] Setting up Automatic Benchmarking Results Setting up Automatic Benchmarking Results Aug 9, 2022
@anijain2305 anijain2305 changed the title Setting up Automatic Benchmarking Results TorchDynamo Performance DashBoard Aug 10, 2022
@Chillee Chillee pinned this issue Aug 11, 2022
@anijain2305
Contributor Author

Compilation Profile

The tables below list the 50 worst models for each metric.

Compilation Latency

dtype=float32, unit=seconds

+-------------+-----------------------------------------+------------+---------+--------+-----------+-------------+---------------------+
|    suite    |                  name                   | batch_size | pytorch | eager  | aot_eager | aot_nvfuser | inductor_cudagraphs |
+-------------+-----------------------------------------+------------+---------+--------+-----------+-------------+---------------------+
| huggingface |          MobileBertForMaskedLM          |     16     |   0.0   | 67.728 |  78.263   |   139.766   |       426.422       |
| huggingface |     MobileBertForQuestionAnswering      |     32     |   0.0   | 66.85  |  78.347   |   138.941   |       521.547       |
| torchbench  |               densenet121               |     4      |   0.0   | 3.646  |   7.496   |   77.281    |       599.583       |
| torchbench  |       mobilenet_v2_quantized_qat        |     96     |   0.0   | 3.431  |   7.875   |   70.466    |        -2.83        |
| torchbench  |            timm_efficientdet            |     1      |   0.0   | 65.366 |  65.409   |   65.612    |       -4.579        |
| timm_models |            res2net50_14w_8s             |    128     |   0.0   | 3.708  |   8.399   |   57.811    |       428.275       |
| timm_models |            res2net101_26w_4s            |     64     |   0.0   |  5.39  |  10.482   |   56.042    |       459.323       |
| timm_models |               res2next50                |    128     |   0.0   | 1.946  |   4.426   |   52.489    |       292.475       |
| timm_models |             legacy_senet154             |     32     |   0.0   | 5.437  |  11.439   |   51.941    |       335.474       |
| torchbench  |           mobilenet_v3_large            |     32     |   0.0   | 0.797  |   1.958   |   48.049    |       325.854       |
| timm_models |           gluon_inception_v3            |    128     |   0.0   | 1.879  |   4.232   |   46.426    |       472.191       |
| timm_models |              inception_v3               |    128     |   0.0   | 1.872  |   4.206   |   46.343    |       476.239       |
| torchbench  |         resnet50_quantized_qat          |     32     |   0.0   | 2.366  |   6.501   |   45.732    |       -2.707        |
| timm_models |            adv_inception_v3             |    128     |   0.0   | 0.523  |   2.818   |   45.161    |       462.487       |
| huggingface |            XLNetLMHeadModel             |     4      |   0.0   | 13.315 |  21.675   |   38.803    |       598.55        |
| huggingface |       MT5ForConditionalGeneration       |     2      |   0.0   | 13.459 |  18.646   |    37.11    |       380.514       |
| timm_models |            gluon_xception65             |     32     |   0.0   | 1.946  |   5.08    |   34.752    |       226.31        |
| torchbench  |              mobilenet_v2               |     96     |   0.0   | 0.542  |   1.57    |   33.984    |       281.45        |
| huggingface |         MegatronBertForCausalLM         |     2      |   0.0   | 17.26  |  21.586   |   33.516    |       459.281       |
| huggingface |    MegatronBertForQuestionAnswering     |     8      |   0.0   | 16.706 |   21.48   |   33.387    |       598.832       |
| timm_models |               selecsls42b               |    128     |   0.0   | 0.586  |   1.621   |   31.551    |       239.668       |
| timm_models |              nasnetalarge               |     16     |   0.0   | 30.009 |  31.092   |   31.029    |       -3.241        |
| torchbench  |                resnet50                 |     32     |   0.0   | 0.765  |   1.95    |   29.284    |       159.181       |
| torchbench  |               mnasnet1_0                |     32     |   0.0   | 0.611  |   1.694   |   27.733    |       273.082       |
| huggingface |       T5ForConditionalGeneration        |     4      |   0.0   | 7.773  |  11.156   |   27.079    |       266.568       |
| huggingface |          DebertaV2ForMaskedLM           |     1      |   0.0   | 7.919  |  12.989   |   26.708    |       -1.222        |
| torchbench  |                  hf_T5                  |     8      |   0.0   | 7.205  |  10.728   |    26.68    |       234.116       |
| huggingface |             XGLMForCausalLM             |     2      |   0.0   | 6.336  |  10.992   |   26.274    |       598.983       |
| huggingface |                 T5Small                 |     1      |   0.0   | 7.806  |  11.216   |   26.261    |       280.438       |
| huggingface |      DebertaV2ForQuestionAnswering      |     1      |   0.0   | 7.909  |   12.99   |   25.893    |        -1.24        |
| huggingface |     M2M100ForConditionalGeneration      |     2      |   0.0   | 6.279  |  12.378   |   25.882    |       598.719       |
| torchbench  |             resnext50_32x4d             |     8      |   0.0   | 0.773  |   1.949   |   25.454    |       141.483       |
| huggingface |     PegasusForConditionalGeneration     |     4      |   0.0   | 6.235  |  11.986   |   24.865    |       582.318       |
| timm_models |              pnasnet5large              |     16     |   0.0   | 22.702 |  24.632   |   23.986    |       -3.202        |
| huggingface |            YituTechConvBert             |     1      |   0.0   | 7.199  |   10.91   |   22.998    |       338.889       |
| torchbench  |             LearningToPaint             |     96     |   0.0   | 0.419  |   0.85    |   22.746    |       107.36        |
| huggingface |     GPTNeoForSequenceClassification     |     1      |   0.0   | 7.333  |  12.292   |   21.577    |       -1.179        |
| torchbench  |           shufflenet_v2_x1_0            |    128     |   0.0   | 0.885  |   2.245   |    21.15    |       190.619       |
| huggingface |            GPTNeoForCausalLM            |     1      |   0.0   | 7.335  |  12.174   |   21.138    |        -1.16        |
| torchbench  |               hf_BigBird                |     2      |   0.0   | 8.239  |  12.636   |   20.964    |       -1.487        |
| huggingface |                 BigBird                 |     1      |   0.0   | 8.339  |  12.634   |   20.934    |       -1.432        |
| huggingface | BlenderbotSmallForConditionalGeneration |     64     |   0.0   | 5.518  |   9.435   |    20.71    |       274.667       |
| timm_models |                hrnet_w18                |    128     |   0.0   | 18.748 |  20.474   |   20.252    |       -3.612        |
| huggingface |           DebertaForMaskedLM            |     4      |   0.0   | 4.385  |   7.539   |   19.249    |       172.45        |
| torchbench  |           Background_Matting            |     4      |   0.0   | 0.025  |   0.986   |   18.871    |       135.819       |
| huggingface |       DebertaForQuestionAnswering       |     4      |   0.0   | 4.343  |   7.558   |   18.614    |       -1.077        |
| timm_models |                 dpn107                  |     32     |   0.0   | 17.261 |  17.818   |   17.805    |       -2.842        |
| huggingface |           ElectraForCausalLM            |     1      |   0.0   | 5.238  |   7.486   |   17.657    |       276.946       |
| huggingface |                CamemBert                |     1      |   0.0   | 5.146  |   7.451   |   17.356    |       -0.978        |
| huggingface |           LayoutLMForMaskedLM           |     16     |   0.0   | 5.342  |   7.731   |    17.14    |       214.194       |
+-------------+-----------------------------------------+------------+---------+--------+-----------+-------------+---------------------+

Peak Memory

dtype=float32, unit=GB

+-------------+-----------------------------------------+------------+---------+--------+-----------+-------------+---------------------+
|    suite    |                  name                   | batch_size | pytorch | eager  | aot_eager | aot_nvfuser | inductor_cudagraphs |
+-------------+-----------------------------------------+------------+---------+--------+-----------+-------------+---------------------+
| torchbench  |                  vgg16                  |     64     |   0.0   |  0.0   |   3.148   |    3.147    |        1.005        |
| torchbench  |                  hf_T5                  |     8      |   0.0   |  0.0   |   1.749   |    2.566    |        3.397        |
| timm_models |               res2next50                |    128     |   0.0   |  0.0   |   1.415   |    2.101    |        5.326        |
| timm_models |            res2net50_14w_8s             |    128     |   0.0   |  0.0   |   1.572   |    2.036    |        4.705        |
| huggingface |       BlenderbotSmallForCausalLM        |     64     |   0.0   |  0.0   |   1.916   |    1.92     |        4.27         |
| huggingface |            AlbertForMaskedLM            |     2      |   0.0   |  0.0   |   0.954   |    1.844    |        1.231        |
| timm_models |           gluon_inception_v3            |    128     |   0.0   |  0.0   |   2.006   |    1.816    |        2.53         |
| timm_models |            adv_inception_v3             |    128     |   0.0   |  0.0   |   2.006   |    1.816    |        2.528        |
| timm_models |              inception_v3               |    128     |   0.0   |  0.0   |   2.006   |    1.816    |        2.529        |
| huggingface | BlenderbotSmallForConditionalGeneration |     64     |   0.0   |  0.0   |   1.664   |    1.668    |        4.141        |
| huggingface |       AlbertForQuestionAnswering        |     2      |   0.0   |  0.0   |   0.705   |    1.595    |        0.697        |
| timm_models |            gluon_xception65             |     32     |   0.0   |  0.0   |   0.908   |    1.546    |        0.327        |
| huggingface |            XLNetLMHeadModel             |     4      |   0.0   |  0.0   |   1.514   |    1.531    |       -10.373       |
| torchbench  |                hf_Albert                |     8      |   0.0   |  0.0   |   0.356   |    1.459    |       -0.749        |
| huggingface |             BartForCausalLM             |     4      |   0.0   |  0.0   |   1.227   |    1.244    |        4.418        |
| timm_models |            res2net101_26w_4s            |     64     |   0.0   |  0.0   |   0.848   |    1.111    |        2.468        |
| timm_models |             legacy_senet154             |     32     |   0.0   |  0.0   |   0.989   |    1.106    |        0.095        |
| torchbench  |                 hf_Bart                 |     4      |   0.0   |  -0.0  |   1.026   |    1.035    |        1.541        |
| huggingface |           LayoutLMForMaskedLM           |     16     |   0.0   |  0.0   |    1.0    |     1.0     |        2.144        |
| huggingface |             BertForMaskedLM             |     64     |   0.0   |  0.0   |    1.0    |    0.975    |        2.107        |
| huggingface |       T5ForConditionalGeneration        |     4      |   0.0   |  0.0   |   0.736   |    0.944    |        2.519        |
| torchbench  |               timm_nfnet                |    128     |   0.0   | 0.891  |   0.89    |    0.89     |       -13.257       |
| timm_models |               dm_nfnet_f0               |    128     |   0.0   | 0.891  |   0.89    |    0.89     |       -13.257       |
| torchbench  |           Background_Matting            |     4      |   0.0   | -0.03  |   0.586   |    0.865    |        0.999        |
| huggingface |            MBartForCausalLM             |     16     |   0.0   |  0.0   |   0.819   |    0.82     |        3.195        |
| huggingface |    MegatronBertForQuestionAnswering     |     8      |   0.0   |  0.0   |   0.797   |    0.797    |       -3.993        |
| huggingface |            TrOCRForCausalLM             |     8      |   0.0   |  0.0   |   0.75    |    0.75     |        2.531        |
| torchbench  |             pytorch_struct              |    200     |   0.0   |  0.0   |   0.682   |    0.682    |        0.05         |
| torchbench  |                resnet50                 |     32     |   0.0   |  0.0   |   0.438   |    0.673    |        1.107        |
| huggingface |    LayoutLMForSequenceClassification    |     16     |   0.0   | 0.025  |   0.885   |    0.658    |        0.847        |
| timm_models |               selecsls42b               |    128     |   0.0   | 0.076  |   0.695   |    0.649    |        1.965        |
| torchbench  |              pytorch_unet               |     1      |   0.0   |  -0.0  |   0.623   |    0.567    |        0.667        |
| huggingface |       MT5ForConditionalGeneration       |     2      |   0.0   |  0.0   |   0.622   |    0.536    |        3.445        |
| huggingface |                 T5Small                 |     1      |   0.0   |  0.0   |   0.372   |    0.532    |        1.144        |
| huggingface |     MobileBertForQuestionAnswering      |     32     |   0.0   |  0.0   |   0.084   |    0.502    |        0.78         |
| huggingface |            PLBartForCausalLM            |     16     |   0.0   |  0.0   |   0.485   |    0.486    |        1.604        |
| huggingface |       ElectraForQuestionAnswering       |     64     |   0.0   |  0.0   |   0.716   |    0.448    |       -0.436        |
| torchbench  |                 hf_Bert                 |     4      |   0.0   |  0.0   |   0.496   |    0.447    |        1.195        |
| huggingface |                CamemBert                |     1      |   0.0   | -0.003 |   0.445   |    0.447    |       -1.415        |
| huggingface |       RobertaForQuestionAnswering       |     64     |   0.0   |  0.0   |   0.444   |    0.443    |        0.78         |
| huggingface |        BertForQuestionAnswering         |     64     |   0.0   |  0.0   |   0.444   |    0.443    |        0.779        |
| huggingface |         Speech2Text2ForCausalLM         |     64     |   0.0   | 0.101  |   0.428   |    0.433    |        1.004        |
| torchbench  |             LearningToPaint             |     96     |   0.0   | 0.021  |   0.358   |    0.401    |        0.54         |
| huggingface |            YituTechConvBert             |     1      |   0.0   |  0.0   |   0.382   |    0.39     |        1.458        |
| torchbench  |              hf_DistilBert              |     8      |   0.0   |  0.0   |   0.484   |    0.373    |        0.943        |
| torchbench  |           shufflenet_v2_x1_0            |    128     |   0.0   |  0.0   |   0.266   |    0.37     |        0.378        |
| huggingface |          MobileBertForMaskedLM          |     16     |   0.0   |  0.0   |   0.25    |    0.352    |        0.97         |
| torchbench  |               mnasnet1_0                |     32     |   0.0   |  0.0   |   0.149   |     0.3     |        0.358        |
| huggingface |               DistillGPT2               |     1      |   0.0   | 0.003  |   0.408   |    0.29     |        1.164        |
| timm_models |            convmixer_768_32             |     32     |   0.0   |  0.0   |   0.179   |    0.265    |        0.154        |
+-------------+-----------------------------------------+------------+---------+--------+-----------+-------------+---------------------+

Number of graphs

dtype=float32, unit=graphs

+-------------+-----------------------------------+------------+--------+
|    suite    |               name                | batch_size | graphs |
+-------------+-----------------------------------+------------+--------+
| huggingface |       DebertaV2ForMaskedLM        |     1      | 304.0  |
| huggingface |   DebertaV2ForQuestionAnswering   |     1      | 304.0  |
| huggingface |        DebertaForMaskedLM         |     4      | 204.0  |
| huggingface |    DebertaForQuestionAnswering    |     4      | 204.0  |
| huggingface |              BigBird              |     1      |  64.0  |
| torchbench  |            hf_BigBird             |     2      |  64.0  |
| timm_models |            convit_base            |     32     |  27.0  |
| huggingface |            GoogleFnet             |     1      |  27.0  |
| torchbench  |            hf_Reformer            |     4      |  22.0  |
| timm_models |            densenet121            |     64     |  14.0  |
| torchbench  |               moco                |     32     |  11.0  |
| huggingface |  PegasusForConditionalGeneration  |     4      |  7.0   |
| torchbench  |           fastNLP_Bert            |     6      |  10.0  |
| huggingface |  M2M100ForConditionalGeneration   |     2      |  7.0   |
| torchbench  |            tts_angular            |     64     |  4.0   |
| torchbench  |        speech_transformer         |     32     |  4.0   |
| huggingface |      Speech2Text2ForCausalLM      |     64     |  4.0   |
| huggingface |          XGLMForCausalLM          |     2      |  4.0   |
| huggingface |        PegasusForCausalLM         |     8      |  4.0   |
| timm_models |          crossvit_9_240           |     64     |  2.0   |
| timm_models |        eca_botnext26ts_256        |    128     |  2.0   |
| timm_models |         gluon_xception65          |     32     |  2.0   |
| timm_models |          gluon_senet154           |     32     |  2.0   |
| timm_models |        gluon_inception_v3         |    128     |  2.0   |
| timm_models |           ghostnet_100            |    128     |  2.0   |
| timm_models |             gernet_l              |    128     |  2.0   |
| timm_models |             fbnetv3_b             |    128     |  2.0   |
| timm_models |            fbnetc_100             |    128     |  2.0   |
| timm_models |         ese_vovnet19b_dw          |    128     |  2.0   |
| timm_models |         adv_inception_v3          |    128     |  2.0   |
| timm_models |           ecaresnet101d           |     64     |  2.0   |
| timm_models |         eca_halonext26ts          |    128     |  2.0   |
| timm_models |       beit_base_patch16_224       |     64     |  2.0   |
| huggingface | LayoutLMForSequenceClassification |     16     |  2.0   |
| timm_models |           botnet26t_256           |    128     |  2.0   |
| timm_models |           cait_m36_384            |     2      |  2.0   |
| timm_models |          coat_lite_mini           |    128     |  2.0   |
| timm_models |              dpn107               |     32     |  2.0   |
| timm_models |            dm_nfnet_f0            |    128     |  2.0   |
| timm_models |         convmixer_768_32          |     32     |  2.0   |
| timm_models |              dla102               |     64     |  2.0   |
| timm_models |           gmlp_s16_224            |     64     |  2.0   |
| timm_models |  deit_base_distilled_patch16_224  |     64     |  2.0   |
| huggingface |   GPT2ForSequenceClassification   |     4      |  2.0   |
| timm_models |           cspdarknet53            |     64     |  2.0   |
| huggingface |  GPTNeoForSequenceClassification  |     1      |  2.0   |
| timm_models |           convnext_base           |     32     |  2.0   |
| timm_models |           gmixer_24_224           |     64     |  2.0   |
| timm_models |       xcit_large_24_p8_224        |     5      |  2.0   |
| timm_models |            res2next50             |    128     |  2.0   |
+-------------+-----------------------------------+------------+--------+

@anijain2305
Contributor Author

Performance Dashboard for float32 precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. The experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

When measuring speedup, compilation latency and memory footprint reduction, we exclude the models that fail accuracy checks.
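As a rough illustration of the accuracy check described above, the sketch below compares forward outputs and gradients from a backend run against a native PyTorch reference. It uses plain Python lists in place of tensors. The function names `same` and `accuracy_status`, the tolerance of 1e-4, and the `fail_accuracy` label are assumptions made for illustration, not the benchmark harness's actual code. Only `pass` and `fail_to_run` appear in the tables in this issue.

```python
import math


def same(ref, res, tol=1e-4):
    """Return True if two flat sequences of floats match within tol."""
    if len(ref) != len(res):
        return False
    return all(
        math.isclose(a, b, rel_tol=tol, abs_tol=tol) for a, b in zip(ref, res)
    )


def accuracy_status(ref_outputs, ref_grads, run_backend):
    """Classify one backend run against the native reference.

    run_backend is a hypothetical callable returning (outputs, grads).
    """
    try:
        outputs, grads = run_backend()
    except Exception:
        # The backend crashed or could not compile the model.
        return "fail_to_run"
    if same(ref_outputs, outputs) and same(ref_grads, grads):
        return "pass"
    # Assumed label for a numerical mismatch.
    return "fail_accuracy"
```

A backend whose outputs differ from the reference only by floating-point noise would be classified as `pass`, while one that raises during compilation would be classified as `fail_to_run`.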

Passrate

+----------------+-------------+-------------+-------------+
|    Compiler    | torchbench  | huggingface | timm_models |
+----------------+-------------+-------------+-------------+
|     eager      | 100%, 55/55 | 93%, 41/44  | 100%, 61/61 |
|   aot_eager    | 98%, 54/55  | 93%, 41/44  | 90%, 55/61  |
| aot_cudagraphs | 29%, 16/55  |  0%, 0/44   |  0%, 0/61   |
|  aot_nvfuser   | 62%, 34/55  |  2%, 1/44   | 82%, 50/61  |
|    inductor    | 87%, 48/55  | 77%, 34/44  | 74%, 45/61  |
+----------------+-------------+-------------+-------------+
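The pass-rate cells above can be reproduced from per-model accuracy statuses with a small helper. This is a minimal sketch: the function name `pass_rate` is hypothetical, and it assumes that `pass_due_to_skip` counts as a pass and that the percentage is rounded to the nearest integer, which is consistent with the numbers shown but not confirmed by the source.

```python
def pass_rate(statuses):
    """Format a pass rate such as '93%, 41/44' from a list of status strings."""
    total = len(statuses)
    # Assumption: any status starting with "pass" (including
    # "pass_due_to_skip") is counted as passing.
    passed = sum(1 for s in statuses if s.startswith("pass"))
    pct = 100.0 * passed / total if total else 0.0
    return f"{pct:.0f}%, {passed}/{total}"
```

For example, 41 passing models out of 44 would be formatted in the same shape as the huggingface `eager` cell.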

Geometric mean speedup

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   1.00x    |    1.01x    |    1.00x    |
|   aot_eager    |   1.01x    |    1.00x    |    1.00x    |
| aot_cudagraphs |   1.02x    |    0.0x     |    0.0x     |
|  aot_nvfuser   |   1.12x    |    1.12x    |    1.12x    |
|    inductor    |   1.38x    |    1.60x    |    1.23x    |
+----------------+------------+-------------+-------------+
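For readers who want to recompute these aggregates from the per-model tables, a geometric mean can be sketched as follows. The helper name `geomean_speedup` is hypothetical. The sketch assumes that a speedup of 0.0 marks a model that failed and is excluded, and that a suite with no valid models reports 0.0, as the `aot_cudagraphs` row suggests.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean over models with a valid (positive) speedup."""
    # Assumption: 0.0 entries denote failed models and are dropped.
    valid = [s for s in speedups if s > 0]
    if not valid:
        return 0.0
    return math.exp(sum(math.log(s) for s in valid) / len(valid))
```

The geometric mean is the usual choice for averaging ratios, because a 2x speedup and a 0.5x slowdown cancel out to 1.0x instead of averaging to 1.25x.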

Mean compilation time (seconds)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |    5.68    |    13.69    |    11.39    |
|   aot_eager    |   10.31    |    20.58    |    17.02    |
| aot_cudagraphs |    4.47    |     0.0     |     0.0     |
|  aot_nvfuser   |   21.51    |    10.59    |    57.77    |
|    inductor    |   278.25   |   120.52    |   427.42    |
+----------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   0.96x    |    0.98x    |    1.00x    |
|   aot_eager    |   0.87x    |    0.88x    |    0.88x    |
| aot_cudagraphs |   0.48x    |    0.0x     |    0.0x     |
|  aot_nvfuser   |   0.84x    |    1.08x    |    0.85x    |
|    inductor    |   0.79x    |    0.74x    |    0.90x    |
+----------------+------------+-------------+-------------+
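One plausible reading of the compression ratio, stated here as an assumption and not as the harness's confirmed definition, is the native peak memory divided by the backend's peak memory. Under that reading a value above 1.0x means the backend uses less memory than native PyTorch, and 0.79x means it uses roughly 1.27 times as much. The helper name `compression_ratio` is hypothetical.

```python
def compression_ratio(native_peak_gb, backend_peak_gb):
    """Native peak memory divided by backend peak memory (higher is better)."""
    if backend_peak_gb <= 0:
        # No valid measurement for the backend; mirror the 0.0x cells.
        return 0.0
    return native_peak_gb / backend_peak_gb
```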

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|            densenet121            |  4   | 0.9976 |  1.0092   |      0.0       |   1.4538    |  4.5603  |
|         timm_efficientdet         |  1   | 0.9817 |  0.8908   |      0.0       |     0.0     |  3.8319  |
|       functorch_dp_cifar10        |  64  | 1.0004 |  0.9835   |      0.0       |   1.2001    |  3.7742  |
|      timm_vision_transformer      |  8   | 0.9983 |  0.9452   |      0.0       |   1.3452    |  2.5363  |
|                drq                |  1   | 1.0117 |   0.826   |      0.0       |   1.0725    |  2.4186  |
|           BERT_pytorch            |  16  | 1.0094 |  0.8856   |      0.0       |     0.0     |   2.03   |
|             resnet18              |  16  | 1.0049 |  1.1155   |      0.0       |   1.3986    |  1.7819  |
|          pytorch_struct           | 200  | 0.9963 |  0.7395   |     0.8854     |   0.8963    |  1.7657  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9955 |  0.9348   |     1.1291     |   1.1909    |  1.7242  |
|           lennard_jones           | 1000 | 0.974  |  0.8405   |     1.0627     |   1.0207    |  1.7135  |
|             hf_Albert             |  8   | 1.0013 |  0.9978   |      0.0       |     0.0     |  1.6628  |
|           squeezenet1_1           |  32  | 1.0006 |  1.0042   |     1.0435     |   1.1661    |  1.6351  |
|               dcgan               |  32  | 0.9954 |   1.02    |     1.088      |   1.1569    |  1.6229  |
|          resnext50_32x4d          |  8   | 1.0027 |  1.0793   |      0.0       |   1.3534    |  1.5568  |
|        speech_transformer         |  32  | 1.003  |  0.8984   |      0.0       |     0.0     |  1.4906  |
|            timm_nfnet             | 128  | 0.9995 |  0.9997   |      0.0       |   1.2116    |  1.4697  |
|        mobilenet_v3_large         |  32  | 1.0053 |   1.121   |      0.0       |   1.3848    |  1.4662  |
|              hf_GPT2              |  4   | 1.0053 |  0.9748   |      0.0       |     0.0     |  1.4228  |
|            hf_T5_large            |  2   | 1.0242 |  0.8958   |      0.0       |     0.0     |  1.4145  |
|         soft_actor_critic         | 256  | 0.9952 |  0.7978   |     1.0393     |   1.0108    |  1.3816  |
|           fastNLP_Bert            |  6   | 0.999  |  0.9749   |      0.0       |     0.0     |  1.3503  |
|           pytorch_unet            |  1   | 0.9996 |  0.9969   |      0.0       |   1.0758    |  1.2042  |
|          LearningToPaint          |  96  | 1.0045 |  1.0546   |      0.0       |   1.2423    |  1.2032  |
|              hf_Bart              |  4   | 1.0118 |   0.974   |      0.0       |     0.0     |  1.1751  |
|            Super_SloMo            |  6   | 0.9999 |  0.9977   |      0.0       |     0.0     |  1.1742  |
|               vgg16               |  64  |  1.0   |  0.9986   |     0.7923     |   0.9962    |  1.1703  |
|              hf_Bert              |  4   | 1.0269 |  0.9881   |      0.0       |     0.0     |  1.1642  |
|              alexnet              | 128  | 0.9984 |  0.9988   |     0.777      |   1.0007    |  1.162   |
|            mnasnet1_0             |  32  | 1.001  |  1.1017   |     0.7035     |   1.3033    |  1.1612  |
|           hf_DistilBert           |  8   | 0.9997 |  0.9542   |      0.0       |     0.0     |  1.1537  |
|        Background_Matting         |  4   | 0.9996 |  1.0229   |      0.0       |    1.08     |  1.1159  |
|          pytorch_stargan          |  16  | 0.9994 |  0.9836   |     0.7288     |   0.9873    |  1.1151  |
|            hf_Reformer            |  4   | 0.9963 |    0.0    |     0.8939     |     0.0     |  1.1098  |
|            hf_BigBird             |  2   | 0.985  |  0.9444   |      0.0       |     0.0     |  1.0887  |
|        shufflenet_v2_x1_0         | 128  | 1.0011 |  1.0504   |      0.0       |   1.1836    |  1.0756  |
|         timm_efficientnet         |  32  | 0.9543 |   0.816   |      0.0       |   1.0788    |  1.0728  |
|   timm_vision_transformer_large   |  8   | 0.9992 |  0.9936   |      0.0       |   0.9822    |  1.0534  |
| attention_is_all_you_need_pytorch | 256  | 0.9979 |  0.9708   |      0.0       |     0.0     |  1.0469  |
|           timm_resnest            |  32  | 0.9996 |  1.0033   |      0.0       |   1.1829    |  1.0289  |
|            tts_angular            |  64  | 0.9959 |  0.9672   |     0.9836     |   0.9982    |  1.0112  |
|              demucs               |  4   | 1.0003 |  1.0002   |     0.9997     |   1.0006    |   1.0    |
|    mobilenet_v2_quantized_qat     |  96  | 0.9992 |  0.9996   |     0.999      |   0.9988    |  0.999   |
|      resnet50_quantized_qat       |  32  | 0.9975 |  0.9984   |     0.9983     |   0.9987    |  0.9984  |
|               dlrm                | 2048 | 0.9692 |  0.9785   |      0.0       |     0.0     |  0.9604  |
|           mobilenet_v2            |  96  | 0.9993 |  0.9979   |      0.0       |   1.0437    |  0.9574  |
|            timm_vovnet            |  32  | 0.9073 |  0.9025   |      0.0       |   1.0018    |   0.91   |
|      nvidia_deeprecommender       | 256  | 0.9993 |  0.9629   |     0.5845     |   0.9425    |  0.9044  |
|               moco                |  32  | 0.9947 |  1.0484   |      0.0       |     0.0     |  0.7591  |
|            timm_regnet            |  32  | 0.9652 |  0.9636   |      0.0       |   1.0932    |  0.7378  |
|             resnet50              |  32  | 0.9987 |  0.9933   |      0.0       |    1.161    |  0.7127  |
|              yolov3               |  16  | 0.9996 |  0.9945   |      0.0       |   1.1838    |   0.0    |
|           hf_Longformer           |  2   | 0.969  |   0.899   |     0.8164     |     0.0     |   0.0    |
|               hf_T5               |  8   | 0.9985 |  0.9942   |      0.0       |     0.0     |   0.0    |
|           hf_GPT2_large           |  4   | 0.9996 |  0.9801   |      0.0       |     0.0     |   0.0    |
|             tacotron2             |  64  | 0.9791 |  0.8546   |      0.0       |     0.0     |   0.0    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|               name                | bs  |      eager       |    aot_eager     |  aot_cudagraphs  |   aot_nvfuser    |     inductor     |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|           hf_GPT2_large           |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|            hf_T5_large            |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|              alexnet              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               dcgan               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|              demucs               |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |
|           lennard_jones           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|            mnasnet1_0             |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|    mobilenet_v2_quantized_qat     |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|      nvidia_deeprecommender       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |       pass       |       pass       |
|      resnet50_quantized_qat       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |       pass       |       pass       |
|           squeezenet1_1           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|            tts_angular            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               vgg16               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          LearningToPaint          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            densenet121            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|                drq                |  1  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|       functorch_dp_cifar10        |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           mobilenet_v2            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|        mobilenet_v3_large         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           pytorch_unet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet18              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet50              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|          resnext50_32x4d          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|        shufflenet_v2_x1_0         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|         timm_efficientnet         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_nfnet             |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_regnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           timm_resnest            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|      timm_vision_transformer      |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_vovnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            hf_Reformer            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |
|           BERT_pytorch            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            Super_SloMo            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
| attention_is_all_you_need_pytorch |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               dlrm                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           fastNLP_Bert            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|             hf_Albert             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bart              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bert              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            hf_BigBird             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           hf_DistilBert           |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_GPT2              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               hf_T5               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|        speech_transformer         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|         timm_efficientdet         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|        Background_Matting         |  4  |       pass       |       pass       |   fail_to_run    |       pass       |   fail_to_run    |
|           hf_Longformer           |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|            hf_T5_base             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|               moco                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|             tacotron2             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|          vision_maskrcnn          |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|              yolov3               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |      0.0000      |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+
|               name                |  bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor  |
+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+
|         timm_efficientdet         |  1   | 50.9164 |  70.3788  |      nan       |     nan     | 1855.9202 |
|            densenet121            |  4   | 13.1067 |  25.4059  |      nan       |  101.5015   | 1599.4226 |
|            hf_T5_large            |  2   | 35.7166 |  66.5562  |      nan       |     nan     | 1154.4563 |
|            mnasnet1_0             |  32  | 3.1383  |  7.0386   |    23.5784     |   33.4187   | 924.0881  |
|        mobilenet_v3_large         |  32  | 3.6197  |   7.569   |      nan       |   55.8228   | 815.6827  |
|               moco                |  32  | 11.4915 |  16.8868  |      nan       |     nan     | 792.5782  |
|           mobilenet_v2            |  96  |  3.069  |  6.6873   |      nan       |   39.0419   | 673.3655  |
|          resnext50_32x4d          |  8   | 3.3393  |  7.3876   |      nan       |   31.0213   | 626.7237  |
|         timm_efficientnet         |  32  | 5.8246  |  10.4379  |      nan       |   56.7643   | 573.6101  |
|        shufflenet_v2_x1_0         | 128  | 3.6097  |  8.0917   |      nan       |   29.4511   | 415.0044  |
|           squeezenet1_1           |  32  | 0.6275  |  1.3124   |     3.1679     |   4.8972    | 379.8186  |
|           timm_resnest            |  32  |  1.351  |  3.4723   |      nan       |   36.2388   | 362.8361  |
|            timm_regnet            |  32  |  8.274  |  14.2127  |      nan       |   53.5289   | 335.4974  |
| attention_is_all_you_need_pytorch | 256  | 4.2332  |  10.1412  |      nan       |     nan     | 269.6108  |
|        speech_transformer         |  32  | 7.1452  |  13.5568  |      nan       |     nan     | 259.7565  |
|            timm_vovnet            |  32  | 2.8909  |  6.1661   |      nan       |   25.6462   | 255.4935  |
|       functorch_dp_cifar10        |  64  | 0.7904  |  2.0933   |      nan       |   5.6355    | 208.4064  |
|      timm_vision_transformer      |  8   | 2.9873  |  6.3471   |      nan       |   11.3264   | 200.3176  |
|             resnet18              |  16  | 0.9362  |  2.4353   |      nan       |   18.0277   | 195.5902  |
|   timm_vision_transformer_large   |  8   | 22.2765 |  34.0841  |      nan       |   44.7332   | 189.7259  |
|        Background_Matting         |  4   | 3.6941  |  7.5331   |      nan       |   32.8015   | 183.8065  |
|           BERT_pytorch            |  16  | 4.8027  |  10.7586  |      nan       |     nan     | 183.4356  |
|          LearningToPaint          |  96  | 0.9741  |  2.5194   |      nan       |   24.5849   | 178.7819  |
|             resnet50              |  32  | 3.2773  |  7.4179   |      nan       |   35.0054   | 175.1635  |
|              hf_Bart              |  4   | 7.0179  |  13.1991  |      nan       |     nan     | 163.5884  |
|           fastNLP_Bert            |  6   | 5.0044  |  9.9284   |      nan       |     nan     | 153.2715  |
|              hf_GPT2              |  4   | 3.4005  |  7.8867   |      nan       |     nan     | 149.3391  |
|            timm_nfnet             | 128  | 6.6204  |  11.8892  |      nan       |   34.5324   | 136.2766  |
|          pytorch_stargan          |  16  | 0.8038  |   2.764   |     9.5008     |   4.2834    |  128.577  |
|          pytorch_struct           | 200  | 0.3903  |  0.9288   |     1.4439     |   4.2379    | 106.0572  |
|            Super_SloMo            |  6   | 2.1703  |  5.8559   |      nan       |     nan     |  93.2015  |
|              hf_Bert              |  4   | 4.9761  |  9.5568   |      nan       |     nan     |  82.4714  |
|             hf_Albert             |  8   | 1.1045  |  5.7238   |      nan       |     nan     |  80.7431  |
|            hf_Reformer            |  4   | 2.9996  |    nan    |    13.0539     |     nan     |  77.5866  |
|           pytorch_unet            |  1   | 1.0533  |   2.812   |      nan       |   20.2914   |  64.8277  |
|            hf_BigBird             |  2   | 10.9112 |  16.7734  |      nan       |     nan     |  61.4215  |
|           hf_DistilBert           |  8   |  1.57   |  3.9675   |      nan       |     nan     |  54.3823  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.7416  |  2.5789   |     7.9558     |   4.1967    |  35.9234  |
|               vgg16               |  64  | 0.3047  |  0.7761   |     2.3197     |   2.7169    |  22.3816  |
|               dlrm                | 2048 | 0.5927  |  0.9744   |      nan       |     nan     |  18.9655  |
|                drq                |  1   | 0.2629  |  0.5431   |      nan       |   3.5281    |  18.6718  |
|              alexnet              | 128  | 0.2274  |  0.5093   |     1.2017     |   2.4621    |  17.8106  |
|               dcgan               |  32  | 0.2487  |  0.5048   |      1.23      |   3.8262    |  16.923   |
|      nvidia_deeprecommender       | 256  | 0.2588  |  0.4766   |     0.7503     |   2.4694    |  12.556   |
|         soft_actor_critic         | 256  | 0.2557  |  0.3887   |     0.5963     |   1.5565    |  12.2199  |
|           lennard_jones           | 1000 | 0.2225  |  0.3672   |     0.5077     |   1.1334    |  5.8665   |
|            tts_angular            |  64  | 0.3106  |  0.3618   |     0.4935     |   1.0926    |   4.687   |
|      resnet50_quantized_qat       |  32  | 2.5256  |  2.4992   |     2.5283     |   2.4885    |   2.434   |
|    mobilenet_v2_quantized_qat     |  96  | 2.4664  |  2.4212   |     2.3653     |   2.3407    |  2.3672   |
|              demucs               |  4   | 0.8026  |  0.8012   |     0.8095     |   0.8135    |  0.7167   |
|              yolov3               |  16  | 7.1951  |  13.031   |      nan       |   47.4947   |    nan    |
|           hf_Longformer           |  2   | 11.5662 |  18.8685  |    84.9383     |     nan     |    nan    |
|           hf_GPT2_large           |  4   | 21.0635 |  34.9334  |      nan       |     nan     |    nan    |
|             tacotron2             |  64  | 13.5662 |  26.3055  |      nan       |     nan     |    nan    |
|               hf_T5               |  8   | 3.7864  |  10.4607  |      nan       |     nan     |    nan    |
+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|            Super_SloMo            |  6   | 1.0024 |   0.956   |      nan       |     nan     |  1.1855  |
|         timm_efficientnet         |  32  | 0.9998 |  0.7704   |      nan       |   0.7845    |  1.0652  |
|            timm_nfnet             | 128  | 0.9393 |   0.897   |      nan       |   0.9515    |  1.022   |
|         timm_efficientdet         |  1   | 1.0142 |  0.8251   |      nan       |     nan     |  1.0218  |
|      resnet50_quantized_qat       |  32  | 0.9967 |  0.9967   |     0.9967     |   0.9967    |  1.0001  |
|    mobilenet_v2_quantized_qat     |  96  | 0.9957 |  0.9957   |     0.9957     |   0.9957    |  0.9992  |
|           mobilenet_v2            |  96  | 0.9993 |  0.7661   |      nan       |   0.7676    |  0.9975  |
|              demucs               |  4   | 0.9886 |  0.9886   |     0.9886     |   0.9886    |  0.9886  |
|            tts_angular            |  64  | 0.9884 |  0.9884   |     0.984      |   0.9884    |  0.9842  |
|              hf_GPT2              |  4   | 0.9548 |   0.887   |      nan       |     nan     |  0.9505  |
|        Background_Matting         |  4   | 1.0026 |   0.952   |      nan       |   0.9773    |  0.9139  |
|          pytorch_stargan          |  16  | 0.9975 |   1.019   |     0.2027     |   1.0085    |  0.9023  |
|        speech_transformer         |  32  | 0.9988 |  0.9152   |      nan       |     nan     |  0.8959  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9986 |  0.9173   |     0.2326     |   0.9114    |  0.8941  |
|             hf_Albert             |  8   | 0.9333 |  0.9333   |      nan       |     nan     |  0.8804  |
|           pytorch_unet            |  1   | 0.9985 |  0.8536   |      nan       |    0.851    |  0.859   |
|              hf_Bart              |  4   | 0.9617 |   0.878   |      nan       |     nan     |  0.853   |
|              hf_Bert              |  4   | 0.9683 |  0.8952   |      nan       |     nan     |  0.8517  |
|            timm_regnet            |  32  | 1.0013 |  0.8634   |      nan       |   0.8806    |  0.8481  |
|        shufflenet_v2_x1_0         | 128  |  1.0   |  0.9163   |      nan       |   0.8868    |  0.8447  |
|           fastNLP_Bert            |  6   | 1.0012 |  0.9152   |      nan       |     nan     |  0.8343  |
| attention_is_all_you_need_pytorch | 256  | 0.9481 |  0.9241   |      nan       |     nan     |  0.8261  |
|            timm_vovnet            |  32  | 0.9933 |  0.7644   |      nan       |   0.7778    |  0.8252  |
|           BERT_pytorch            |  16  |  1.0   |  0.8995   |      nan       |     nan     |  0.825   |
|            hf_T5_large            |  2   | 0.922  |  0.8722   |      nan       |     nan     |  0.8237  |
|            hf_BigBird             |  2   | 0.9609 |  0.9609   |      nan       |     nan     |  0.8205  |
|           squeezenet1_1           |  32  | 0.9749 |  0.8159   |     0.2781     |   0.9742    |  0.8159  |
|           hf_DistilBert           |  8   | 0.9212 |  0.9053   |      nan       |     nan     |  0.7841  |
|               dcgan               |  32  |  1.0   |  0.7784   |     0.3321     |   0.7784    |  0.767   |
|               moco                |  32  | 1.0067 |  0.9701   |      nan       |     nan     |  0.7668  |
|              alexnet              | 128  | 0.9998 |  0.7731   |     0.3805     |   0.7736    |  0.743   |
|            mnasnet1_0             |  32  | 0.9988 |  0.9087   |     0.1627     |   0.8348    |  0.7268  |
|             resnet50              |  32  | 1.0002 |  0.8763   |      nan       |   0.8011    |  0.7254  |
|   timm_vision_transformer_large   |  8   | 1.0022 |  0.8433   |      nan       |   0.8015    |  0.7222  |
|      timm_vision_transformer      |  8   |  1.0   |  0.8883   |      nan       |   0.8108    |  0.712   |
|        mobilenet_v3_large         |  32  | 0.9958 |  0.8655   |      nan       |   0.8773    |  0.7041  |
|               dlrm                | 2048 | 0.7282 |  0.7283   |      nan       |     nan     |  0.6973  |
|           timm_resnest            |  32  | 0.9935 |  0.8869   |      nan       |   0.8075    |  0.6862  |
|            densenet121            |  4   |  1.0   |  0.8812   |      nan       |   0.8571    |  0.6618  |
|          resnext50_32x4d          |  8   | 0.9994 |  0.8687   |      nan       |   0.8223    |  0.6615  |
|               vgg16               |  64  |  1.0   |  0.6663   |     0.2532     |   0.6664    |  0.6471  |
|          LearningToPaint          |  96  | 0.9442 |  0.6918   |      nan       |   0.6272    |  0.6444  |
|         soft_actor_critic         | 256  | 0.964  |   0.964   |     0.4356     |   0.9555    |  0.6428  |
|                drq                |  1   | 0.8541 |  0.8541   |      nan       |   0.8541    |  0.6427  |
|             resnet18              |  16  | 0.9846 |  0.7907   |      nan       |   0.7038    |  0.6163  |
|           lennard_jones           | 1000 |  1.0   |    1.0    |     0.3712     |   1.0947    |  0.5646  |
|      nvidia_deeprecommender       | 256  | 0.5598 |  0.5598   |     0.4734     |   0.5598    |  0.5598  |
|          pytorch_struct           | 200  |  1.0   |  0.5079   |     0.4824     |   0.5079    |  0.4222  |
|       functorch_dp_cifar10        |  64  | 0.9626 |  0.8251   |      nan       |   0.8254    |  0.4037  |
|            hf_Reformer            |  4   | 0.3011 |    nan    |     0.1803     |     nan     |  0.299   |
|              yolov3               |  16  | 1.0072 |  0.8533   |      nan       |   0.8915    |   nan    |
|           hf_Longformer           |  2   | 0.9603 |  0.9603   |     0.2879     |     nan     |   nan    |
|             tacotron2             |  64  | 0.9922 |  1.1046   |      nan       |     nan     |   nan    |
|               hf_T5               |  8   | 0.9527 |  0.9446   |      nan       |     nan     |   nan    |
|           hf_GPT2_large           |  4   | 0.936  |  0.8771   |      nan       |     nan     |   nan    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
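The tables above are plain-text grids, so comparing backends means parsing them first. Below is a minimal sketch of one way to do that in Python. It is not part of the dashboard tooling: the helper names (`parse_table`, `to_float`, `ratio_to_eager`) are hypothetical, and treating `nan` and non-numeric cells as "no measurement" is an assumption about how failed runs are reported. The sample rows are copied from the torchbench compilation latency table.

```python
import math

# Two rows copied from the compilation latency table above.
SAMPLE = """\
+------------+----+--------+-----------+----------------+-------------+----------+
|    name    | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+------------+----+--------+-----------+----------------+-------------+----------+
| mnasnet1_0 | 32 | 3.1383 |  7.0386   |    23.5784     |   33.4187   | 924.0881 |
|   hf_T5    | 8  | 3.7864 |  10.4607  |      nan       |     nan     |   nan    |
+------------+----+--------+-----------+----------------+-------------+----------+
"""


def parse_table(text):
    """Parse a '+---+' / '| a | b |' grid into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the '+---+' separator lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def to_float(cell):
    """Return the cell as a float, or None for nan and non-numeric entries."""
    try:
        value = float(cell)
    except ValueError:
        return None  # e.g. 'pass' or 'fail_to_run'
    return None if math.isnan(value) else value


def ratio_to_eager(rows, backend):
    """Map model name to backend / eager, skipping rows with a missing value."""
    ratios = {}
    for row in rows:
        base = to_float(row["eager"])
        other = to_float(row[backend])
        if base and other is not None:  # also guards against a zero baseline
            ratios[row["name"]] = other / base
    return ratios


rows = parse_table(SAMPLE)
slowdown = ratio_to_eager(rows, "inductor")
for name, ratio in slowdown.items():
    print(f"{name}: inductor compile time is {ratio:.1f}x eager")
```

The same helpers can be pointed at the speedup or memory tables by changing the `backend` argument. Note that the speedup tables report `0.0` rather than `nan` for some entries, so a caller would need to decide separately how to treat those.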

huggingface suite with float32 precision


Performance speedup

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|       MT5ForConditionalGeneration       | 2  | 1.0268 |   0.919   |      0.0       |     0.0     |  4.2611  |
|           ElectraForCausalLM            | 1  | 1.0463 |  0.9209   |      0.0       |     0.0     |  4.1573  |
|            YituTechConvBert             | 1  | 1.0326 |  0.9386   |      0.0       |     0.0     |  3.1049  |
|         MegatronBertForCausalLM         | 2  | 1.043  |   0.943   |      0.0       |     0.0     |  2.8277  |
|           RobertaForCausalLM            | 4  | 1.0398 |  0.9419   |      0.0       |     0.0     |  2.7707  |
|          MobileBertForMaskedLM          | 16 | 1.0228 |   0.919   |      0.0       |     0.0     |  2.6211  |
|     M2M100ForConditionalGeneration      | 2  | 1.1203 |  1.0476   |      0.0       |     0.0     |  2.584   |
|             OPTForCausalLM              | 4  | 1.0181 |  0.9027   |      0.0       |     0.0     |  2.5794  |
|             XGLMForCausalLM             | 1  | 1.0148 |  0.8733   |      0.0       |     0.0     |  2.4465  |
|     PegasusForConditionalGeneration     | 4  | 1.0132 |   0.883   |      0.0       |     0.0     |  2.4111  |
|     MobileBertForQuestionAnswering      | 32 | 1.0191 |  0.9141   |      0.0       |     0.0     |  2.3065  |
|                CamemBert                | 1  | 1.046  |   0.945   |      0.0       |     0.0     |  2.2963  |
|               DistillGPT2               | 1  | 1.0351 |  0.9295   |      0.0       |     0.0     |  2.0155  |
|     PLBartForConditionalGeneration      | 8  | 1.0177 |  0.8977   |      0.0       |     0.0     |  1.8483  |
|               GoogleFnet                | 1  | 1.0022 |  0.8086   |      0.0       |   1.1178    |  1.7839  |
|      GPT2ForSequenceClassification      | 4  | 0.9991 |   0.977   |      0.0       |     0.0     |  1.6644  |
|    MegatronBertForQuestionAnswering     | 8  | 1.0461 |  0.9419   |      0.0       |     0.0     |  1.6081  |
|      MBartForConditionalGeneration      | 8  | 1.0126 |   0.916   |      0.0       |     0.0     |  1.4634  |
|            XLNetLMHeadModel             | 4  | 0.9991 |  0.9656   |      0.0       |     0.0     |  1.4289  |
|           PegasusForCausalLM            | 8  | 1.0088 |  0.9262   |      0.0       |     0.0     |  1.3581  |
|       T5ForConditionalGeneration        | 4  | 1.002  |  0.9661   |      0.0       |     0.0     |  1.349   |
|            TrOCRForCausalLM             | 8  | 1.0149 |  0.9561   |      0.0       |     0.0     |  1.337   |
|       AlbertForQuestionAnswering        | 2  |  1.0   |  0.9999   |      0.0       |     0.0     |  1.3032  |
|            AlbertForMaskedLM            | 2  | 1.0008 |  0.9987   |      0.0       |     0.0     |  1.2986  |
|         Speech2Text2ForCausalLM         | 64 | 1.0087 |  0.9398   |      0.0       |     0.0     |  1.2936  |
|    LayoutLMForSequenceClassification    | 16 | 0.9991 |  0.9865   |      0.0       |     0.0     |  1.2471  |
|                 T5Small                 | 1  | 1.0201 |  0.9507   |      0.0       |     0.0     |  1.2467  |
|      BartForConditionalGeneration       | 1  | 1.0117 |  0.8919   |      0.0       |     0.0     |  1.2102  |
|     DistilBertForQuestionAnswering      | 32 | 1.0287 |  0.9788   |      0.0       |     0.0     |  1.186   |
|       DebertaForQuestionAnswering       | 4  | 0.9307 |  0.7473   |     0.7971     |     0.0     |  1.1787  |
|          DistilBertForMaskedLM          | 16 | 1.0282 |  0.9804   |      0.0       |     0.0     |  1.1665  |
|            PLBartForCausalLM            | 16 | 1.0148 |  0.9447   |      0.0       |     0.0     |  1.1599  |
| BlenderbotSmallForConditionalGeneration | 32 | 1.0103 |  0.9364   |      0.0       |     0.0     |  1.1574  |
|             BartForCausalLM             | 2  | 0.9992 |  0.9654   |      0.0       |     0.0     |  1.1029  |
|       RobertaForQuestionAnswering       | 64 | 0.9986 |  0.9812   |      0.0       |     0.0     |  1.1015  |
|        BertForQuestionAnswering         | 64 | 0.9987 |  0.9812   |      0.0       |     0.0     |  1.0921  |
|                 BigBird                 | 1  | 0.996  |  0.9401   |      0.0       |     0.0     |  1.0903  |
|            MBartForCausalLM             | 16 | 1.0061 |  0.9666   |      0.0       |     0.0     |  1.0422  |
|             BertForMaskedLM             | 64 | 0.9993 |  0.9612   |      0.0       |     0.0     |  1.0404  |
|           DebertaForMaskedLM            | 4  | 0.9338 |  0.8099   |     0.7224     |     0.0     |  1.0183  |
|       BlenderbotSmallForCausalLM        | 64 | 1.001  |  0.9056   |      0.0       |     0.0     |  1.0071  |
|          AllenaiLongformerBase          | 1  | 0.9525 |  0.8694   |     0.7836     |     0.0     |   0.0    |
|       ElectraForQuestionAnswering       | 64 | 0.9988 |  0.9837   |      0.0       |     0.0     |   0.0    |
|           LayoutLMForMaskedLM           | 16 | 0.9989 |  0.9699   |      0.0       |     0.0     |   0.0    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+

Accuracy

+-----------------------------------------+----+--------+-----------+----------------+-------------+-------------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser |  inductor   |
+-----------------------------------------+----+--------+-----------+----------------+-------------+-------------+
|               GoogleFnet                | 1  |  pass  |   pass    |  fail_to_run   |    pass     |    pass     |
|             BartForCausalLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|             BertForMaskedLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|        BertForQuestionAnswering         | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|                 BigBird                 | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       BlenderbotSmallForCausalLM        | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
| BlenderbotSmallForConditionalGeneration | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|                CamemBert                | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           DebertaForMaskedLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       DebertaForQuestionAnswering       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|          DistilBertForMaskedLM          | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|     DistilBertForQuestionAnswering      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|               DistillGPT2               | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           ElectraForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       ElectraForQuestionAnswering       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|      GPT2ForSequenceClassification      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           LayoutLMForMaskedLM           | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|    LayoutLMForSequenceClassification    | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            MBartForCausalLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       MT5ForConditionalGeneration       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|         MegatronBertForCausalLM         | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|    MegatronBertForQuestionAnswering     | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|          MobileBertForMaskedLM          | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|     MobileBertForQuestionAnswering      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|             OPTForCausalLM              | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            PLBartForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           PegasusForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|     PegasusForConditionalGeneration     | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           RobertaForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       RobertaForQuestionAnswering       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|         Speech2Text2ForCausalLM         | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       T5ForConditionalGeneration        | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|                 T5Small                 | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            TrOCRForCausalLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            XLNetLMHeadModel             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            YituTechConvBert             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            AlbertForMaskedLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|       AlbertForQuestionAnswering        | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|          AllenaiLongformerBase          | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|      MBartForConditionalGeneration      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|     PLBartForConditionalGeneration      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|      BartForConditionalGeneration       | 0  | 0.0000 |  0.0000   |     0.0000     |   0.0000    |   0.0000    |
|     M2M100ForConditionalGeneration      | 0  | 0.0000 |  0.0000   |     0.0000     |   0.0000    |   0.0000    |
|             XGLMForCausalLM             | 0  | 0.0000 |  0.0000   |     0.0000     |   0.0000    |   0.0000    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+-------------+

Compilation latency (sec)

+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+
|                  name                   | bs |  eager   | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+
|            XLNetLMHeadModel             | 4  | 17.6114  |  35.8594  |      nan       |     nan     | 311.0562 |
|          MobileBertForMaskedLM          | 16 | 134.9363 | 159.1989  |      nan       |     nan     | 285.2019 |
|     MobileBertForQuestionAnswering      | 32 | 131.2332 | 154.9954  |      nan       |     nan     | 264.1833 |
|     M2M100ForConditionalGeneration      | 2  | 25.6915  |  36.5483  |      nan       |     nan     | 249.2139 |
|       MT5ForConditionalGeneration       | 2  |  6.4578  |  16.8657  |      nan       |     nan     | 202.2192 |
|       T5ForConditionalGeneration        | 4  |  3.7498  |  10.6849  |      nan       |     nan     | 200.4614 |
|      MBartForConditionalGeneration      | 8  | 26.4326  |  39.5871  |      nan       |     nan     | 185.0177 |
|     PegasusForConditionalGeneration     | 4  |  25.99   |  38.2544  |      nan       |     nan     | 176.0357 |
|      BartForConditionalGeneration       | 1  | 26.0076  |  38.7586  |      nan       |     nan     |  171.48  |
|            YituTechConvBert             | 1  |  8.9211  |  16.6277  |      nan       |     nan     | 171.2474 |
|             XGLMForCausalLM             | 1  | 14.9478  |  24.9402  |      nan       |     nan     | 165.6688 |
|           DebertaForMaskedLM            | 4  |  6.9897  |  13.3816  |    50.5707     |     nan     | 159.4509 |
|         MegatronBertForCausalLM         | 2  | 16.6098  |  26.1689  |      nan       |     nan     | 156.6464 |
|                 T5Small                 | 1  |  3.849   |  10.5757  |      nan       |     nan     | 156.5866 |
|    MegatronBertForQuestionAnswering     | 8  | 16.2237  |  26.0801  |      nan       |     nan     | 152.5495 |
|     PLBartForConditionalGeneration      | 8  |  7.1207  |  13.3694  |      nan       |     nan     | 148.7384 |
| BlenderbotSmallForConditionalGeneration | 32 | 11.9825  |  20.3854  |      nan       |     nan     | 134.417  |
|       DebertaForQuestionAnswering       | 4  |  6.9424  |  13.2729  |    50.3621     |     nan     | 120.3692 |
|           RobertaForCausalLM            | 4  |  5.0343  |  9.8896   |      nan       |     nan     | 108.708  |
|    LayoutLMForSequenceClassification    | 16 |  5.2725  |  10.0526  |      nan       |     nan     | 102.2804 |
|           PegasusForCausalLM            | 8  |  9.8065  |  14.5866  |      nan       |     nan     | 98.5429  |
|            MBartForCausalLM             | 16 |  9.8899  |  14.5906  |      nan       |     nan     | 91.5006  |
|             OPTForCausalLM              | 4  |  4.6313  |  9.4156   |      nan       |     nan     | 88.7484  |
|             BertForMaskedLM             | 64 |  5.0508  |  9.6996   |      nan       |     nan     | 87.3848  |
|             BartForCausalLM             | 2  |  9.8513  |   14.44   |      nan       |     nan     | 87.1208  |
|      GPT2ForSequenceClassification      | 4  |  3.4283  |  7.9924   |      nan       |     nan     |  86.486  |
|            TrOCRForCausalLM             | 8  |  9.7915  |  14.4556  |      nan       |     nan     | 78.9711  |
|               DistillGPT2               | 1  |  1.4429  |  3.7509   |      nan       |     nan     | 75.3892  |
|           ElectraForCausalLM            | 1  |  5.088   |  9.8527   |      nan       |     nan     | 72.4505  |
|            PLBartForCausalLM            | 16 |  3.2335  |  5.4969   |      nan       |     nan     | 70.0976  |
|                CamemBert                | 1  |  4.996   |  9.9911   |      nan       |     nan     |  68.592  |
|     DistilBertForQuestionAnswering      | 32 |  1.7309  |  4.1188   |      nan       |     nan     | 68.2232  |
|         Speech2Text2ForCausalLM         | 64 |  3.1456  |   5.399   |      nan       |     nan     | 67.8609  |
|       BlenderbotSmallForCausalLM        | 64 |  4.796   |   7.892   |      nan       |     nan     | 67.5137  |
|       RobertaForQuestionAnswering       | 64 |  4.8999  |  9.8389   |      nan       |     nan     | 66.2968  |
|        BertForQuestionAnswering         | 64 |  4.8621  |  9.7433   |      nan       |     nan     | 65.7811  |
|            AlbertForMaskedLM            | 2  |  1.2227  |  6.2484   |      nan       |     nan     | 65.4464  |
|                 BigBird                 | 1  | 11.1119  |  16.9625  |      nan       |     nan     | 58.5731  |
|          DistilBertForMaskedLM          | 16 |  1.7176  |  4.1522   |      nan       |     nan     |  51.726  |
|       AlbertForQuestionAnswering        | 2  |  1.2187  |  6.0422   |      nan       |     nan     | 45.4511  |
|               GoogleFnet                | 1  |  1.9996  |  4.2959   |      nan       |   10.5864   | 44.8824  |
|          AllenaiLongformerBase          | 1  |  11.654  |  19.7453  |     85.668     |     nan     |   nan    |
|           LayoutLMForMaskedLM           | 16 |  5.3735  |  10.207   |      nan       |     nan     |   nan    |
|       ElectraForQuestionAnswering       | 64 |  4.9237  |   9.746   |      nan       |     nan     |   nan    |
+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|      GPT2ForSequenceClassification      | 4  | 0.9342 |  0.9091   |      nan       |     nan     |  1.0318  |
|            XLNetLMHeadModel             | 4  | 1.0001 |  0.8976   |      nan       |     nan     |  0.9717  |
|    LayoutLMForSequenceClassification    | 16 |  1.0   |  0.9348   |      nan       |     nan     |  0.9339  |
|        BertForQuestionAnswering         | 64 |  1.0   |  0.9467   |      nan       |     nan     |  0.9145  |
|       RobertaForQuestionAnswering       | 64 |  1.0   |  0.9467   |      nan       |     nan     |  0.9145  |
|                 T5Small                 | 1  |  1.0   |  0.9325   |      nan       |     nan     |  0.8445  |
|     DistilBertForQuestionAnswering      | 32 |  1.0   |  0.9046   |      nan       |     nan     |  0.8394  |
|             BertForMaskedLM             | 64 |  1.0   |  0.9219   |      nan       |     nan     |  0.8321  |
|             BartForCausalLM             | 2  |  1.0   |  0.8847   |      nan       |     nan     |  0.8303  |
|                 BigBird                 | 1  | 1.0001 |  0.9549   |      nan       |     nan     |  0.8224  |
|          DistilBertForMaskedLM          | 16 | 0.9998 |  0.9138   |      nan       |     nan     |  0.8055  |
|            PLBartForCausalLM            | 16 | 0.9997 |  0.8802   |      nan       |     nan     |  0.8028  |
|            MBartForCausalLM             | 16 |  1.0   |  0.8629   |      nan       |     nan     |  0.8005  |
|               DistillGPT2               | 1  | 1.0003 |  0.7721   |      nan       |     nan     |  0.7997  |
|         Speech2Text2ForCausalLM         | 64 |  1.0   |   0.88    |      nan       |     nan     |  0.7768  |
|       T5ForConditionalGeneration        | 4  |  1.0   |  0.9597   |      nan       |     nan     |  0.7754  |
|             XGLMForCausalLM             | 1  | 0.9999 |  0.9999   |      nan       |     nan     |  0.7728  |
|      BartForConditionalGeneration       | 1  |  1.0   |  0.8465   |      nan       |     nan     |  0.7708  |
| BlenderbotSmallForConditionalGeneration | 32 |  1.0   |  0.9036   |      nan       |     nan     |  0.7612  |
|     PLBartForConditionalGeneration      | 8  | 0.9997 |  0.8222   |      nan       |     nan     |  0.7547  |
|                CamemBert                | 1  | 0.998  |  0.7977   |      nan       |     nan     |  0.7369  |
|            YituTechConvBert             | 1  | 0.9858 |  0.7923   |      nan       |     nan     |  0.7299  |
|            TrOCRForCausalLM             | 8  |  1.0   |  0.8048   |      nan       |     nan     |  0.7284  |
|       BlenderbotSmallForCausalLM        | 64 |  1.0   |  0.8401   |      nan       |     nan     |  0.7277  |
|      MBartForConditionalGeneration      | 8  |  1.0   |  0.8137   |      nan       |     nan     |  0.727   |
|             OPTForCausalLM              | 4  | 0.9979 |   0.75    |      nan       |     nan     |  0.714   |
|           RobertaForCausalLM            | 4  | 0.9058 |  0.7778   |      nan       |     nan     |  0.7099  |
|           PegasusForCausalLM            | 8  |  1.0   |  0.9323   |      nan       |     nan     |  0.7012  |
|    MegatronBertForQuestionAnswering     | 8  | 0.923  |  0.8265   |      nan       |     nan     |  0.6997  |
|               GoogleFnet                | 1  | 1.0003 |  0.9447   |      nan       |   1.0813    |  0.6953  |
|     M2M100ForConditionalGeneration      | 2  | 0.9783 |  0.9777   |      nan       |     nan     |  0.6688  |
|         MegatronBertForCausalLM         | 2  | 0.7066 |  0.7066   |      nan       |     nan     |  0.6453  |
|     PegasusForConditionalGeneration     | 4  | 0.9721 |  0.9004   |      nan       |     nan     |  0.642   |
|       MT5ForConditionalGeneration       | 2  | 0.6173 |  0.6173   |      nan       |     nan     |  0.6173  |
|       AlbertForQuestionAnswering        | 2  |  1.0   |  0.9369   |      nan       |     nan     |  0.6126  |
|           ElectraForCausalLM            | 1  |  1.0   |  0.9107   |      nan       |     nan     |  0.6123  |
|            AlbertForMaskedLM            | 2  | 0.9999 |  0.9172   |      nan       |     nan     |  0.6027  |
|          MobileBertForMaskedLM          | 16 | 0.9997 |  0.9179   |      nan       |     nan     |  0.5861  |
|     MobileBertForQuestionAnswering      | 32 |  1.0   |  0.9716   |      nan       |     nan     |  0.4668  |
|           DebertaForMaskedLM            | 4  |  1.0   |  0.9851   |     0.352      |     nan     |  0.4265  |
|       DebertaForQuestionAnswering       | 4  | 0.9845 |  1.0525   |     0.3276     |     nan     |  0.3569  |
|          AllenaiLongformerBase          | 1  | 0.9988 |  0.9515   |     0.3143     |     nan     |   nan    |
|       ElectraForQuestionAnswering       | 64 |  1.0   |  0.9524   |      nan       |     nan     |   nan    |
|           LayoutLMForMaskedLM           | 16 |  1.0   |  0.9409   |      nan       |     nan     |   nan    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|            hrnet_w18            |  2  | 1.0063 |  1.0839   |      0.0       |   1.4454    |  4.4633  |
|        res2net50_14w_8s         |  2  | 1.0006 |  1.0247   |      0.0       |   1.4422    |  4.1571  |
|           res2next50            |  2  | 1.0002 |  1.0372   |      0.0       |   1.3702    |  4.1399  |
|         coat_lite_mini          | 128 | 0.9999 |  0.9989   |      0.0       |   1.0734    |  1.7041  |
|          ghostnet_100           | 128 | 0.9986 |  0.9941   |      0.0       |    1.243    |  1.6133  |
|        tnt_s_patch16_224        | 64  | 0.9995 |  0.9981   |      0.0       |   1.5571    |  1.5001  |
|        twins_pcpvt_base         | 32  | 1.0054 |  0.9735   |      0.0       |   1.2889    |  1.4364  |
|      xcit_large_24_p8_224       |  5  | 1.0011 |  0.9919   |      0.0       |     0.0     |  1.4302  |
|         crossvit_9_240          | 64  | 1.0068 |  0.9963   |      0.0       |    1.062    |  1.4092  |
|           volo_d1_224           | 64  | 0.9996 |  0.9949   |      0.0       |   1.1385    |  1.4042  |
|            nfnet_l0             | 64  | 1.0001 |   0.797   |      0.0       |   1.0495    |  1.381   |
|          gmixer_24_224          | 64  | 0.9991 |  0.8429   |      0.0       |   0.9957    |  1.3504  |
|          jx_nest_base           | 32  | 0.9992 |  0.9943   |      0.0       |   1.2244    |  1.2882  |
|           convit_base           | 32  | 0.9991 |   0.995   |      0.0       |   1.1931    |  1.2615  |
|            lcnet_050            | 128 | 0.9547 |  0.9495   |      0.0       |   1.5025    |  1.2406  |
|          cait_m36_384           |  2  | 0.9979 |  0.9981   |      0.0       |   0.9962    |  1.2022  |
|          convnext_base          | 32  | 0.9992 |  0.9967   |      0.0       |   1.0434    |  1.172   |
|          gmlp_s16_224           | 64  | 0.9991 |   0.996   |      0.0       |   0.9991    |  1.142   |
|      beit_base_patch16_224      | 64  | 0.9998 |  0.9813   |      0.0       |   0.9537    |  1.1226  |
|           regnety_002           | 128 | 0.9787 |  0.9996   |      0.0       |   1.3613    |  1.1081  |
| deit_base_distilled_patch16_224 | 64  | 0.9997 |   0.998   |      0.0       |    1.019    |  1.1058  |
|      vit_base_patch16_224       | 64  | 0.9997 |  0.9982   |      0.0       |   0.9781    |  1.0981  |
|          mixer_b16_224          | 64  | 0.9996 |  0.9968   |      0.0       |   0.9838    |  1.0523  |
|            mixnet_l             | 64  | 0.971  |  0.8727   |      0.0       |   1.0065    |  1.0458  |
|           tf_mixnet_l           | 64  | 0.9718 |  0.8763   |      0.0       |   1.0061    |  1.0239  |
|             dpn107              | 32  | 0.9587 |  0.9505   |      0.0       |   1.0289    |  1.0034  |
|             dla102              | 64  | 0.9995 |  0.9965   |      0.0       |   1.2853    |  0.9963  |
|          resmlp_12_224          | 128 | 0.9997 |  0.9986   |      0.0       |     0.0     |  0.9746  |
|           resnest101e           | 32  | 1.0033 |  1.0192   |      0.0       |   1.1978    |  0.9554  |
|       tf_efficientnet_b0        | 128 | 0.977  |  0.7833   |      0.0       |   0.9847    |  0.8973  |
|            repvgg_a2            | 128 | 0.9645 |  0.9628   |      0.0       |   1.1198    |  0.891   |
|           selecsls42b           | 128 | 0.9994 |  0.9981   |      0.0       |   1.2083    |  0.8872  |
|          spnasnet_100           | 128 | 0.9614 |  0.9577   |      0.0       |   1.1368    |  0.886   |
|         visformer_small         | 128 |  1.0   |  1.0012   |      0.0       |   1.0216    |  0.8772  |
|            fbnetv3_b            | 128 | 0.965  |  0.9616   |      0.0       |   1.1289    |  0.8724  |
|            gernet_l             | 128 | 0.9735 |  0.9722   |      0.0       |   1.0981    |  0.8702  |
|           mnasnet_100           | 128 | 0.9667 |  0.9638   |      0.0       |   1.1557    |  0.8485  |
|      mobilenetv3_large_100      | 128 | 0.965  |  0.9626   |      0.0       |   1.1636    |  0.8457  |
|          cspdarknet53           | 64  | 0.9583 |  0.9521   |      0.0       |   1.1839    |  0.8444  |
|            tinynet_a            | 128 | 0.9667 |   0.776   |      0.0       |   0.9711    |  0.8364  |
|           mobilevit_s           | 32  | 0.9725 |  0.7645   |      0.0       |   0.9873    |  0.8216  |
|       eca_botnext26ts_256       | 64  | 0.973  |  0.7708   |      0.0       |   1.0167    |  0.7978  |
|        sebotnet33ts_256         | 64  | 0.9759 |  0.8072   |      0.0       |   1.0536    |  0.7733  |
|        eca_halonext26ts         | 64  | 0.9743 |  0.7761   |      0.0       |   1.0143    |  0.7709  |
|           fbnetc_100            | 128 | 0.9668 |  0.9628   |      0.0       |   1.1885    |  0.7567  |
|        res2net101_26w_4s        | 64  | 0.9988 |  0.9967   |      0.0       |   1.1758    |  0.7474  |
|           rexnet_100            | 128 | 0.9724 |  0.8167   |      0.0       |   0.9834    |  0.676   |
|         mobilenetv2_100         | 128 | 0.9668 |  0.9633   |      0.0       |   1.0116    |  0.669   |
|        ese_vovnet19b_dw         | 128 | 0.9789 |  0.9774   |      0.0       |   1.1447    |  0.6203  |
|          botnet26t_256          | 128 | 0.9859 |  0.9852   |      0.0       |   1.2245    |   0.0    |
|           dm_nfnet_f0           | 128 | 0.9993 |  0.9997   |      0.0       |   1.2107    |   0.0    |
|        adv_inception_v3         | 128 | 0.9998 |  0.9971   |      0.0       |   1.1256    |   0.0    |
|       gluon_inception_v3        | 128 |  1.0   |  0.9985   |      0.0       |   1.1248    |   0.0    |
|          inception_v3           | 128 | 0.9998 |  0.9968   |      0.0       |   1.1246    |   0.0    |
|     swsl_resnext101_32x16d      | 32  | 0.9995 |  0.9987   |      0.0       |    1.108    |   0.0    |
|          pnasnet5large          | 16  | 0.9988 |  0.9979   |      0.0       |    1.083    |   0.0    |
|        convmixer_768_32         | 32  | 1.0003 |  0.9997   |      0.0       |    1.061    |   0.0    |
|            pit_b_224            | 64  | 0.9998 |  0.9973   |      0.0       |   1.0594    |   0.0    |
|        gluon_xception65         | 32  | 0.9995 |  0.9967   |      0.0       |   1.0398    |   0.0    |
|         poolformer_m36          | 64  | 0.9994 |  0.9967   |      0.0       |   1.0063    |   0.0    |
|  swin_base_patch4_window7_224   | 64  | 0.9996 |  0.9715   |      0.0       |    1.003    |   0.0    |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
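To summarize a speedup table like the one above in a single number per backend, one option is a geometric mean over the models that produced a measurement. The sketch below is not part of the dashboard tooling. `parse_table` and `geomean_speedup` are hypothetical helper names, and it assumes that a `0.0` or `nan` entry marks a run with no usable measurement, so those entries are skipped rather than averaged in. The sample rows are copied from the table above.

```python
import math

def parse_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")  # skip the +----+ border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]

def geomean_speedup(records, backend):
    """Geometric mean of positive speedups; returns (geomean, n_counted)."""
    values = [float(r[backend]) for r in records]
    # Assumption: 0.0 and nan mean "no measurement", so they are excluded.
    valid = [v for v in values if v > 0 and not math.isnan(v)]
    if not valid:
        return float("nan"), 0
    return math.exp(sum(math.log(v) for v in valid) / len(valid)), len(valid)

sample = """
+---------------+-----+--------+-----------+----------------+-------------+----------+
|     name      | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------+-----+--------+-----------+----------------+-------------+----------+
|   hrnet_w18   |  2  | 1.0063 |  1.0839   |      0.0       |   1.4454    |  4.4633  |
| botnet26t_256 | 128 | 0.9859 |  0.9852   |      0.0       |   1.2245    |   0.0    |
+---------------+-----+--------+-----------+----------------+-------------+----------+
"""

records = parse_table(sample)
for backend in ("aot_nvfuser", "inductor", "aot_cudagraphs"):
    gm, n = geomean_speedup(records, backend)
    print(f"{backend}: geomean={gm:.4f} over {n} model(s)")
```

Reporting the count next to the geomean matters here: a backend that only completes a few models can otherwise look better than one that runs the whole suite.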

Accuracy

+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|              name               | bs | eager |   aot_eager   | aot_cudagraphs |  aot_nvfuser  |   inductor    |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|          convnext_base          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          gmixer_24_224          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          gmlp_s16_224           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          mixer_b16_224          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|           mnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|            repvgg_a2            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          spnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|        adv_inception_v3         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      beit_base_patch16_224      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          botnet26t_256          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        convmixer_768_32         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         crossvit_9_240          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          cspdarknet53           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
| deit_base_distilled_patch16_224 | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dla102              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           dm_nfnet_f0           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dpn107              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       eca_botnext26ts_256       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        eca_halonext26ts         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        ese_vovnet19b_dw         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            gernet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          ghostnet_100           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       gluon_inception_v3        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            hrnet_w18            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          inception_v3           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            lcnet_050            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            mixnet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         mobilenetv2_100         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      mobilenetv3_large_100      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           mobilevit_s           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            nfnet_l0             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          pnasnet5large          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           regnety_002           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net101_26w_4s        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net50_14w_8s         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           res2next50            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           rexnet_100            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        sebotnet33ts_256         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           selecsls42b           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|  swin_base_patch4_window7_224   | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|     swsl_resnext101_32x16d      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       tf_efficientnet_b0        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           tf_mixnet_l           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            tinynet_a            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        tnt_s_patch16_224        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         visformer_small         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      vit_base_patch16_224       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           volo_d1_224           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          resmlp_12_224          | 2  | pass  |     pass      |      pass      |  fail_to_run  |     pass      |
|           convit_base           | 2  | pass  |     pass      |  fail_to_run   |  fail_to_run  |     pass      |
|      xcit_large_24_p8_224       | 2  | pass  | fail_accuracy |  fail_to_run   |  fail_to_run  |     pass      |
|        gluon_xception65         | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|         poolformer_m36          | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|         coat_lite_mini          | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|          jx_nest_base           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|            pit_b_224            | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|        twins_pcpvt_base         | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|           fbnetc_100            | 2  | pass  |     pass      |      pass      |     pass      | fail_accuracy |
|            fbnetv3_b            | 2  | pass  |     pass      |  fail_to_run   |     pass      | fail_accuracy |
|           resnest101e           | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy | fail_accuracy |
|          cait_m36_384           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy | fail_accuracy |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
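The accuracy table above is easier to compare across backends once the statuses are counted per column. The following is a minimal sketch, not the dashboard's own code. `tally_status` is a hypothetical helper, and it assumes the first two columns are always `name` and `bs`, with every remaining column holding a status string such as `pass`, `fail_to_run`, or `fail_accuracy`. The sample rows are copied from the table above.

```python
from collections import Counter

def tally_status(table_text):
    """Count status strings per backend column in an ASCII accuracy table."""
    counts = {}
    header = None
    for line in table_text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):  # skip the +----+ border lines
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
            continue
        # Assumption: columns 0 and 1 are 'name' and 'bs'; the rest are backends.
        for col, cell in zip(header[2:], cells[2:]):
            counts.setdefault(col, Counter())[cell] += 1
    return counts

sample = """
+------------------+----+-------+---------------+----------------+---------------+---------------+
|       name       | bs | eager |   aot_eager   | aot_cudagraphs |  aot_nvfuser  |   inductor    |
+------------------+----+-------+---------------+----------------+---------------+---------------+
|  convnext_base   | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
| adv_inception_v3 | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|   cait_m36_384   | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy | fail_accuracy |
+------------------+----+-------+---------------+----------------+---------------+---------------+
"""

counts = tally_status(sample)
for backend, c in counts.items():
    print(backend, dict(c))
```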

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+
|              name               | bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor  |
+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+
|            hrnet_w18            |  2  | 99.0925 | 129.1429  |      nan       |  302.8547   | 1305.5451 |
|             dpn107              | 32  | 13.3369 |  24.7075  |      nan       |   86.8976   | 1280.7288 |
|           rexnet_100            | 128 | 6.4069  |  11.8586  |      nan       |  107.4496   | 988.7111  |
|        res2net50_14w_8s         |  2  | 19.6264 |  33.3971  |      nan       |   88.2353   | 936.7276  |
|           mobilevit_s           | 32  | 5.7236  |  11.1465  |      nan       |   45.981    | 799.1906  |
|            mixnet_l             | 64  | 13.2916 |  20.3439  |      nan       |   69.4935   | 778.2989  |
|       eca_botnext26ts_256       | 64  | 2.5909  |  6.1519   |      nan       |   49.7554   | 723.0911  |
|          ghostnet_100           | 128 | 9.0362  |  15.9147  |      nan       |   66.3032   | 685.6784  |
|            tinynet_a            | 128 |  7.425  |  13.1769  |      nan       |   67.229    | 660.1354  |
|            fbnetv3_b            | 128 | 12.7368 |  20.2478  |      nan       |   86.4999   | 650.6102  |
|           fbnetc_100            | 128 | 5.4761  |  10.6475  |      nan       |   49.3807   | 636.7924  |
|        twins_pcpvt_base         | 32  | 25.3344 |  36.9167  |      nan       |   68.5287   | 623.4564  |
|           resnest101e           | 32  | 26.2018 |  40.9862  |      nan       |  100.5019   | 610.1099  |
|         coat_lite_mini          | 128 | 3.0107  |  7.0432   |      nan       |   16.4276   | 607.5914  |
|        res2net101_26w_4s        | 64  | 25.6881 |  41.7492  |      nan       |  105.5198   |  529.815  |
|           res2next50            |  2  | 7.2984  |  14.4734  |      nan       |   48.4801   | 510.8523  |
|             dla102              | 64  | 10.5521 |  19.1407  |      nan       |   71.9313   | 507.7483  |
|        sebotnet33ts_256         | 64  | 3.8312  |  8.4416   |      nan       |   53.5113   | 491.4471  |
|           tf_mixnet_l           | 64  |  13.42  |  20.5125  |      nan       |   70.1229   | 489.7827  |
|          cspdarknet53           | 64  | 6.0697  |  11.541   |      nan       |   51.9091   | 486.8412  |
|           mnasnet_100           | 128 | 4.1071  |  7.8077   |      nan       |   40.3807   | 437.7587  |
|       tf_efficientnet_b0        | 128 | 5.6858  |  10.6249  |      nan       |   65.6932   | 426.6445  |
|        eca_halonext26ts         | 64  | 2.5793  |  6.4106   |      nan       |   51.7275   | 422.1906  |
|           regnety_002           | 128 | 4.7761  |  9.4998   |      nan       |   50.0005   | 380.0622  |
|        ese_vovnet19b_dw         | 128 | 1.9265  |  4.0512   |      nan       |   31.7936   | 376.7527  |
|          convnext_base          | 32  | 11.4469 |  15.8597  |      nan       |   30.6712   | 366.3008  |
|         mobilenetv2_100         | 128 | 3.9971  |   7.732   |      nan       |   40.4497   | 363.7838  |
|          spnasnet_100           | 128 | 5.3407  |  10.1369  |      nan       |   47.4009   | 351.9616  |
|      xcit_large_24_p8_224       |  5  | 37.1866 |  52.5417  |      nan       |     nan     | 332.3336  |
|          jx_nest_base           | 32  | 9.9406  |  17.229   |      nan       |   66.5254   | 311.8953  |
|      mobilenetv3_large_100      | 128 | 4.3523  |  8.1262   |      nan       |   67.3167   | 311.1143  |
|         visformer_small         | 128 | 2.3158  |   5.403   |      nan       |   25.7883   | 310.6265  |
|          cait_m36_384           |  2  | 47.2186 |  64.0945  |      nan       |   90.7984   | 298.0052  |
|         crossvit_9_240          | 64  | 7.4081  |  13.6019  |      nan       |   32.2106   | 266.0203  |
|           selecsls42b           | 128 | 2.3164  |  5.4867   |      nan       |   42.0583   | 257.8308  |
|            gernet_l             | 128 | 4.8222  |  9.2556   |      nan       |   39.347    | 251.3237  |
|            lcnet_050            | 128 | 1.9314  |  4.1819   |      nan       |   32.1152   |  232.143  |
|           volo_d1_224           | 64  | 6.5276  |  12.7236  |      nan       |   32.8592   |  182.781  |
|           convit_base           | 32  | 3.8981  |  8.8328   |      nan       |   21.0897   | 177.8059  |
|          gmlp_s16_224           | 64  | 9.0879  |  14.1942  |      nan       |   21.4561   | 145.7858  |
|        tnt_s_patch16_224        | 64  | 12.1016 |  20.3575  |      nan       |   34.7907   | 143.4652  |
|          gmixer_24_224          | 64  | 8.4244  |  14.0469  |      nan       |   23.4351   | 135.0839  |
|            repvgg_a2            | 128 | 4.7534  |   9.049   |      nan       |   47.3708   |  128.289  |
|            nfnet_l0             | 64  | 5.8266  |  11.3992  |      nan       |   31.553    | 103.5121  |
|          resmlp_12_224          | 128 | 2.6977  |  5.0748   |      nan       |     nan     | 101.4346  |
|          mixer_b16_224          | 64  | 2.8858  |  5.2583   |      nan       |   13.4757   |  97.4849  |
| deit_base_distilled_patch16_224 | 64  | 3.0426  |  6.3566   |      nan       |   13.0728   |  78.6364  |
|      beit_base_patch16_224      | 64  | 4.4735  |  8.5964   |      nan       |   18.2085   |  75.1216  |
|      vit_base_patch16_224       | 64  | 2.8552  |  6.5407   |      nan       |   11.5077   |  59.9998  |
|          pnasnet5large          | 16  | 60.8211 |  79.9493  |      nan       |  183.1858   |    nan    |
|          inception_v3           | 128 | 8.3458  |  15.9807  |      nan       |   75.3239   |    nan    |
|        adv_inception_v3         | 128 | 8.5007  |  15.7832  |      nan       |   75.0367   |    nan    |
|       gluon_inception_v3        | 128 | 8.1377  |  16.0286  |      nan       |   74.6521   |    nan    |
|  swin_base_patch4_window7_224   | 64  | 11.8907 |  22.2574  |      nan       |   68.2608   |    nan    |
|        gluon_xception65         | 32  | 14.9179 |  24.5631  |      nan       |   55.7975   |    nan    |
|     swsl_resnext101_32x16d      | 32  | 10.1223 |  18.5382  |      nan       |   49.2201   |    nan    |
|          botnet26t_256          | 128 |  2.287  |   5.453   |      nan       |   42.0242   |    nan    |
|           dm_nfnet_f0           | 128 | 6.4591  |  11.8769  |      nan       |   34.8338   |    nan    |
|         poolformer_m36          | 64  | 13.0015 |  19.6132  |      nan       |   34.8218   |    nan    |
|        convmixer_768_32         | 32  | 6.7607  |  11.8459  |      nan       |   19.5188   |    nan    |
|            pit_b_224            | 64  | 3.6984  |  7.7193   |      nan       |   15.3124   |    nan    |
+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|          gmixer_24_224          | 64  | 0.9992 |  0.9684   |      nan       |   0.9825    |  1.3808  |
|            nfnet_l0             | 64  | 1.0008 |  0.8298   |      nan       |    0.813    |  1.2558  |
|            tinynet_a            | 128 |  1.0   |  0.7831   |      nan       |   0.7845    |  1.1735  |
|        eca_halonext26ts         | 64  |  1.0   |  0.7717   |      nan       |   0.7731    |  1.1316  |
|           rexnet_100            | 128 | 0.9992 |  0.7879   |      nan       |    0.871    |  1.1072  |
|           convit_base           | 32  | 1.0001 |  0.8879   |      nan       |   0.9506    |  1.068   |
|         mobilenetv2_100         | 128 | 0.9998 |  0.7664   |      nan       |   0.7679    |  1.0051  |
|           mobilevit_s           | 32  | 0.9999 |  0.7692   |      nan       |   0.7431    |  1.0012  |
|             dla102              | 64  | 0.9881 |  0.9181   |      nan       |   0.9541    |  1.001   |
|       eca_botnext26ts_256       | 64  |  1.0   |  0.7705   |      nan       |   0.7679    |  0.9703  |
|           tf_mixnet_l           | 64  | 1.0001 |   0.861   |      nan       |   0.8605    |  0.9698  |
|          cait_m36_384           |  2  | 1.0001 |  0.9024   |      nan       |   0.9202    |  0.9451  |
|       tf_efficientnet_b0        | 128 | 0.9998 |  0.7727   |      nan       |   0.8426    |  0.9413  |
|          mixer_b16_224          | 64  | 0.9956 |  0.9615   |      nan       |   0.8644    |  0.9357  |
|      beit_base_patch16_224      | 64  |  1.0   |  0.9575   |      nan       |   0.8606    |  0.9272  |
|          gmlp_s16_224           | 64  |  1.0   |  0.9766   |      nan       |    0.966    |  0.9267  |
|      vit_base_patch16_224       | 64  | 0.9963 |  0.9469   |      nan       |   0.8229    |  0.915   |
|        tnt_s_patch16_224        | 64  | 1.0001 |  0.9752   |      nan       |   0.8518    |  0.9131  |
|           volo_d1_224           | 64  | 0.9999 |  0.9247   |      nan       |   0.7472    |  0.9124  |
| deit_base_distilled_patch16_224 | 64  | 0.9964 |  0.9476   |      nan       |   0.8242    |  0.9095  |
|          spnasnet_100           | 128 | 1.0005 |  0.9207   |      nan       |   0.8496    |  0.9024  |
|           selecsls42b           | 128 | 0.9883 |  0.8982   |      nan       |   0.9039    |   0.9    |
|            mixnet_l             | 64  | 0.9995 |  0.8486   |      nan       |   0.7938    |  0.8993  |
|      mobilenetv3_large_100      | 128 | 1.0002 |  0.8686   |      nan       |   0.8819    |  0.8982  |
|      xcit_large_24_p8_224       |  5  | 0.9999 |  0.9206   |      nan       |     nan     |  0.8952  |
|           resnest101e           | 32  |  1.0   |  0.9458   |      nan       |   0.9449    |  0.8922  |
|          ghostnet_100           | 128 | 0.9998 |  0.8872   |      nan       |    0.947    |  0.8888  |
|         visformer_small         | 128 | 0.9943 |  0.9442   |      nan       |   0.9475    |  0.8883  |
|            fbnetv3_b            | 128 | 0.9995 |  0.7866   |      nan       |   0.7861    |  0.8837  |
|             dpn107              | 32  | 0.9997 |  0.9285   |      nan       |   0.8949    |  0.8762  |
|          convnext_base          | 32  | 1.0001 |  0.9077   |      nan       |   0.7678    |  0.8761  |
|        twins_pcpvt_base         | 32  | 1.0002 |  0.9127   |      nan       |   0.8351    |  0.8723  |
|          cspdarknet53           | 64  |  1.0   |  0.8562   |      nan       |   0.8797    |  0.8624  |
|          jx_nest_base           | 32  | 1.0017 |   0.898   |      nan       |   0.7112    |  0.8574  |
|        ese_vovnet19b_dw         | 128 | 0.9999 |  0.8938   |      nan       |   0.9369    |  0.8467  |
|        sebotnet33ts_256         | 64  |  1.0   |  0.7109   |      nan       |   0.6852    |  0.841   |
|          resmlp_12_224          | 128 | 0.9893 |  0.9525   |      nan       |     nan     |  0.8169  |
|        res2net101_26w_4s        | 64  | 1.0001 |  0.9307   |      nan       |   0.8959    |  0.8168  |
|         crossvit_9_240          | 64  | 1.0001 |  0.8721   |      nan       |    0.729    |  0.8108  |
|           mnasnet_100           | 128 | 1.0003 |  0.9126   |      nan       |   0.8368    |  0.7984  |
|         coat_lite_mini          | 128 | 1.0049 |  0.8826   |      nan       |   0.7873    |   0.79   |
|            lcnet_050            | 128 | 1.0005 |  0.7721   |      nan       |   0.7722    |  0.7579  |
|           regnety_002           | 128 | 0.9981 |   0.829   |      nan       |   0.7759    |  0.7465  |
|            gernet_l             | 128 |  1.0   |  0.7965   |      nan       |   0.8012    |  0.727   |
|           fbnetc_100            | 128 | 0.9998 |  0.8597   |      nan       |   0.7507    |  0.7246  |
|            hrnet_w18            |  2  | 0.9986 |  0.8792   |      nan       |   0.8869    |  0.6089  |
|           res2next50            |  2  |  1.0   |  0.8353   |      nan       |   0.8404    |  0.5946  |
|        res2net50_14w_8s         |  2  |  1.0   |  0.8387   |      nan       |   0.8474    |  0.5879  |
|            repvgg_a2            | 128 | 1.0003 |  0.8145   |      nan       |   0.6633    |  0.536   |
|          pnasnet5large          | 16  | 1.069  |   1.011   |      nan       |   1.2062    |   nan    |
|        convmixer_768_32         | 32  |  1.0   |  0.9868   |      nan       |   0.9807    |   nan    |
|           dm_nfnet_f0           | 128 | 0.9393 |   0.897   |      nan       |   0.9515    |   nan    |
|         poolformer_m36          | 64  | 1.0003 |  0.9533   |      nan       |   0.9368    |   nan    |
|        gluon_xception65         | 32  | 0.9999 |  0.9384   |      nan       |   0.9001    |   nan    |
|        adv_inception_v3         | 128 | 1.0002 |  0.8694   |      nan       |    0.88     |   nan    |
|       gluon_inception_v3        | 128 | 1.0002 |  0.8694   |      nan       |    0.88     |   nan    |
|          inception_v3           | 128 | 1.0002 |  0.8694   |      nan       |    0.88     |   nan    |
|     swsl_resnext101_32x16d      | 32  | 1.0003 |  0.8983   |      nan       |   0.8684    |   nan    |
|  swin_base_patch4_window7_224   | 64  | 0.9999 |  0.9309   |      nan       |    0.83     |   nan    |
|          botnet26t_256          | 128 |  1.0   |  0.8494   |      nan       |   0.7497    |   nan    |
|            pit_b_224            | 64  | 0.9992 |  0.7962   |      nan       |   0.6417    |   nan    |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+

Performance graphs

Graphs (images not reproduced here): bench_logs/timm_models_float32.png, bench_logs/huggingface_float32.png, bench_logs/torchbench_float32.png

@anijain2305
Contributor Author

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and the gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
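
As a rough illustration of the aggregation described above, the sketch below computes a per-model speedup against native PyTorch and then a geometric mean over the models that passed accuracy. The function names, the dictionary layout, and the choice to return 0.0 when no model qualifies are assumptions for illustration, not the benchmark harness's actual code.

```python
import math


def speedup(eager_latency_s, compiled_latency_s):
    """Speedup of a backend over native PyTorch (eager) for one model."""
    return eager_latency_s / compiled_latency_s


def geomean_speedup(per_model_speedups, passed):
    """Geometric mean speedup over models that passed the accuracy check.

    per_model_speedups: dict mapping model name -> speedup (hypothetical layout)
    passed: set of model names that passed accuracy
    Returns 0.0 when no model qualifies (an assumption, chosen to resemble
    the 0.0x entries reported for backends with no passing models).
    """
    kept = [
        s for name, s in per_model_speedups.items()
        if name in passed and s > 0
    ]
    if not kept:
        return 0.0
    return math.exp(sum(math.log(s) for s in kept) / len(kept))


# Hypothetical numbers, not taken from the tables in this issue.
speedups = {"model_a": 2.0, "model_b": 8.0, "model_c": 0.0}
print(geomean_speedup(speedups, passed={"model_a", "model_b"}))  # about 4.0
```

A geometric mean is the usual choice for ratios because a 2x speedup and a 0.5x slowdown average to 1x, which an arithmetic mean would not give.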

Passrate

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      | 98%, 52/53 | 98%, 42/43  | 100%, 61/61 |
|   aot_eager    | 98%, 52/53 | 98%, 42/43  | 90%, 55/61  |
| aot_cudagraphs | 28%, 15/53 |  2%, 1/43   |  8%, 5/61   |
|  aot_nvfuser   | 60%, 32/53 |  0%, 0/43   | 75%, 46/61  |
|    inductor    | 83%, 44/53 | 86%, 37/43  | 90%, 55/61  |
+----------------+------------+-------------+-------------+
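
The entries in the Passrate table read as a rounded percentage followed by passed/total. A minimal sketch of that formatting is shown below. It assumes only an exact `pass` status counts as passing; how the dashboard treats `pass_due_to_skip` is not stated here, so that choice is an assumption, as is the function name.

```python
def passrate(statuses):
    """Format a pass rate as '<percent>%, <passed>/<total>'.

    statuses: list of per-model status strings such as 'pass',
    'fail_to_run', or 'fail_accuracy'. Only an exact 'pass' is counted
    (an assumption made for this sketch).
    """
    total = len(statuses)
    if total == 0:
        return "0%, 0/0"
    passed = sum(1 for s in statuses if s == "pass")
    return f"{100 * passed / total:.0f}%, {passed}/{total}"


# Hypothetical status list with 52 passes out of 53 models.
print(passrate(["pass"] * 52 + ["fail_to_run"]))
```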

Geometric mean speedup

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   1.00x    |    1.01x    |    1.00x    |
|   aot_eager    |   1.00x    |    1.00x    |    1.00x    |
| aot_cudagraphs |   1.09x    |    1.00x    |    1.00x    |
|  aot_nvfuser   |   1.16x    |    0.0x     |    1.20x    |
|    inductor    |   1.70x    |    2.17x    |    1.30x    |
+----------------+------------+-------------+-------------+

Mean compilation time (seconds)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |    6.19    |    14.88    |    11.64    |
|   aot_eager    |   12.45    |    25.75    |    19.94    |
| aot_cudagraphs |   13.09    |    92.75    |    51.56    |
|  aot_nvfuser   |   29.54    |     0.0     |    80.08    |
|    inductor    |   271.08   |   116.86    |   450.74    |
+----------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   0.96x    |    0.98x    |    1.00x    |
|   aot_eager    |   0.85x    |    0.86x    |    0.88x    |
| aot_cudagraphs |   0.43x    |    0.38x    |    0.20x    |
|  aot_nvfuser   |   0.83x    |    0.0x     |    0.85x    |
|    inductor    |   0.78x    |    0.82x    |    0.89x    |
+----------------+------------+-------------+-------------+
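
Since higher is better here, one plausible reading is that the ratio divides the native PyTorch peak memory by the backend's peak memory, so values below 1.0 mean the backend uses more memory than eager. The sketch below encodes that reading; the definition and the function name are assumptions, not confirmed by this issue.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Peak memory compression ratio under the assumed definition:
    native PyTorch peak memory divided by the backend's peak memory.
    Values above 1.0 mean the backend used less memory than eager.
    """
    return eager_peak_bytes / compiled_peak_bytes


# Hypothetical measurement: eager peaks at 10 GB, the backend at 12.5 GB.
ratio = compression_ratio(10.0e9, 12.5e9)
print(f"{ratio:.2f}x")
```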

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|       functorch_dp_cifar10        |  64  | 1.0027 |  0.9251   |      0.0       |   1.1901    |  4.8999  |
|            densenet121            |  4   | 1.0013 |  0.9144   |      0.0       |   1.3911    |  4.7967  |
|         timm_efficientdet         |  1   | 0.9864 |   0.789   |      0.0       |     0.0     |  4.1288  |
|           BERT_pytorch            |  16  | 1.0115 |  0.8389   |      0.0       |     0.0     |  3.1411  |
|      timm_vision_transformer      |  8   | 1.0012 |  0.8564   |      0.0       |   1.3359    |  3.0906  |
|                drq                |  1   | 1.0045 |  0.8048   |      0.0       |   1.0807    |  2.8848  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9951 |  0.9111   |     1.3023     |   1.2173    |  2.8184  |
|             resnet18              |  16  | 1.0009 |  0.9878   |      0.0       |   1.3374    |  2.7412  |
|               dcgan               |  32  | 0.9775 |  0.9129   |     1.101      |   0.7342    |  2.5263  |
|           squeezenet1_1           |  32  | 0.9953 |  0.9584   |     1.4044     |   1.1906    |  2.5217  |
|             hf_Albert             |  8   | 1.0009 |  0.9534   |      0.0       |     0.0     |  2.397   |
|              hf_Bert              |  4   | 1.0348 |  0.8618   |      0.0       |     0.0     |  2.2545  |
|               hf_T5               |  8   | 0.9992 |  0.9393   |      0.0       |     0.0     |  2.1467  |
|          resnext50_32x4d          |  8   | 1.0002 |   0.951   |      0.0       |   1.3302    |  2.1414  |
|            hf_T5_large            |  2   | 1.0183 |  0.8592   |      0.0       |     0.0     |  2.0785  |
|           lennard_jones           | 1000 | 0.9794 |  0.7615   |     1.2817     |   1.0468    |  2.0147  |
|        mobilenet_v3_large         |  32  | 1.0023 |  1.0098   |      0.0       |   1.4107    |  2.0136  |
|          pytorch_struct           | 200  | 0.9866 |   0.746   |     1.1521     |    1.011    |  2.0082  |
|              hf_GPT2              |  4   | 1.014  |  0.9867   |      0.0       |     0.0     |  1.8579  |
|          LearningToPaint          |  96  | 1.0025 |  1.0068   |      0.0       |    1.355    |  1.8566  |
|            mnasnet1_0             |  32  | 0.9949 |  1.0116   |     0.8977     |   1.4086    |  1.8302  |
|              hf_Bart              |  4   | 1.0161 |  0.8395   |      0.0       |     0.0     |  1.7504  |
|           fastNLP_Bert            |  6   | 0.9978 |  0.8872   |      0.0       |     0.0     |  1.6528  |
|        speech_transformer         |  32  | 1.0054 |  0.8358   |      0.0       |     0.0     |  1.6385  |
| attention_is_all_you_need_pytorch | 256  | 1.0061 |  0.8945   |      0.0       |     0.0     |  1.5148  |
|         timm_efficientnet         |  32  | 0.9619 |  0.8176   |      0.0       |   1.1837    |  1.4918  |
|           hf_DistilBert           |  8   | 1.0156 |   0.969   |      0.0       |     0.0     |  1.478   |
|         soft_actor_critic         | 256  | 1.0223 |  0.7463   |     1.261      |   1.0634    |  1.4398  |
|           pytorch_unet            |  1   | 0.9996 |   0.993   |      0.0       |   1.1553    |  1.3534  |
|          pytorch_stargan          |  16  | 0.9983 |  1.0034   |     0.8258     |   1.0964    |  1.343   |
|            timm_nfnet             | 128  | 0.9994 |  0.9988   |      0.0       |   1.1733    |  1.3237  |
|        shufflenet_v2_x1_0         | 128  | 0.9995 |  1.0166   |      0.0       |   1.3486    |  1.3069  |
|            Super_SloMo            |  6   | 0.9998 |  0.9956   |      0.0       |     0.0     |  1.2884  |
|               vgg16               |  64  | 0.9999 |  0.9974   |     0.7982     |   0.9961    |  1.2713  |
|        Background_Matting         |  4   | 0.9996 |  1.0182   |      0.0       |   1.1153    |  1.2157  |
|              alexnet              | 128  | 0.9993 |  0.9964   |     0.788      |   1.0031    |  1.2097  |
|   timm_vision_transformer_large   |  8   | 0.9991 |  0.9893   |      0.0       |   0.9929    |  1.1578  |
|            hf_Reformer            |  4   | 0.9958 |  0.9992   |     0.9196     |     0.0     |  1.1578  |
|           timm_resnest            |  32  | 1.0025 |  1.0206   |      0.0       |   1.3168    |  1.1577  |
|            hf_BigBird             |  2   | 0.9911 |  0.9187   |      0.0       |     0.0     |  1.1435  |
|            timm_vovnet            |  32  | 0.9224 |  0.8875   |      0.0       |   1.1275    |  1.1074  |
|            tts_angular            |  64  | 1.0135 |  0.9582   |     1.0002     |   0.9789    |  1.0026  |
|              demucs               |  4   | 1.0019 |  0.9992   |     0.9995     |   0.9981    |  0.9998  |
|      nvidia_deeprecommender       | 256  | 0.9989 |  0.9958   |     0.6966     |   0.9783    |  0.9901  |
|             resnet50              |  32  | 1.0016 |  1.0097   |      0.0       |   1.3632    |  0.9717  |
|               moco                |  32  | 0.9956 |    0.0    |      0.0       |     0.0     |  0.9496  |
|           mobilenet_v2            |  96  | 0.9989 |  0.9866   |      0.0       |   0.9244    |  0.8705  |
|            timm_regnet            |  32  | 0.9775 |  0.9387   |      0.0       |   1.1858    |  0.8539  |
|              yolov3               |  16  | 0.9991 |   0.988   |      0.0       |   0.9136    |   0.0    |
|           hf_Longformer           |  2   | 0.9636 |   0.877   |     0.8882     |     0.0     |   0.0    |
|               dlrm                | 2048 |  0.0   |   1.173   |      0.0       |     0.0     |   0.0    |
|           hf_GPT2_large           |  4   | 0.9995 |  0.9901   |      0.0       |     0.0     |   0.0    |
|             tacotron2             |  64  |  0.98  |   0.762   |      0.0       |     0.0     |   0.0    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|               name                | bs  |      eager       |    aot_eager     |  aot_cudagraphs  |   aot_nvfuser    |     inductor     |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|           hf_GPT2_large           |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|            hf_T5_large            |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|              alexnet              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               dcgan               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|              demucs               |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |
|           lennard_jones           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|            mnasnet1_0             |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|      nvidia_deeprecommender       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |       pass       |       pass       |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |       pass       |       pass       |
|           squeezenet1_1           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               vgg16               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|        Background_Matting         |  4  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|          LearningToPaint          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            densenet121            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|                drq                |  1  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|       functorch_dp_cifar10        |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           mobilenet_v2            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           pytorch_unet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet18              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet50              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|          resnext50_32x4d          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|        shufflenet_v2_x1_0         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|         timm_efficientnet         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_nfnet             |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_regnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           timm_resnest            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|      timm_vision_transformer      |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_vovnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            hf_Reformer            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |
|           BERT_pytorch            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            Super_SloMo            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
| attention_is_all_you_need_pytorch |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               dlrm                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           fastNLP_Bert            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|             hf_Albert             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bart              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bert              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            hf_BigBird             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           hf_DistilBert           |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_GPT2              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               hf_T5               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            hf_T5_base             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|        speech_transformer         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           hf_Longformer           |  2  |       pass       |       pass       |       pass       |   fail_to_run    |   fail_to_run    |
|             tacotron2             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|         timm_efficientdet         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|          vision_maskrcnn          |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|               moco                |  2  |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|        mobilenet_v3_large         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |  fail_accuracy   |
|            tts_angular            |  2  |       pass       |       pass       |       pass       |       pass       |      0.0000      |
|              yolov3               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |      0.0000      |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
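
The status labels in the table above can be thought of as the outcome of a check like the one sketched below: run the reference, try to run the compiled version, and compare results within a tolerance. This is a simplified stand-in that uses flat lists of floats in place of tensors and gradients. The function name and the tolerance values are assumptions, and the real harness compares PyTorch outputs directly.

```python
import math


def classify(run_reference, run_compiled, rel_tol=1e-3, abs_tol=1e-3):
    """Return 'pass', 'fail_accuracy', or 'fail_to_run' for one model.

    run_reference / run_compiled: zero-argument callables returning a flat
    list of floats (standing in for forward outputs and gradients).
    Tolerances are illustrative assumptions.
    """
    expected = run_reference()
    try:
        actual = run_compiled()
    except Exception:
        # The backend could not compile or execute the model.
        return "fail_to_run"
    if len(actual) != len(expected):
        return "fail_accuracy"
    ok = all(
        math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, e in zip(actual, expected)
    )
    return "pass" if ok else "fail_accuracy"


# Hypothetical usage with stand-in values.
print(classify(lambda: [1.0, 2.0], lambda: [1.0, 2.0000001]))
```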

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+
|               name                |  bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor  |
+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+
|         timm_efficientdet         |  1   | 53.0184 |  79.3008  |      nan       |     nan     | 1803.9091 |
|            hf_T5_large            |  2   | 36.9772 |  75.6557  |      nan       |     nan     | 1747.5354 |
|            densenet121            |  4   | 13.5869 |  29.3271  |      nan       |   139.711   | 1688.4482 |
|        mobilenet_v3_large         |  32  | 3.7848  |  9.2923   |      nan       |   75.4499   | 896.1598  |
|            mnasnet1_0             |  32  | 3.4537  |  8.6408   |    43.7596     |   46.4028   | 854.0232  |
|               moco                |  32  |  11.75  |    nan    |      nan       |     nan     | 725.5689  |
|           mobilenet_v2            |  96  | 3.3075  |  8.4906   |      nan       |   43.1604   | 646.6585  |
|          resnext50_32x4d          |  8   | 3.6179  |  9.2283   |      nan       |   39.0386   | 614.9828  |
|         timm_efficientnet         |  32  | 5.9682  |  12.6852  |      nan       |   73.4068   | 568.4268  |
|        shufflenet_v2_x1_0         | 128  | 3.7705  |  9.7518   |      nan       |   41.3993   | 446.0734  |
|            timm_nfnet             | 128  | 6.8229  |  13.575   |      nan       |   42.2533   | 420.5468  |
|           squeezenet1_1           |  32  |  0.676  |  1.7982   |     8.4293     |   6.8649    | 371.0668  |
|           timm_resnest            |  32  | 1.4485  |  4.3952   |      nan       |   43.3534   | 364.8723  |
|            timm_regnet            |  32  | 8.4026  |  17.2587  |      nan       |   66.4884   | 343.2572  |
| attention_is_all_you_need_pytorch | 256  | 4.3975  |  13.0252  |      nan       |     nan     | 277.6015  |
|            timm_vovnet            |  32  | 3.0983  |  7.3442   |      nan       |   32.2979   | 246.8202  |
|        speech_transformer         |  32  | 7.5464  |  17.1829  |      nan       |     nan     | 246.5735  |
|   timm_vision_transformer_large   |  8   | 23.1783 |  40.2448  |      nan       |   58.887    | 209.5997  |
|             resnet18              |  16  |  1.03   |  3.1392   |      nan       |   23.6353   | 207.0568  |
|       functorch_dp_cifar10        |  64  | 0.8539  |  2.5937   |      nan       |   6.4768    | 198.2407  |
|      timm_vision_transformer      |  8   |   3.2   |  8.1832   |      nan       |   16.3986   | 197.8043  |
|          LearningToPaint          |  96  | 1.0574  |  3.1713   |      nan       |   31.0316   | 196.8972  |
|           BERT_pytorch            |  16  | 5.0826  |  13.8418  |      nan       |     nan     |  177.679  |
|               hf_T5               |  8   | 3.9598  |  12.752   |      nan       |     nan     | 163.3629  |
|        Background_Matting         |  4   | 4.0825  |  9.3277   |      nan       |   45.7685   | 157.1204  |
|             resnet50              |  32  | 3.4998  |  9.0749   |      nan       |   43.7996   | 150.7682  |
|              hf_Bart              |  4   | 7.5111  |  17.2897  |      nan       |     nan     | 149.8316  |
|           fastNLP_Bert            |  6   | 5.3619  |  12.7524  |      nan       |     nan     | 148.7053  |
|              hf_GPT2              |  4   | 3.5663  |  10.0245  |      nan       |     nan     | 134.6332  |
|          pytorch_stargan          |  16  |  0.856  |  3.2566   |    11.5768     |   7.5483    | 130.0983  |
|          pytorch_struct           | 200  | 0.4439  |  1.2864   |     1.8954     |   5.4421    | 123.7179  |
|            Super_SloMo            |  6   | 2.3215  |  7.1108   |      nan       |     nan     | 114.2637  |
|             hf_Albert             |  8   | 1.5003  |  8.7612   |      nan       |     nan     |  90.632   |
|            hf_Reformer            |  4   |  3.125  |  5.8245   |    14.0523     |     nan     |  81.3988  |
|              hf_Bert              |  4   | 5.2086  |  12.6031  |      nan       |     nan     |  78.8477  |
|            hf_BigBird             |  2   | 12.0533 |  20.4705  |      nan       |     nan     |  70.7791  |
|           pytorch_unet            |  1   | 1.1442  |  3.4413   |      nan       |   26.7315   |  67.999   |
|           hf_DistilBert           |  8   | 1.8271  |  5.3273   |      nan       |     nan     |  55.4364  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  0.841  |   3.214   |    12.2729     |   5.1571    |  38.082   |
|               vgg16               |  64  | 0.3688  |  1.1116   |     4.3001     |   3.7105    |  37.4739  |
|              alexnet              | 128  | 0.2897  |  0.6979   |     2.0104     |   3.2319    |  27.3354  |
|                drq                |  1   | 0.2848  |   0.757   |      nan       |   4.4794    |  24.9175  |
|               dcgan               |  32  | 0.2625  |  0.6388   |     1.9613     |    4.319    |  19.5418  |
|      nvidia_deeprecommender       | 256  | 0.2826  |  0.6825   |     1.0449     |   3.0789    |  15.0583  |
|         soft_actor_critic         | 256  | 0.2699  |  0.4887   |     0.799      |   2.1142    |  14.7981  |
|           lennard_jones           | 1000 | 0.2467  |  0.5134   |     0.7056     |   1.5718    |  7.8075   |
|            tts_angular            |  64  | 0.3394  |   0.392   |     0.5831     |   1.1356    |  4.0722   |
|              demucs               |  4   | 0.9187  |  0.9051   |     0.889      |   0.9058    |  0.8242   |
|              yolov3               |  16  | 7.5657  |  15.5034  |      nan       |   46.3505   |    nan    |
|           hf_Longformer           |  2   | 11.7639 |  21.6948  |     92.072     |     nan     |    nan    |
|           hf_GPT2_large           |  4   | 21.5811 |  42.1326  |      nan       |     nan     |    nan    |
|             tacotron2             |  64  | 14.4366 |  30.0122  |      nan       |     nan     |    nan    |
|               dlrm                | 2048 |   nan   |  1.1963   |      nan       |     nan     |    nan    |
+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|             hf_Albert             |  8   | 0.9814 |   0.936   |      nan       |     nan     |  1.1576  |
|            Super_SloMo            |  6   | 1.0024 |  0.9697   |      nan       |     nan     |  1.1385  |
|            timm_nfnet             | 128  | 0.9761 |  0.9043   |      nan       |   0.9504    |  1.0243  |
|            tts_angular            |  64  | 1.0015 |  1.0015   |     0.9866     |   1.0015    |  0.9908  |
| attention_is_all_you_need_pytorch | 256  | 0.9976 |  0.9403   |      nan       |     nan     |  0.9875  |
|              demucs               |  4   | 0.987  |   0.987   |     0.987      |    0.987    |  0.987   |
|         timm_efficientdet         |  1   | 1.0316 |  0.8425   |      nan       |     nan     |  0.9858  |
|           BERT_pytorch            |  16  | 0.9998 |  0.8818   |      nan       |     nan     |  0.9728  |
|         timm_efficientnet         |  32  | 0.9982 |  0.7762   |      nan       |   0.7936    |  0.9689  |
|              hf_GPT2              |  4   | 0.971  |  0.8627   |      nan       |     nan     |  0.9645  |
|        Background_Matting         |  4   | 1.0201 |  0.9679   |      nan       |    0.987    |  0.9244  |
|        speech_transformer         |  32  | 1.0015 |  0.9177   |      nan       |     nan     |  0.9066  |
|           mobilenet_v2            |  96  | 1.0001 |  0.7725   |      nan       |   0.9235    |  0.8856  |
|           pytorch_unet            |  1   | 0.9968 |  0.8677   |      nan       |   0.8518    |  0.8681  |
|           fastNLP_Bert            |  6   | 1.0013 |  0.8966   |      nan       |     nan     |  0.8661  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.0   |  0.8624   |     0.2638     |   0.8441    |  0.8602  |
|            hf_T5_large            |  2   | 0.8541 |  0.8541   |      nan       |     nan     |  0.8535  |
|           hf_DistilBert           |  8   | 0.9505 |  0.8806   |      nan       |     nan     |  0.8387  |
|              hf_Bert              |  4   | 0.9844 |  0.8677   |      nan       |     nan     |  0.8383  |
|            timm_regnet            |  32  | 0.9999 |  0.8483   |      nan       |    0.85     |  0.8361  |
|              hf_Bart              |  4   | 0.9099 |  0.8321   |      nan       |     nan     |  0.8151  |
|            hf_BigBird             |  2   | 0.9852 |  0.9787   |      nan       |     nan     |   0.81   |
|            timm_vovnet            |  32  | 0.9903 |  0.7754   |      nan       |   0.7817    |  0.7861  |
|               moco                |  32  | 0.9667 |    nan    |      nan       |     nan     |  0.7819  |
|        shufflenet_v2_x1_0         | 128  | 1.0002 |   0.874   |      nan       |   0.8652    |  0.7813  |
|          pytorch_stargan          |  16  | 0.9929 |  0.9799   |     0.2149     |   0.8882    |  0.7783  |
|             resnet50              |  32  | 1.0004 |  0.8678   |      nan       |   0.8041    |  0.7745  |
|               dcgan               |  32  |  1.0   |  0.7949   |     0.343      |   0.7073    |  0.7527  |
|               vgg16               |  64  | 0.9998 |  0.7378   |     0.2978     |   0.7172    |  0.7491  |
|   timm_vision_transformer_large   |  8   | 0.9987 |  0.8365   |      nan       |   0.8491    |  0.7487  |
|              alexnet              | 128  | 1.0003 |  0.8082   |     0.4354     |    0.805    |  0.7352  |
|               hf_T5               |  8   | 0.9678 |  0.9371   |      nan       |     nan     |  0.7266  |
|           timm_resnest            |  32  | 0.9868 |  0.8809   |      nan       |   0.8726    |  0.722   |
|      timm_vision_transformer      |  8   | 1.0001 |  0.8868   |      nan       |   0.8871    |  0.7151  |
|            mnasnet1_0             |  32  | 0.9994 |  0.8793   |     0.173      |   0.8217    |  0.6596  |
|           squeezenet1_1           |  32  | 0.9604 |  0.7958   |     0.2952     |   0.7589    |  0.6595  |
|        mobilenet_v3_large         |  32  | 0.999  |  0.8661   |      nan       |    0.874    |  0.6573  |
|          resnext50_32x4d          |  8   |  1.0   |  0.8591   |      nan       |    0.823    |  0.6515  |
|                drq                |  1   | 0.9125 |  0.8399   |      nan       |   0.8395    |  0.6406  |
|         soft_actor_critic         | 256  | 0.964  |  0.9151   |     0.4737     |   0.9151    |  0.6279  |
|          LearningToPaint          |  96  | 0.9252 |  0.7196   |      nan       |    0.71     |  0.605   |
|            densenet121            |  4   |  1.0   |  0.8696   |      nan       |   0.8376    |  0.5739  |
|             resnet18              |  16  | 0.9782 |  0.7852   |      nan       |   0.7268    |  0.5644  |
|           lennard_jones           | 1000 |  1.0   |  1.0002   |     0.3735     |   1.0967    |  0.564   |
|      nvidia_deeprecommender       | 256  | 0.5596 |  0.5596   |     0.5262     |   0.5596    |  0.5596  |
|       functorch_dp_cifar10        |  64  | 0.9964 |  0.8131   |      nan       |    0.846    |  0.4465  |
|          pytorch_struct           | 200  |  1.0   |  0.5081   |     0.4858     |   0.5082    |  0.4235  |
|            hf_Reformer            |  4   | 0.3764 |  0.9993   |     0.2539     |     nan     |  0.3629  |
|              yolov3               |  16  | 1.0054 |  0.8488   |      nan       |   0.8244    |   nan    |
|           hf_Longformer           |  2   | 0.9734 |   0.967   |     0.3379     |     nan     |   nan    |
|           hf_GPT2_large           |  4   | 0.9586 |  0.8649   |      nan       |     nan     |   nan    |
|               dlrm                | 2048 |  nan   |  0.7282   |      nan       |     nan     |   nan    |
|             tacotron2             |  64  | 0.9879 |  0.4059   |      nan       |     nan     |   nan    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|       MT5ForConditionalGeneration       | 2  | 1.0217 |  0.8664   |      0.0       |     0.0     |  6.0266  |
|          MobileBertForMaskedLM          | 16 | 1.0165 |  0.8257   |      0.0       |     0.0     |  5.6755  |
|           ElectraForCausalLM            | 1  | 1.0352 |  0.8536   |      0.0       |     0.0     |  5.5645  |
|     MobileBertForQuestionAnswering      | 32 | 1.0175 |  0.8249   |      0.0       |     0.0     |  5.2401  |
|            YituTechConvBert             | 1  | 1.0261 |  0.8468   |      0.0       |     0.0     |  5.0492  |
|           RobertaForCausalLM            | 4  | 1.0398 |  0.8465   |      0.0       |     0.0     |  4.5969  |
|         MegatronBertForCausalLM         | 2  | 1.0374 |  0.8485   |      0.0       |     0.0     |  4.0218  |
|             OPTForCausalLM              | 4  | 1.0159 |  0.8276   |      0.0       |     0.0     |  3.9227  |
|     M2M100ForConditionalGeneration      | 2  | 1.0129 |  0.8218   |      0.0       |     0.0     |  3.6354  |
|                CamemBert                | 1  | 1.0388 |   0.859   |      0.0       |     0.0     |  3.5143  |
|     PegasusForConditionalGeneration     | 4  | 1.0118 |  0.8263   |      0.0       |     0.0     |  3.1923  |
|             XGLMForCausalLM             | 1  | 1.014  |  0.8144   |      0.0       |     0.0     |  3.1413  |
|     PLBartForConditionalGeneration      | 8  | 1.0194 |  0.8247   |      0.0       |     0.0     |  2.7305  |
|    MegatronBertForQuestionAnswering     | 8  | 1.0396 |  0.8582   |      0.0       |     0.0     |  2.7135  |
|               DistillGPT2               | 1  | 1.0314 |  0.8704   |      0.0       |     0.0     |  2.619   |
|      MBartForConditionalGeneration      | 8  | 1.0167 |  0.8336   |      0.0       |     0.0     |  2.3299  |
|      GPT2ForSequenceClassification      | 4  | 0.9989 |  0.9767   |      0.0       |     0.0     |  2.1375  |
|         Speech2Text2ForCausalLM         | 64 | 1.0086 |  0.8555   |      0.0       |     0.0     |  2.108   |
|       ElectraForQuestionAnswering       | 64 | 0.9994 |  0.9793   |      0.0       |     0.0     |  1.9642  |
|            TrOCRForCausalLM             | 8  | 1.0149 |  0.8298   |      0.0       |     0.0     |  1.8799  |
|          DistilBertForMaskedLM          | 16 | 1.0299 |  0.8516   |      0.0       |     0.0     |  1.8406  |
|           PegasusForCausalLM            | 8  | 1.0109 |   0.826   |      0.0       |     0.0     |  1.8182  |
| BlenderbotSmallForConditionalGeneration | 32 | 1.0087 |  0.8891   |      0.0       |     0.0     |  1.7899  |
|      BartForConditionalGeneration       | 1  | 1.0133 |   0.885   |      0.0       |     0.0     |  1.7522  |
|     DistilBertForQuestionAnswering      | 32 | 1.0305 |  0.8437   |      0.0       |     0.0     |  1.7502  |
|    LayoutLMForSequenceClassification    | 16 | 0.9983 |  0.9785   |      0.0       |     0.0     |  1.7243  |
|       T5ForConditionalGeneration        | 4  | 0.9926 |  0.9361   |      0.0       |     0.0     |  1.6977  |
|       AlbertForQuestionAnswering        | 2  | 1.0007 |  0.8082   |      0.0       |     0.0     |  1.6669  |
|            AlbertForMaskedLM            | 2  | 1.0004 |  0.8087   |      0.0       |     0.0     |  1.6562  |
|                 T5Small                 | 1  | 1.0258 |  0.8963   |      0.0       |     0.0     |  1.593   |
|            XLNetLMHeadModel             | 4  | 1.0008 |  0.9632   |      0.0       |     0.0     |  1.5916  |
|           LayoutLMForMaskedLM           | 16 | 0.9981 |  0.9701   |      0.0       |     0.0     |  1.5762  |
|            PLBartForCausalLM            | 16 | 1.0127 |  0.9448   |      0.0       |     0.0     |  1.5065  |
|             BartForCausalLM             | 2  | 1.0008 |  0.9636   |      0.0       |     0.0     |  1.4564  |
|       RobertaForQuestionAnswering       | 64 | 0.9977 |  0.9494   |      0.0       |     0.0     |  1.4507  |
|        BertForQuestionAnswering         | 64 | 0.9971 |  0.9668   |      0.0       |     0.0     |  1.4373  |
|            MBartForCausalLM             | 16 | 1.0105 |  0.9317   |      0.0       |     0.0     |  1.3877  |
|             BertForMaskedLM             | 64 | 0.9972 |  0.9548   |      0.0       |     0.0     |  1.3316  |
|       BlenderbotSmallForCausalLM        | 64 | 1.0012 |  0.9233   |      0.0       |     0.0     |  1.3041  |
|       DebertaForQuestionAnswering       | 4  | 0.9317 |  0.7286   |     0.9211     |     0.0     |  1.2886  |
|                 BigBird                 | 1  | 0.9945 |  0.9116   |      0.0       |     0.0     |  1.1342  |
|           DebertaForMaskedLM            | 4  | 0.9325 |  0.7359   |     0.7806     |     0.0     |  1.1239  |
|          AllenaiLongformerBase          | 1  | 0.9529 |  0.7382   |     0.8569     |     0.0     |   0.0    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
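A table like the one above can be summarized per backend with a geometric mean. The following is a minimal sketch, not the dashboard's own aggregation code: `parse_ascii_table` and `geomean_speedup` are hypothetical helper names, and skipping `0.0` and `nan` entries rests on the assumption that they mark runs that did not complete (the Accuracy table reports `fail_to_run` for the same model/backend pairs). The sample embeds three rows copied from the table above.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")  # skip the '+----+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean_speedup(records, backend):
    """Return (geometric mean, number of models counted) for one backend column.

    Assumption: a value of 0.0 or nan means the model failed to run,
    so it is excluded rather than averaged in.
    """
    values = []
    for rec in records:
        value = float(rec[backend])
        if math.isnan(value) or value <= 0.0:
            continue
        values.append(value)
    if not values:
        return float("nan"), 0
    log_sum = sum(math.log(v) for v in values)
    return math.exp(log_sum / len(values)), len(values)


# Three rows copied from the huggingface amp speedup table.
sample = """
+-----------------------------+----+--------+-----------+----------------+-------------+----------+
|            name             | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------+----+--------+-----------+----------------+-------------+----------+
| DebertaForQuestionAnswering | 4  | 0.9317 |  0.7286   |     0.9211     |     0.0     |  1.2886  |
|           BigBird           | 1  | 0.9945 |  0.9116   |      0.0       |     0.0     |  1.1342  |
|    AllenaiLongformerBase    | 1  | 0.9529 |  0.7382   |     0.8569     |     0.0     |   0.0    |
+-----------------------------+----+--------+-----------+----------------+-------------+----------+
"""

records = parse_ascii_table(sample)
for backend in ("eager", "aot_eager", "aot_cudagraphs", "aot_nvfuser", "inductor"):
    mean, counted = geomean_speedup(records, backend)
    print(f"{backend}: geomean={mean:.4f} over {counted} of {len(records)} models")
```

Reporting the count next to the mean matters here: a backend that only runs a handful of models can show a high geomean while covering far fewer models than the others.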

Accuracy

+-----------------------------------------+----+--------+-----------+----------------+-------------+-------------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser |  inductor   |
+-----------------------------------------+----+--------+-----------+----------------+-------------+-------------+
|            AlbertForMaskedLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       AlbertForQuestionAnswering        | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|             BartForCausalLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|             BertForMaskedLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|        BertForQuestionAnswering         | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|                 BigBird                 | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       BlenderbotSmallForCausalLM        | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
| BlenderbotSmallForConditionalGeneration | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|                CamemBert                | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           DebertaForMaskedLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|          DistilBertForMaskedLM          | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|     DistilBertForQuestionAnswering      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|               DistillGPT2               | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           ElectraForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       ElectraForQuestionAnswering       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|      GPT2ForSequenceClassification      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           LayoutLMForMaskedLM           | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|    LayoutLMForSequenceClassification    | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            MBartForCausalLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       MT5ForConditionalGeneration       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|         MegatronBertForCausalLM         | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|    MegatronBertForQuestionAnswering     | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|          MobileBertForMaskedLM          | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|     MobileBertForQuestionAnswering      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|             OPTForCausalLM              | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            PLBartForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           PegasusForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|     PegasusForConditionalGeneration     | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           RobertaForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       RobertaForQuestionAnswering       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|         Speech2Text2ForCausalLM         | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       T5ForConditionalGeneration        | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|                 T5Small                 | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            TrOCRForCausalLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            XLNetLMHeadModel             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            YituTechConvBert             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       DebertaForQuestionAnswering       | 1  |  pass  |   pass    | fail_accuracy  | fail_to_run |    pass     |
|          AllenaiLongformerBase          | 1  |  pass  |   pass    |      pass      | fail_to_run | fail_to_run |
|      BartForConditionalGeneration       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|      MBartForConditionalGeneration      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|     PLBartForConditionalGeneration      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|     M2M100ForConditionalGeneration      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |   0.0000    |
|             XGLMForCausalLM             | 0  | 0.0000 |  0.0000   |     0.0000     |   0.0000    |   0.0000    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+-------------+

Compilation latency (sec)

+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+
|                  name                   | bs |  eager   | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+
|            XLNetLMHeadModel             | 4  | 18.4058  |  40.7914  |      nan       |     nan     | 324.9091 |
|          MobileBertForMaskedLM          | 16 | 135.1179 | 174.0517  |      nan       |     nan     | 310.7379 |
|     MobileBertForQuestionAnswering      | 32 | 132.1921 | 174.2548  |      nan       |     nan     | 288.521  |
|       T5ForConditionalGeneration        | 4  |  4.1395  |  12.6828  |      nan       |     nan     | 247.4364 |
|     M2M100ForConditionalGeneration      | 2  | 26.4684  |  45.8915  |      nan       |     nan     | 214.2353 |
|       MT5ForConditionalGeneration       | 2  |  6.5618  |  21.1144  |      nan       |     nan     | 202.0306 |
|            YituTechConvBert             | 1  |  9.4912  |  20.8653  |      nan       |     nan     | 187.9041 |
|             XGLMForCausalLM             | 1  | 15.5894  |  30.5654  |      nan       |     nan     | 170.4506 |
|      MBartForConditionalGeneration      | 8  | 26.8296  |  47.6996  |      nan       |     nan     | 170.0447 |
|     PegasusForConditionalGeneration     | 4  | 26.2079  |  45.6003  |      nan       |     nan     | 167.4475 |
|           DebertaForMaskedLM            | 4  |  7.3099  |  14.5402  |    53.1345     |     nan     | 163.0197 |
|    MegatronBertForQuestionAnswering     | 8  | 17.0641  |  31.1438  |      nan       |     nan     | 161.2878 |
|      BartForConditionalGeneration       | 1  | 26.4428  |  45.956   |      nan       |     nan     | 152.167  |
|         MegatronBertForCausalLM         | 2  | 16.4797  |  31.8313  |      nan       |     nan     | 144.6966 |
|                 T5Small                 | 1  |  3.9891  |  12.5222  |      nan       |     nan     | 144.5762 |
|     PLBartForConditionalGeneration      | 8  |  7.4848  |  17.1203  |      nan       |     nan     | 130.1847 |
| BlenderbotSmallForConditionalGeneration | 32 | 12.1662  |  25.0348  |      nan       |     nan     | 124.0423 |
|       DebertaForQuestionAnswering       | 4  |  7.3726  |  14.8011  |    53.6215     |     nan     | 120.8276 |
|           RobertaForCausalLM            | 4  |  5.3202  |  12.7009  |      nan       |     nan     | 107.3699 |
|    LayoutLMForSequenceClassification    | 16 |  5.6057  |  12.9613  |      nan       |     nan     | 92.3443  |
|           PegasusForCausalLM            | 8  |  9.9424  |  16.9838  |      nan       |     nan     | 90.8327  |
|             OPTForCausalLM              | 4  |  4.8978  |  12.0564  |      nan       |     nan     | 86.7957  |
|             BartForCausalLM             | 2  | 10.3112  |  17.1423  |      nan       |     nan     | 83.3249  |
|            MBartForCausalLM             | 16 | 10.0524  |  17.3672  |      nan       |     nan     | 83.1659  |
|       ElectraForQuestionAnswering       | 64 |  5.2311  |  12.7727  |      nan       |     nan     | 82.9515  |
|             BertForMaskedLM             | 64 |  5.2735  |  12.6394  |      nan       |     nan     | 82.4964  |
|           LayoutLMForMaskedLM           | 16 |  5.5665  |  13.2244  |      nan       |     nan     | 80.9633  |
|      GPT2ForSequenceClassification      | 4  |  3.6189  |  10.3118  |      nan       |     nan     | 76.5278  |
|           ElectraForCausalLM            | 1  |  5.3414  |  12.6918  |      nan       |     nan     | 71.6347  |
|            TrOCRForCausalLM             | 8  | 10.3861  |  17.3261  |      nan       |     nan     | 71.0615  |
|                 BigBird                 | 1  | 11.5557  |  20.1771  |      nan       |     nan     | 69.8588  |
|     DistilBertForQuestionAnswering      | 32 |  1.9056  |  5.4488   |      nan       |     nan     | 66.5726  |
|                CamemBert                | 1  |  5.3005  |  12.5095  |      nan       |     nan     | 65.4941  |
|            AlbertForMaskedLM            | 2  |  1.5705  |  8.8811   |      nan       |     nan     | 65.1659  |
|       BlenderbotSmallForCausalLM        | 64 |  4.9754  |  9.5922   |      nan       |     nan     | 63.5161  |
|            PLBartForCausalLM            | 16 |  3.2396  |  6.8243   |      nan       |     nan     | 62.7263  |
|       RobertaForQuestionAnswering       | 64 |  5.1814  |  12.7826  |      nan       |     nan     | 61.2356  |
|        BertForQuestionAnswering         | 64 |  5.1795  |  12.5753  |      nan       |     nan     | 60.1564  |
|         Speech2Text2ForCausalLM         | 64 |  3.3817  |  6.9074   |      nan       |     nan     | 59.4116  |
|               DistillGPT2               | 1  |  1.583   |  4.7509   |      nan       |     nan     | 58.5183  |
|          DistilBertForMaskedLM          | 16 |  2.0107  |  5.6044   |      nan       |     nan     | 50.1556  |
|       AlbertForQuestionAnswering        | 2  |  1.5747  |  8.7694   |      nan       |     nan     | 44.0509  |
|          AllenaiLongformerBase          | 1  | 12.4196  |  22.7413  |    92.7516     |     nan     |   nan    |
+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|      GPT2ForSequenceClassification      | 4  | 0.9675 |  0.9163   |      nan       |     nan     |  1.0699  |
|            XLNetLMHeadModel             | 4  | 0.9912 |  0.8791   |      nan       |     nan     |  1.0109  |
|       ElectraForQuestionAnswering       | 64 | 1.0016 |  0.9539   |      nan       |     nan     |  1.0002  |
|                 T5Small                 | 1  |  1.0   |  0.9124   |      nan       |     nan     |  0.9876  |
|           LayoutLMForMaskedLM           | 16 | 0.9999 |  0.9238   |      nan       |     nan     |  0.9871  |
|             BertForMaskedLM             | 64 | 0.9996 |   0.899   |      nan       |     nan     |  0.9811  |
|    LayoutLMForSequenceClassification    | 16 | 1.004  |  0.9325   |      nan       |     nan     |  0.9712  |
| BlenderbotSmallForConditionalGeneration | 32 | 0.9998 |  0.8996   |      nan       |     nan     |  0.9557  |
|             BartForCausalLM             | 2  |  1.0   |  0.8769   |      nan       |     nan     |  0.9545  |
|       T5ForConditionalGeneration        | 4  | 0.9996 |  0.9594   |      nan       |     nan     |  0.9525  |
|         Speech2Text2ForCausalLM         | 64 | 0.9954 |  0.8265   |      nan       |     nan     |  0.9452  |
|            PLBartForCausalLM            | 16 | 1.0006 |  0.8667   |      nan       |     nan     |  0.9395  |
|       BlenderbotSmallForCausalLM        | 64 | 0.9996 |  0.8172   |      nan       |     nan     |  0.9269  |
|        BertForQuestionAnswering         | 64 | 0.9995 |  0.9315   |      nan       |     nan     |  0.9256  |
|       RobertaForQuestionAnswering       | 64 | 0.9996 |  0.9315   |      nan       |     nan     |  0.9254  |
|          DistilBertForMaskedLM          | 16 | 0.9991 |  0.8698   |      nan       |     nan     |  0.9167  |
|      BartForConditionalGeneration       | 1  |  1.0   |  0.8619   |      nan       |     nan     |  0.881   |
|       AlbertForQuestionAnswering        | 2  |  1.0   |  0.6451   |      nan       |     nan     |  0.8636  |
|            MBartForCausalLM             | 16 |  1.0   |  0.8398   |      nan       |     nan     |  0.8565  |
|            AlbertForMaskedLM            | 2  |  1.0   |  0.6364   |      nan       |     nan     |  0.8515  |
|                 BigBird                 | 1  | 1.0024 |  0.9513   |      nan       |     nan     |  0.8349  |
|     DistilBertForQuestionAnswering      | 32 | 0.9987 |  0.8967   |      nan       |     nan     |  0.834   |
|     PLBartForConditionalGeneration      | 8  | 0.9999 |  0.8304   |      nan       |     nan     |  0.8252  |
|               DistillGPT2               | 1  | 1.0006 |  0.7548   |      nan       |     nan     |  0.812   |
|      MBartForConditionalGeneration      | 8  | 0.9999 |  0.8187   |      nan       |     nan     |  0.7699  |
|            TrOCRForCausalLM             | 8  |  1.0   |  0.7955   |      nan       |     nan     |  0.7566  |
|                CamemBert                | 1  | 0.9989 |  0.7872   |      nan       |     nan     |  0.7482  |
|             OPTForCausalLM              | 4  | 0.9975 |  0.7501   |      nan       |     nan     |  0.7473  |
|            YituTechConvBert             | 1  | 0.9718 |  0.7819   |      nan       |     nan     |  0.7407  |
|           PegasusForCausalLM            | 8  | 0.999  |  0.9444   |      nan       |     nan     |  0.7324  |
|           RobertaForCausalLM            | 4  | 0.9237 |  0.7741   |      nan       |     nan     |  0.7309  |
|             XGLMForCausalLM             | 1  | 0.9999 |  0.9992   |      nan       |     nan     |  0.7214  |
|    MegatronBertForQuestionAnswering     | 8  | 0.9051 |  0.8218   |      nan       |     nan     |  0.7107  |
|          MobileBertForMaskedLM          | 16 | 0.9985 |  0.8983   |      nan       |     nan     |  0.6948  |
|     PegasusForConditionalGeneration     | 4  | 0.9996 |  0.9196   |      nan       |     nan     |  0.6769  |
|           ElectraForCausalLM            | 1  | 0.9993 |  0.8955   |      nan       |     nan     |  0.6701  |
|         MegatronBertForCausalLM         | 2  | 0.7726 |  0.7726   |      nan       |     nan     |  0.6697  |
|     M2M100ForConditionalGeneration      | 2  | 0.9999 |  0.9497   |      nan       |     nan     |  0.6569  |
|     MobileBertForQuestionAnswering      | 32 | 1.0142 |  0.9796   |      nan       |     nan     |  0.6265  |
|       MT5ForConditionalGeneration       | 2  | 0.6019 |  0.6019   |      nan       |     nan     |  0.6019  |
|           DebertaForMaskedLM            | 4  | 0.9982 |  0.9826   |     0.3599     |     nan     |  0.4498  |
|       DebertaForQuestionAnswering       | 4  | 0.979  |  1.0568   |     0.3578     |     nan     |  0.3761  |
|          AllenaiLongformerBase          | 1  | 0.9996 |  0.9477   |     0.3752     |     nan     |   nan    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|            hrnet_w18            |  2  | 1.0028 |  0.9644   |      0.0       |   1.3794    |  4.8666  |
|        res2net50_14w_8s         |  2  | 0.9994 |  0.9247   |      0.0       |   1.3968    |  4.7346  |
|           res2next50            |  2  | 1.0037 |  0.9304   |      0.0       |    1.362    |  4.6397  |
|        twins_pcpvt_base         | 32  | 1.0024 |  0.8988   |      0.0       |    1.36     |  2.5347  |
|      xcit_large_24_p8_224       |  5  | 1.0003 |    0.0    |      0.0       |     0.0     |  2.1071  |
|          cait_m36_384           |  2  | 1.0023 |  0.8557   |      0.0       |   1.3541    |  2.0791  |
|        tnt_s_patch16_224        | 64  | 0.9994 |  0.9944   |      0.0       |   1.8326    |  1.9956  |
|          ghostnet_100           | 128 | 1.0031 |  1.0008   |      0.0       |   1.5591    |  1.893   |
|         crossvit_9_240          | 64  | 1.0051 |  0.9639   |      0.0       |   1.1374    |  1.7206  |
|          gmixer_24_224          | 64  | 0.9987 |  0.8853   |      0.0       |   1.0128    |  1.6752  |
|           volo_d1_224           | 64  | 0.9994 |  0.9941   |      0.0       |   1.1497    |  1.6642  |
|            lcnet_050            | 128 | 0.9678 |  0.9515   |      0.0       |   1.6064    |  1.6229  |
|            nfnet_l0             | 64  | 1.006  |   0.839   |      0.0       |    1.193    |  1.5908  |
|           regnety_002           | 128 | 0.981  |   0.933   |      0.0       |   1.3813    |  1.5766  |
|  swin_base_patch4_window7_224   | 64  | 0.9992 |  0.9578   |      0.0       |   1.0465    |  1.5415  |
|         coat_lite_mini          | 128 |  1.0   |  0.9957   |      0.0       |   1.2651    |  1.4983  |
|          resmlp_12_224          | 128 | 1.0002 |  0.9982   |     0.7823     |     0.0     |  1.4718  |
|          jx_nest_base           | 32  | 0.9992 |  0.9917   |      0.0       |   1.2314    |   1.46   |
|           resnest101e           | 32  | 1.0043 |  0.9905   |      0.0       |   1.4192    |  1.4201  |
|          gmlp_s16_224           | 64  | 0.9989 |   0.983   |      0.0       |   1.0513    |  1.4139  |
|           convit_base           | 32  | 0.9994 |  0.9914   |      0.0       |     0.0     |  1.3895  |
|            pit_b_224            | 64  | 0.9995 |  0.9939   |      0.0       |   1.0686    |  1.3627  |
|           dm_nfnet_f0           | 128 | 0.9992 |  0.9992   |      0.0       |   1.1759    |  1.3014  |
|          mixer_b16_224          | 64  | 0.9992 |  0.9904   |     0.716      |   0.9657    |  1.2967  |
|      beit_base_patch16_224      | 64  | 0.9996 |  0.9776   |      0.0       |   1.0503    |  1.2906  |
| deit_base_distilled_patch16_224 | 64  | 0.9996 |  0.9913   |      0.0       |   1.0703    |  1.2895  |
|        adv_inception_v3         | 128 |  1.0   |  0.9952   |      0.0       |   1.1927    |  1.2253  |
|       gluon_inception_v3        | 128 |  1.0   |  0.9946   |      0.0       |    1.194    |  1.2168  |
|          inception_v3           | 128 |  1.0   |  0.9952   |      0.0       |   1.1935    |  1.2139  |
|         poolformer_m36          | 64  | 0.9991 |  0.9974   |      0.0       |     0.0     |  1.2087  |
|      vit_base_patch16_224       | 64  | 0.9997 |  0.9933   |      0.0       |   0.9995    |  1.1961  |
|           tf_mixnet_l           | 64  | 0.9832 |  0.8984   |      0.0       |   1.1168    |  1.1412  |
|           mobilevit_s           | 32  | 0.9752 |  0.7969   |      0.0       |   1.2175    |  1.1277  |
|            mixnet_l             | 64  | 0.9802 |   0.889   |      0.0       |   1.1177    |  1.0927  |
|         visformer_small         | 128 | 1.0003 |  1.0006   |      0.0       |   1.0867    |  1.0534  |
|          pnasnet5large          | 16  | 1.0052 |  1.0238   |      0.0       |   1.1323    |  1.0315  |
|             dla102              | 64  | 0.9994 |  1.0099   |      0.0       |   1.3742    |  1.0293  |
|            fbnetv3_b            | 128 | 0.9685 |  0.9577   |      0.0       |   1.2758    |  0.9577  |
|           mnasnet_100           | 128 | 0.9535 |  0.9394   |     0.6673     |   1.3679    |  0.9231  |
|            repvgg_a2            | 128 | 0.9416 |  0.9342   |      0.0       |   1.1287    |  0.9156  |
|           selecsls42b           | 128 | 0.9995 |  0.9942   |      0.0       |    1.356    |  0.8981  |
|            tinynet_a            | 128 | 0.9605 |  0.8048   |      0.0       |   1.0887    |  0.8876  |
|        convmixer_768_32         | 32  | 0.9997 |  0.9979   |      0.0       |   1.0523    |  0.8863  |
|             dpn107              | 32  | 0.9485 |  0.9127   |      0.0       |   0.9813    |  0.8856  |
|          cspdarknet53           | 64  | 0.9432 |   0.935   |      0.0       |   0.9008    |  0.8791  |
|          convnext_base          | 32  | 1.0058 |  0.9438   |      0.0       |   1.3613    |  0.8489  |
|        res2net101_26w_4s        | 64  | 1.0025 |   0.996   |      0.0       |   1.3914    |  0.8471  |
|      mobilenetv3_large_100      | 128 | 0.9552 |  0.9437   |      0.0       |   1.3446    |  0.8334  |
|          spnasnet_100           | 128 | 0.9462 |  0.9369   |     0.6574     |   1.3183    |  0.8288  |
|            gernet_l             | 128 | 0.9466 |  0.9359   |      0.0       |   1.1389    |  0.7974  |
|           fbnetc_100            | 128 | 0.9525 |  0.9432   |     0.6733     |   1.3758    |  0.7479  |
|        eca_halonext26ts         | 64  | 0.9639 |  0.8063   |      0.0       |   1.1003    |  0.7363  |
|        sebotnet33ts_256         | 64  | 0.9669 |  0.8367   |      0.0       |    1.116    |  0.7274  |
|       tf_efficientnet_b0        | 128 | 0.9642 |  0.8073   |      0.0       |   1.0953    |  0.7162  |
|       eca_botnext26ts_256       | 64  | 0.9627 |  0.8009   |      0.0       |   1.1043    |  0.703   |
|          botnet26t_256          | 128 | 0.9783 |  0.9756   |      0.0       |   1.3439    |  0.6823  |
|         mobilenetv2_100         | 128 | 0.9498 |  0.9402   |      0.0       |   0.8656    |  0.6635  |
|        ese_vovnet19b_dw         | 128 | 0.9693 |   0.965   |      0.0       |   1.2431    |  0.6551  |
|           rexnet_100            | 128 | 0.9775 |  0.8495   |      0.0       |   1.0358    |  0.6527  |
|     swsl_resnext101_32x16d      | 32  | 0.9995 |  0.9796   |      0.0       |   1.0735    |  0.6428  |
|        gluon_xception65         | 32  | 0.998  |  0.9783   |      0.0       |   1.0628    |  0.5736  |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+

Accuracy

+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|              name               | bs | eager |   aot_eager   | aot_cudagraphs |  aot_nvfuser  |   inductor    |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|           fbnetc_100            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|           mnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|            repvgg_a2            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|        adv_inception_v3         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      beit_base_patch16_224      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          botnet26t_256          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        convmixer_768_32         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          convnext_base          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         crossvit_9_240          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          cspdarknet53           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
| deit_base_distilled_patch16_224 | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dla102              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           dm_nfnet_f0           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dpn107              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       eca_botnext26ts_256       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        eca_halonext26ts         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            gernet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          ghostnet_100           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       gluon_inception_v3        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          inception_v3           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            lcnet_050            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            mixnet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         mobilenetv2_100         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      mobilenetv3_large_100      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           mobilevit_s           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            nfnet_l0             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          pnasnet5large          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           regnety_002           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net101_26w_4s        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net50_14w_8s         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           res2next50            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           rexnet_100            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        sebotnet33ts_256         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           selecsls42b           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|  swin_base_patch4_window7_224   | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|     swsl_resnext101_32x16d      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       tf_efficientnet_b0        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           tf_mixnet_l           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            tinynet_a            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        tnt_s_patch16_224        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         visformer_small         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      vit_base_patch16_224       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           volo_d1_224           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          resmlp_12_224          | 2  | pass  |     pass      |      pass      |  fail_to_run  |     pass      |
|           convit_base           | 2  | pass  |     pass      |  fail_to_run   |  fail_to_run  |     pass      |
|      xcit_large_24_p8_224       | 2  | pass  |  fail_to_run  |  fail_to_run   |  fail_to_run  |     pass      |
|          gmixer_24_224          | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |
|          gmlp_s16_224           | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |
|          mixer_b16_224          | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |
|         poolformer_m36          | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|           resnest101e           | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|         coat_lite_mini          | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|          jx_nest_base           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|            pit_b_224            | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|        twins_pcpvt_base         | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|        ese_vovnet19b_dw         | 2  | pass  |     pass      |  fail_to_run   |     pass      | fail_accuracy |
|        gluon_xception65         | 2  | pass  |     pass      |  fail_to_run   |     pass      | fail_accuracy |
|            hrnet_w18            | 2  | pass  |     pass      |  fail_to_run   |     pass      | fail_accuracy |
|          spnasnet_100           | 2  | pass  |     pass      |      pass      | fail_accuracy | fail_accuracy |
|            fbnetv3_b            | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy | fail_accuracy |
|          cait_m36_384           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy | fail_accuracy |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+
|              name               | bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor  |
+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+
|            hrnet_w18            |  2  | 97.6966 | 141.0268  |      nan       |   477.271   | 1428.6487 |
|          pnasnet5large          | 16  | 59.9965 |  89.713   |      nan       |  251.7262   | 1281.2421 |
|             dpn107              | 32  | 13.8456 |  28.5265  |      nan       |  112.7519   | 1259.4166 |
|           rexnet_100            | 128 | 6.6675  |  14.2855  |      nan       |  122.0738   | 1152.8284 |
|        res2net50_14w_8s         |  2  | 20.0355 |  38.8849  |      nan       |  123.4919   | 956.6994  |
|           mobilevit_s           | 32  | 6.1429  |  13.5761  |      nan       |   62.1746   | 912.5899  |
|            mixnet_l             | 64  | 13.5148 |  22.8111  |      nan       |   89.7195   | 839.4074  |
|       eca_botnext26ts_256       | 64  | 2.6024  |  7.2121   |      nan       |   64.9197   | 837.0591  |
|        twins_pcpvt_base         | 32  | 26.7089 |  45.6423  |      nan       |   99.6458   |  834.726  |
|          ghostnet_100           | 128 | 9.3711  |  18.8885  |      nan       |   98.4747   | 771.2625  |
|            tinynet_a            | 128 | 7.7367  |  15.6026  |      nan       |   84.8686   | 727.8874  |
|            fbnetv3_b            | 128 | 13.3606 |  24.3739  |      nan       |  111.7964   |  698.534  |
|         coat_lite_mini          | 128 | 3.3191  |  9.0908   |      nan       |   34.6481   | 679.6933  |
|           resnest101e           | 32  | 26.9489 |  47.6734  |      nan       |  126.5945   | 658.9299  |
|             dla102              | 64  | 10.6743 |  22.7225  |      nan       |   97.4741   | 630.0407  |
|           fbnetc_100            | 128 |  5.671  |  12.3828  |    89.3023     |   64.1864   | 627.2809  |
|        sebotnet33ts_256         | 64  | 3.9369  |   9.979   |      nan       |   70.2461   | 608.7544  |
|          botnet26t_256          | 128 | 2.5158  |  6.7876   |      nan       |   51.1621   | 591.8039  |
|           tf_mixnet_l           | 64  | 14.0096 |  23.5027  |      nan       |   89.8185   | 550.0596  |
|          cspdarknet53           | 64  | 6.2996  |  13.5792  |      nan       |   45.3241   | 535.2317  |
|        eca_halonext26ts         | 64  | 2.7292  |  7.5233   |      nan       |   67.9176   |  531.045  |
|           res2next50            |  2  | 7.6466  |  17.1314  |      nan       |   65.4278   | 518.6104  |
|       tf_efficientnet_b0        | 128 | 6.0641  |  13.1288  |      nan       |   83.8699   | 508.1979  |
|        adv_inception_v3         | 128 |  8.67   |  18.8374  |      nan       |  106.7887   | 469.8298  |
|           mnasnet_100           | 128 | 4.1978  |  9.7492   |    60.5217     |   53.9337   | 462.3385  |
|        res2net101_26w_4s        | 64  | 25.9614 |  47.214   |      nan       |  144.0171   | 451.5853  |
|  swin_base_patch4_window7_224   | 64  |  12.4   |  26.9354  |      nan       |   82.8153   | 424.9892  |
|           regnety_002           | 128 | 4.9105  |  10.8595  |      nan       |   60.7642   | 413.4684  |
|            nfnet_l0             | 64  |  6.122  |  13.0335  |      nan       |   40.1358   | 407.7362  |
|         mobilenetv2_100         | 128 | 4.2405  |  9.2914   |      nan       |   43.8073   | 400.1731  |
|          convnext_base          | 32  | 12.0387 |  19.3516  |      nan       |   47.4608   | 400.0663  |
|        ese_vovnet19b_dw         | 128 | 2.0251  |  5.1077   |      nan       |   40.0498   | 397.4036  |
|         visformer_small         | 128 | 2.3605  |  6.7356   |      nan       |   32.1076   | 379.8813  |
|      xcit_large_24_p8_224       |  5  | 37.1179 |    nan    |      nan       |     nan     |  363.892  |
|      mobilenetv3_large_100      | 128 | 4.5824  |  10.1168  |      nan       |   86.4595   | 363.6031  |
|        gluon_xception65         | 32  | 15.4767 |  29.189   |      nan       |   78.4504   | 353.0086  |
|          jx_nest_base           | 32  | 9.7785  |  19.7435  |      nan       |   59.9364   | 327.2779  |
|          cait_m36_384           |  2  | 48.1901 |  71.6923  |      nan       |  107.6152   |  308.271  |
|         poolformer_m36          | 64  | 13.1268 |  21.8151  |      nan       |     nan     | 304.8976  |
|         crossvit_9_240          | 64  | 7.7826  |  17.0441  |      nan       |   42.6455   | 293.8263  |
|            gernet_l             | 128 | 4.9774  |  11.7724  |      nan       |   48.1579   | 285.8593  |
|           selecsls42b           | 128 | 2.4734  |  6.9182   |      nan       |   52.4839   | 275.3577  |
|          spnasnet_100           | 128 | 5.6856  |  12.2802  |    81.6643     |   61.8244   | 262.9874  |
|            lcnet_050            | 128 | 2.0093  |   5.267   |      nan       |   39.738    |  252.48   |
|       gluon_inception_v3        | 128 | 8.4342  |  18.8111  |      nan       |  107.1782   | 234.7551  |
|          inception_v3           | 128 | 8.4807  |  18.9464  |      nan       |  107.6097   | 223.2148  |
|     swsl_resnext101_32x16d      | 32  | 10.3929 |  22.1383  |      nan       |   63.0214   | 217.9656  |
|           volo_d1_224           | 64  |  6.874  |  16.0245  |      nan       |   45.317    | 210.5924  |
|           convit_base           | 32  | 4.1162  |  10.6181  |      nan       |     nan     | 190.8993  |
|            pit_b_224            | 64  | 3.9964  |   9.565   |      nan       |   27.7656   | 183.1499  |
|        tnt_s_patch16_224        | 64  | 12.6558 |  24.9605  |      nan       |   49.1967   |  166.85   |
|          gmlp_s16_224           | 64  | 9.7371  |  17.5417  |      nan       |   30.1711   | 149.1056  |
|            repvgg_a2            | 128 | 4.9392  |  10.6779  |      nan       |   65.977    | 139.9949  |
|          gmixer_24_224          | 64  | 8.6395  |  17.5991  |      nan       |   35.0491   | 131.8514  |
|           dm_nfnet_f0           | 128 | 6.6834  |  13.7387  |      nan       |   42.7846   | 128.7499  |
|          resmlp_12_224          | 128 |  2.834  |  6.1399   |     9.9394     |     nan     | 102.5117  |
|          mixer_b16_224          | 64  | 2.8878  |   7.212   |    16.3638     |   18.1333   | 100.1476  |
|        convmixer_768_32         | 32  |  7.064  |  14.717   |      nan       |   24.0237   |  85.642   |
|      beit_base_patch16_224      | 64  | 4.6764  |  10.6156  |      nan       |   22.3841   |  83.8798  |
| deit_base_distilled_patch16_224 | 64  | 3.1291  |   8.244   |      nan       |   16.9322   |  80.9149  |
|      vit_base_patch16_224       | 64  | 3.0647  |  7.9743   |      nan       |   16.4046   |  70.3833  |
+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|          gmixer_24_224          | 64  | 1.0001 |  0.9563   |      nan       |   0.8998    |  1.2577  |
|          gmlp_s16_224           | 64  |  1.0   |  0.9679   |      nan       |    0.92     |  1.2405  |
|            tinynet_a            | 128 | 1.0001 |  0.7955   |      nan       |   0.7958    |  1.1632  |
|          pnasnet5large          | 16  | 1.0583 |  0.9923   |      nan       |   1.1741    |  1.1266  |
|        eca_halonext26ts         | 64  | 0.999  |  0.7814   |      nan       |    0.786    |  1.0889  |
|           dm_nfnet_f0           | 128 | 0.9758 |  0.9039   |      nan       |    0.95     |  1.0616  |
|        tnt_s_patch16_224        | 64  |  1.0   |  0.9718   |      nan       |   0.9431    |  1.0587  |
|           volo_d1_224           | 64  | 1.0015 |  0.9518   |      nan       |   0.8587    |  1.0378  |
|           convit_base           | 32  | 0.9991 |   0.86    |      nan       |     nan     |  1.0309  |
|      beit_base_patch16_224      | 64  | 0.9999 |  0.9367   |      nan       |   0.9298    |  1.0097  |
|           mobilevit_s           | 32  |  1.0   |  0.7722   |      nan       |    0.787    |  1.0078  |
|           rexnet_100            | 128 | 0.9988 |  0.7919   |      nan       |   0.8648    |  1.001   |
|             dla102              | 64  | 0.9998 |  0.9549   |      nan       |   0.9751    |  0.9969  |
|            pit_b_224            | 64  | 1.0021 |  0.8074   |      nan       |   0.8179    |  0.9856  |
|         poolformer_m36          | 64  | 1.0015 |  0.9462   |      nan       |     nan     |  0.9797  |
|          convnext_base          | 32  | 1.0065 |   0.908   |      nan       |   0.7521    |  0.9564  |
|        twins_pcpvt_base         | 32  | 0.9963 |  0.9079   |      nan       |   0.8007    |  0.9553  |
|        convmixer_768_32         | 32  | 0.9992 |  0.9807   |      nan       |   0.9715    |  0.9513  |
|         visformer_small         | 128 | 0.9899 |  0.9353   |      nan       |   0.8884    |  0.9341  |
|           resnest101e           | 32  | 1.0002 |  0.9762   |      nan       |   0.9535    |  0.9292  |
|           tf_mixnet_l           | 64  | 0.9995 |  0.8624   |      nan       |   0.8426    |  0.9291  |
|          mixer_b16_224          | 64  | 0.9929 |  0.9425   |     0.2532     |   0.7726    |  0.9225  |
|       tf_efficientnet_b0        | 128 | 1.0006 |  0.7769   |      nan       |    0.846    |  0.9189  |
|            nfnet_l0             | 64  | 0.9993 |   0.824   |      nan       |   0.8257    |  0.9132  |
|         mobilenetv2_100         | 128 | 0.9992 |  0.7716   |      nan       |   0.9249    |  0.8963  |
|      vit_base_patch16_224       | 64  | 0.9955 |  0.9384   |      nan       |   0.8801    |  0.8916  |
| deit_base_distilled_patch16_224 | 64  | 0.9944 |  0.9376   |      nan       |   0.8794    |  0.8911  |
|      mobilenetv3_large_100      | 128 | 0.9987 |  0.8562   |      nan       |   0.8673    |  0.8885  |
|        adv_inception_v3         | 128 | 1.0003 |  0.8759   |      nan       |   0.8538    |  0.8829  |
|       gluon_inception_v3        | 128 | 1.0003 |  0.8759   |      nan       |   0.8538    |  0.8829  |
|          inception_v3           | 128 | 1.0003 |  0.8759   |      nan       |   0.8538    |  0.8829  |
|        gluon_xception65         | 32  |  1.0   |  0.8895   |      nan       |   0.8854    |  0.8712  |
|             dpn107              | 32  | 0.9981 |  0.9115   |      nan       |   0.8834    |   0.87   |
|           selecsls42b           | 128 | 0.9789 |  0.8913   |      nan       |   0.8811    |  0.866   |
|            fbnetv3_b            | 128 | 1.0003 |  0.7918   |      nan       |   0.7903    |  0.8647  |
|            mixnet_l             | 64  | 0.9989 |  0.8507   |      nan       |   0.7796    |  0.8601  |
|          spnasnet_100           | 128 | 0.9988 |  0.8961   |     0.1651     |   0.8371    |  0.8599  |
|       eca_botnext26ts_256       | 64  | 0.9998 |  0.7776   |      nan       |   0.7813    |  0.8533  |
|     swsl_resnext101_32x16d      | 32  | 1.0009 |  0.8805   |      nan       |   0.8487    |  0.8523  |
|      xcit_large_24_p8_224       |  5  | 0.9987 |    nan    |      nan       |     nan     |  0.8489  |
|          resmlp_12_224          | 128 | 0.9827 |  0.9667   |     0.2637     |     nan     |  0.845   |
|          ghostnet_100           | 128 | 1.0013 |  0.8903   |      nan       |   0.9244    |  0.833   |
|         coat_lite_mini          | 128 | 1.0338 |   0.929   |      nan       |   0.6593    |  0.8328  |
|        ese_vovnet19b_dw         | 128 |  1.0   |   0.867   |      nan       |   0.9146    |  0.8269  |
|          cspdarknet53           | 64  |  1.0   |  0.8467   |      nan       |   0.7906    |  0.813   |
|          cait_m36_384           |  2  | 0.9998 |  0.8806   |      nan       |   0.9023    |  0.8081  |
|          jx_nest_base           | 32  |  1.0   |  0.8945   |      nan       |    0.86     |   0.8    |
|         crossvit_9_240          | 64  | 1.0008 |  0.8801   |      nan       |   0.8854    |  0.7934  |
|        res2net101_26w_4s        | 64  | 0.9999 |  0.9202   |      nan       |   0.8569    |  0.7834  |
|           mnasnet_100           | 128 | 0.9993 |  0.8882   |     0.1669     |   0.8253    |  0.773   |
|  swin_base_patch4_window7_224   | 64  | 0.9998 |  0.9234   |      nan       |   0.8451    |  0.7676  |
|        sebotnet33ts_256         | 64  | 0.9999 |  0.7108   |      nan       |   0.7354    |  0.7449  |
|            gernet_l             | 128 | 0.9998 |  0.8655   |      nan       |    0.83     |  0.7238  |
|           fbnetc_100            | 128 | 0.9984 |  0.8631   |     0.1626     |   0.7352    |  0.7104  |
|            lcnet_050            | 128 | 0.9992 |  0.7927   |      nan       |   0.7885    |  0.705   |
|           regnety_002           | 128 | 0.9994 |  0.8284   |      nan       |   0.7819    |  0.6971  |
|          botnet26t_256          | 128 |  1.0   |  0.8755   |      nan       |    0.78     |  0.6615  |
|           res2next50            |  2  |  1.0   |  0.8301   |      nan       |   0.8198    |  0.6012  |
|        res2net50_14w_8s         |  2  |  1.0   |  0.8275   |      nan       |   0.8169    |  0.5927  |
|            hrnet_w18            |  2  |  1.0   |  0.8383   |      nan       |   0.8363    |  0.5746  |
|            repvgg_a2            | 128 | 1.0003 |  0.7971   |      nan       |   0.6902    |  0.5572  |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+

Performance graphs


bench_logs/timm_models_amp.png :

bench_logs/torchbench_amp.png :

bench_logs/huggingface_amp.png :

@anijain2305
Contributor Author

Performance Dashboard for float32 precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and the gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. The experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

When measuring speedup, compilation latency, and memory footprint reduction, we exclude the models that fail the accuracy checks.

Passrate

+----------------+-------------+-------------+-------------+
|    Compiler    | torchbench  | huggingface | timm_models |
+----------------+-------------+-------------+-------------+
|     eager      | 100%, 55/55 | 93%, 41/44  | 100%, 61/61 |
|   aot_eager    | 98%, 54/55  | 93%, 41/44  | 90%, 55/61  |
| aot_cudagraphs | 29%, 16/55  |  0%, 0/44   |  0%, 0/61   |
|  aot_nvfuser   | 62%, 34/55  |  2%, 1/44   | 82%, 50/61  |
|    inductor    | 87%, 48/55  | 77%, 34/44  | 74%, 45/61  |
+----------------+-------------+-------------+-------------+
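
For readers who want to reproduce the "NN%, passed/total" cells from a per-model accuracy column, here is a minimal sketch. The helper name `format_passrate` and the `PASSING` set are hypothetical, and treating `pass_due_to_skip` as a pass is an assumption; the thread does not state the dashboard's exact rule.

```python
from typing import Iterable

# Assumption: both of these statuses count as a pass.
# The dashboard's actual rule is not spelled out in the thread.
PASSING = {"pass", "pass_due_to_skip"}


def format_passrate(statuses: Iterable[str]) -> str:
    """Return 'NN%, passed/total' for one compiler on one suite."""
    statuses = list(statuses)
    total = len(statuses)
    passed = sum(1 for s in statuses if s in PASSING)
    # Guard against an empty suite to avoid dividing by zero.
    pct = 100 * passed / total if total else 0.0
    return f"{pct:.0f}%, {passed}/{total}"


# Hypothetical column: 54 passing models and one that fails to run.
column = ["pass"] * 54 + ["fail_to_run"]
print(format_passrate(column))  # prints "98%, 54/55"
```

Any other status, such as `fail_to_run` or `fail_accuracy`, counts against the pass rate in this sketch.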

Geometric mean speedup

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   1.00x    |    1.01x    |    1.00x    |
|   aot_eager    |   1.01x    |    1.00x    |    1.00x    |
| aot_cudagraphs |   1.02x    |    0.0x     |    0.0x     |
|  aot_nvfuser   |   1.12x    |    1.13x    |    1.12x    |
|    inductor    |   1.37x    |    1.61x    |    1.24x    |
+----------------+------------+-------------+-------------+
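
The aggregation behind a geometric mean speedup can be sketched as follows. This is an illustration under stated assumptions, not the dashboard's own script: the function names are hypothetical, speedup is taken as eager time divided by backend time, and a per-model entry of 0.0 is assumed to mark a model that failed and is left out of the mean.

```python
import math
from statistics import geometric_mean
from typing import Iterable


def speedup(eager_seconds: float, backend_seconds: float) -> float:
    """Speedup normalized against native PyTorch (eager) latency."""
    return eager_seconds / backend_seconds


def geomean_speedup(speedups: Iterable[float]) -> float:
    """Geometric mean over models that produced a valid speedup."""
    # Assumption: 0.0 (or NaN) marks a failed model and is excluded.
    valid = [s for s in speedups if s > 0 and not math.isnan(s)]
    # With no valid models, report 0.0, in the spirit of the "0.0x" cells.
    return geometric_mean(valid) if valid else 0.0


# A few inductor entries from the torchbench table, plus one failed model.
sample = [4.6393, 3.8674, 3.6153, 0.0]
print(f"{geomean_speedup(sample):.2f}x")
```

A geometric mean is the usual choice for ratios like these because a 2x speedup and a 0.5x slowdown average to 1x, which an arithmetic mean would not give.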

Mean compilation time (seconds)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |    5.70    |    13.73    |    11.39    |
|   aot_eager    |   10.34    |    20.46    |    17.09    |
| aot_cudagraphs |    4.54    |     0.0     |     0.0     |
|  aot_nvfuser   |   21.31    |    10.74    |    57.51    |
|    inductor    |   265.33   |   111.78    |   417.22    |
+----------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   0.96x    |    0.98x    |    1.00x    |
|   aot_eager    |   0.87x    |    0.88x    |    0.88x    |
| aot_cudagraphs |   0.48x    |    0.0x     |    0.0x     |
|  aot_nvfuser   |   0.84x    |    1.08x    |    0.85x    |
|    inductor    |   0.79x    |    0.74x    |    0.89x    |
+----------------+------------+-------------+-------------+
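
One way to read "higher is better" is that the ratio divides the native PyTorch peak memory by the backend's peak memory, so values above 1 mean the backend uses less memory. That definition is an assumption made for this sketch, and `compression_ratio` is a hypothetical helper; the peak values could come from something like `torch.cuda.max_memory_allocated()`, but the thread does not say how they were collected.

```python
import math
from typing import Optional


def compression_ratio(eager_peak_bytes: float,
                      backend_peak_bytes: Optional[float]) -> float:
    """Peak memory of eager divided by peak memory of the backend.

    Assumption: > 1.0 means the backend needs less memory than eager.
    Returns NaN when the backend produced no measurement (e.g. it failed).
    """
    if backend_peak_bytes is None:
        return math.nan
    return eager_peak_bytes / backend_peak_bytes


# Hypothetical measurements: eager peaks at 10 GiB, the backend at 8 GiB.
gib = 1024 ** 3
print(compression_ratio(10 * gib, 8 * gib))  # prints 1.25
```

Under this reading, a value such as 0.79x would mean the backend's peak memory is higher than eager's.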

torchbench suite with float32 precision


Performance speedup

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|            densenet121            |  4   | 1.0021 |  1.0072   |      0.0       |   1.4515    |  4.6393  |
|         timm_efficientdet         |  1   | 0.9831 |  0.8908   |      0.0       |     0.0     |  3.8674  |
|       functorch_dp_cifar10        |  64  | 1.0019 |  0.9777   |      0.0       |   1.1919    |  3.6153  |
|      timm_vision_transformer      |  8   | 1.003  |   0.923   |      0.0       |   1.3434    |  2.5786  |
|                drq                |  1   | 0.9972 |  0.8497   |      0.0       |   1.0702    |  2.4508  |
|           BERT_pytorch            |  16  | 1.0091 |  0.8721   |      0.0       |     0.0     |  1.855   |
|             resnet18              |  16  | 1.003  |  1.1147   |      0.0       |   1.4051    |  1.7636  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9993 |   0.938   |     1.1197     |   1.1919    |  1.729   |
|          pytorch_struct           | 200  | 0.9961 |  0.7502   |     0.8973     |    0.884    |  1.7059  |
|           lennard_jones           | 1000 | 0.9674 |  0.8486   |     1.0724     |   1.0278    |  1.667   |
|             hf_Albert             |  8   | 1.0012 |   0.995   |      0.0       |     0.0     |  1.6645  |
|           squeezenet1_1           |  32  | 0.9972 |  1.0037   |     0.9904     |   1.1563    |  1.6496  |
|               dcgan               |  32  | 0.9915 |  1.0198   |     1.109      |   1.1794    |  1.6235  |
|        speech_transformer         |  32  | 1.0078 |  0.9013   |      0.0       |     0.0     |  1.4912  |
|            timm_nfnet             | 128  | 0.9995 |  1.0004   |      0.0       |   1.2113    |  1.4741  |
|              hf_GPT2              |  4   | 1.0129 |  0.9793   |      0.0       |     0.0     |  1.4269  |
|            hf_T5_large            |  2   | 1.0232 |  0.9244   |      0.0       |     0.0     |  1.4038  |
|          resnext50_32x4d          |  8   | 1.0017 |  1.0845   |      0.0       |   1.3674    |  1.4019  |
|           fastNLP_Bert            |  6   | 0.9991 |  0.9746   |      0.0       |     0.0     |  1.3537  |
|        mobilenet_v3_large         |  32  | 1.0051 |  1.1141   |      0.0       |   1.3888    |  1.343   |
|         soft_actor_critic         | 256  | 0.9997 |  0.7922   |     1.0271     |   1.0222    |  1.2641  |
|          LearningToPaint          |  96  | 1.0027 |  1.0327   |      0.0       |   1.2377    |  1.262   |
|           pytorch_unet            |  1   | 0.9997 |  0.9987   |      0.0       |   1.0754    |  1.203   |
|              hf_Bart              |  4   | 1.0137 |  0.9696   |      0.0       |     0.0     |  1.1822  |
|               vgg16               |  64  | 0.9999 |  0.9984   |     0.7922     |   0.9965    |  1.1723  |
|            Super_SloMo            |  6   | 1.0001 |  0.9977   |      0.0       |     0.0     |  1.1704  |
|              alexnet              | 128  | 0.9993 |  0.9977   |     0.7784     |   1.0005    |  1.1646  |
|              hf_Bert              |  4   | 1.0249 |  1.0019   |      0.0       |     0.0     |  1.1577  |
|           hf_DistilBert           |  8   | 1.0009 |  0.9543   |      0.0       |     0.0     |  1.1516  |
|        shufflenet_v2_x1_0         | 128  | 1.0001 |  1.0777   |      0.0       |   1.2258    |  1.1504  |
|            mnasnet1_0             |  32  | 1.0009 |   1.123   |     0.748      |   1.3056    |  1.1302  |
|          pytorch_stargan          |  16  | 0.9995 |  0.9825   |     0.7291     |   0.9891    |  1.1176  |
|        Background_Matting         |  4   | 0.9996 |  1.0224   |      0.0       |   1.0822    |  1.1164  |
|            hf_Reformer            |  4   | 0.9965 |    0.0    |     0.894      |     0.0     |  1.1094  |
|         timm_efficientnet         |  32  | 0.9572 |   0.818   |      0.0       |   1.0643    |  1.095   |
|            hf_BigBird             |  2   | 0.9932 |  0.9458   |      0.0       |     0.0     |  1.0781  |
|   timm_vision_transformer_large   |  8   | 0.9994 |   0.994   |      0.0       |   0.9828    |  1.052   |
| attention_is_all_you_need_pytorch | 256  | 0.997  |  0.9694   |      0.0       |     0.0     |  1.0474  |
|           timm_resnest            |  32  | 0.9994 |   1.002   |      0.0       |   1.1837    |  1.0351  |
|              demucs               |  4   | 0.9998 |  0.9992   |     1.0002     |   0.9996    |  0.9995  |
|    mobilenet_v2_quantized_qat     |  96  | 0.9993 |  0.9991   |     0.9986     |   0.9989    |  0.9984  |
|      resnet50_quantized_qat       |  32  | 0.9972 |   0.998   |     0.9985     |    0.998    |  0.998   |
|            tts_angular            |  64  | 0.9963 |   0.96    |     0.9962     |   0.9982    |  0.9919  |
|               dlrm                | 2048 | 1.0936 |   0.932   |      0.0       |     0.0     |  0.9396  |
|            timm_vovnet            |  32  | 0.9057 |  0.9046   |      0.0       |   0.9795    |  0.9172  |
|      nvidia_deeprecommender       | 256  | 0.9994 |  0.9628   |     0.5849     |   0.9423    |  0.9044  |
|           mobilenet_v2            |  96  | 0.9996 |  0.9984   |      0.0       |   1.0439    |  0.865   |
|               moco                |  32  | 0.9926 |   1.045   |      0.0       |     0.0     |  0.8381  |
|             resnet50              |  32  | 0.9984 |  0.9932   |      0.0       |   1.1621    |  0.7785  |
|            timm_regnet            |  32  | 0.9649 |  0.9625   |      0.0       |   1.0943    |  0.7707  |
|              yolov3               |  16  | 0.9995 |  0.9943   |      0.0       |   1.1829    |   0.0    |
|           hf_Longformer           |  2   | 0.9693 |   0.901   |     0.8158     |     0.0     |   0.0    |
|               hf_T5               |  8   | 1.0007 |  0.9899   |      0.0       |     0.0     |   0.0    |
|           hf_GPT2_large           |  4   | 0.9996 |  0.9801   |      0.0       |     0.0     |   0.0    |
|             tacotron2             |  64  | 0.9808 |  0.8586   |      0.0       |     0.0     |   0.0    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|               name                | bs  |      eager       |    aot_eager     |  aot_cudagraphs  |   aot_nvfuser    |     inductor     |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|           hf_GPT2_large           |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|            hf_T5_large            |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|              alexnet              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               dcgan               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|              demucs               |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |
|           lennard_jones           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|            mnasnet1_0             |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|    mobilenet_v2_quantized_qat     |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|      nvidia_deeprecommender       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |       pass       |       pass       |
|      resnet50_quantized_qat       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |       pass       |       pass       |
|           squeezenet1_1           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|            tts_angular            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               vgg16               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          LearningToPaint          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            densenet121            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|                drq                |  1  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|       functorch_dp_cifar10        |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           mobilenet_v2            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|        mobilenet_v3_large         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           pytorch_unet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet18              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet50              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|          resnext50_32x4d          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|        shufflenet_v2_x1_0         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|         timm_efficientnet         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_nfnet             |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_regnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           timm_resnest            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|      timm_vision_transformer      |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_vovnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            hf_Reformer            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |
|           BERT_pytorch            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            Super_SloMo            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
| attention_is_all_you_need_pytorch |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               dlrm                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           fastNLP_Bert            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|             hf_Albert             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bart              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bert              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            hf_BigBird             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           hf_DistilBert           |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_GPT2              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               hf_T5               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|        speech_transformer         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|         timm_efficientdet         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|        Background_Matting         |  4  |       pass       |       pass       |   fail_to_run    |       pass       |   fail_to_run    |
|           hf_Longformer           |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|            hf_T5_base             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|               moco                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|             tacotron2             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|          vision_maskrcnn          |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|              yolov3               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |      0.0000      |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
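To get a per-backend summary out of an accuracy table like the one above, something along these lines should work. It is only a rough sketch: it assumes the table is pasted as plain text in the same `|`-delimited layout, and `tally_status` is a hypothetical helper name, not part of the benchmark scripts. Status strings are counted verbatim, so odd entries such as `0.0000` show up as their own bucket rather than being interpreted.

```python
from collections import Counter

# A few rows copied from the table above; borders are omitted for brevity.
SAMPLE = """\
| name          | bs | eager            | aot_eager        | aot_cudagraphs   | aot_nvfuser      | inductor         |
| hf_GPT2_large | 2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
| alexnet       | 2  | pass             | pass             | pass             | pass             | pass             |
| hf_Reformer   | 2  | pass             | pass             | pass             | fail_to_run      | pass             |
| yolov3        | 2  | pass             | pass             | fail_to_run      | fail_to_run      | 0.0000           |
"""


def tally_status(text):
    """Count each status string per backend column (hypothetical helper)."""
    # Keep only data lines; '+----+' border lines and blanks are dropped.
    lines = [l.strip() for l in text.splitlines() if l.strip().startswith("|")]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    backends = header[2:]  # the first two columns are name and batch size
    counts = {b: Counter() for b in backends}
    for line in lines[1:]:
        cells = [c.strip() for c in line.strip("|").split("|")]
        for backend, status in zip(backends, cells[2:]):
            counts[backend][status] += 1
    return counts


counts = tally_status(SAMPLE)
for backend, c in counts.items():
    print(backend, dict(c))
```

Counting the raw strings keeps the summary honest about rows the harness did not classify as pass or fail.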

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+
|               name                |  bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor  |
+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+
|         timm_efficientdet         |  1   | 51.5766 |  70.6045  |      nan       |     nan     | 1764.6785 |
|            densenet121            |  4   | 13.3012 |  25.1495  |      nan       |   99.4384   | 1532.0131 |
|            hf_T5_large            |  2   | 35.6569 |  66.2515  |      nan       |     nan     | 1068.5147 |
|            mnasnet1_0             |  32  | 3.1714  |  6.9425   |    24.1489     |   33.4974   | 843.7609  |
|        mobilenet_v3_large         |  32  | 3.5883  |   7.421   |      nan       |   55.4373   | 787.0648  |
|               moco                |  32  | 11.1404 |  16.7881  |      nan       |     nan     | 677.8514  |
|           mobilenet_v2            |  96  | 3.0986  |  6.6705   |      nan       |   39.0118   | 623.2404  |
|          resnext50_32x4d          |  8   | 3.3002  |  7.4339   |      nan       |   30.9222   | 591.0255  |
|         timm_efficientnet         |  32  | 5.7511  |  10.4823  |      nan       |   56.0236   | 539.9275  |
|        shufflenet_v2_x1_0         | 128  | 3.5859  |  8.0994   |      nan       |   29.641    | 449.5994  |
|           squeezenet1_1           |  32  | 0.6202  |  1.3239   |     3.539      |    4.885    | 366.1201  |
|           timm_resnest            |  32  | 1.3364  |  3.5203   |      nan       |   35.8046   | 348.1886  |
|            timm_regnet            |  32  | 8.1136  |  14.0954  |      nan       |   53.1497   | 317.6042  |
|            timm_vovnet            |  32  | 2.9071  |  6.1334   |      nan       |   24.786    | 265.4777  |
| attention_is_all_you_need_pytorch | 256  |  4.266  |  10.1758  |      nan       |     nan     | 261.9771  |
|        speech_transformer         |  32  | 7.2245  |  13.6655  |      nan       |     nan     | 251.9521  |
|       functorch_dp_cifar10        |  64  | 0.7908  |  2.0897   |      nan       |   5.4668    | 204.5091  |
|      timm_vision_transformer      |  8   | 2.9851  |  6.2629   |      nan       |   11.3289   | 196.1347  |
|          LearningToPaint          |  96  | 0.9587  |  2.4854   |      nan       |   24.429    | 189.1747  |
|             resnet18              |  16  | 0.9185  |  2.4438   |      nan       |   17.9014   | 185.4883  |
|   timm_vision_transformer_large   |  8   | 22.2284 |  34.3611  |      nan       |   44.8166   | 174.6423  |
|           BERT_pytorch            |  16  |  4.836  |  10.8222  |      nan       |     nan     | 174.2309  |
|              hf_Bart              |  4   | 7.2937  |  13.3922  |      nan       |     nan     | 150.9699  |
|             resnet50              |  32  | 3.2836  |  7.3932   |      nan       |   34.4205   |  145.403  |
|          pytorch_stargan          |  16  | 0.7907  |   2.763   |     9.5307     |   4.3293    | 145.2698  |
|           fastNLP_Bert            |  6   | 4.9808  |  10.0575  |      nan       |     nan     | 142.9017  |
|        Background_Matting         |  4   | 3.6956  |  7.4423   |      nan       |   32.1955   |  141.231  |
|              hf_GPT2              |  4   | 3.5631  |   8.387   |      nan       |     nan     | 139.3171  |
|            timm_nfnet             | 128  | 6.4912  |  11.9484  |      nan       |   34.2804   | 136.1473  |
|          pytorch_struct           | 200  | 0.4001  |  0.9359   |     1.4509     |   4.2146    |  103.788  |
|            Super_SloMo            |  6   |  2.116  |  5.8313   |      nan       |     nan     |  86.5013  |
|             hf_Albert             |  8   | 1.0841  |  5.7737   |      nan       |     nan     |  79.1676  |
|              hf_Bert              |  4   | 4.9073  |  9.6611   |      nan       |     nan     |  76.0375  |
|            hf_Reformer            |  4   |  3.011  |    nan    |    13.0912     |     nan     |  73.2447  |
|            hf_BigBird             |  2   | 10.8878 |  16.7952  |      nan       |     nan     |  58.7916  |
|           pytorch_unet            |  1   | 1.0526  |  2.7433   |      nan       |   20.291    |  56.5606  |
|           hf_DistilBert           |  8   | 1.6504  |  3.9743   |      nan       |     nan     |  49.8976  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.7386  |   2.59    |     7.9453     |   4.1358    |  31.9732  |
|               vgg16               |  64  | 0.3239  |  0.7723   |     2.3694     |   2.6377    |  19.7724  |
|                drq                |  1   | 0.2568  |  0.5426   |      nan       |    3.49     |  19.6268  |
|               dlrm                | 2048 | 0.5936  |  0.9576   |      nan       |     nan     |  17.1468  |
|              alexnet              | 128  | 0.2564  |  0.5024   |     1.1934     |   2.4487    |  15.6905  |
|               dcgan               |  32  | 0.2503  |  0.5086   |     1.2065     |    3.791    |  15.4419  |
|      nvidia_deeprecommender       | 256  |  0.255  |  0.4785   |     0.7806     |   2.4561    |  11.5989  |
|         soft_actor_critic         | 256  | 0.2525  |  0.3811   |     0.6593     |   1.5779    |  10.3899  |
|           lennard_jones           | 1000 | 0.2231  |   0.362   |     0.5064     |   1.1272    |  5.2309   |
|            tts_angular            |  64  | 0.3078  |   0.363   |     0.4981     |   1.0814    |  4.2127   |
|      resnet50_quantized_qat       |  32  | 2.4789  |  2.5093   |     2.5295     |   2.4749    |  2.4968   |
|    mobilenet_v2_quantized_qat     |  96  | 2.3837  |  2.3536   |     2.377      |   2.3057    |  2.2628   |
|              demucs               |  4   |  0.802  |  0.8072   |     0.8072     |   0.7996    |  0.7216   |
|              yolov3               |  16  | 7.2552  |  13.1212  |      nan       |   47.2727   |    nan    |
|           hf_Longformer           |  2   | 11.3734 |  19.0144  |    90.6872     |     nan     |    nan    |
|           hf_GPT2_large           |  4   | 21.1646 |  35.4272  |      nan       |     nan     |    nan    |
|             tacotron2             |  64  | 14.0298 |  26.6327  |      nan       |     nan     |    nan    |
|               hf_T5               |  8   | 3.8362  |  10.6544  |      nan       |     nan     |    nan    |
+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|            Super_SloMo            |  6   | 1.0024 |   0.956   |      nan       |     nan     |  1.1857  |
|         timm_efficientnet         |  32  | 0.9998 |  0.7704   |      nan       |   0.7845    |  1.0652  |
|            timm_nfnet             | 128  | 0.9393 |   0.897   |      nan       |   0.9515    |  1.022   |
|         timm_efficientdet         |  1   | 1.0142 |  0.8251   |      nan       |     nan     |  1.0218  |
|      resnet50_quantized_qat       |  32  | 0.9967 |  0.9967   |     0.9967     |   0.9967    |  1.0001  |
|    mobilenet_v2_quantized_qat     |  96  | 0.9957 |  0.9957   |     0.9957     |   0.9957    |  0.9992  |
|           mobilenet_v2            |  96  | 0.9993 |  0.7661   |      nan       |   0.7676    |  0.9975  |
|              demucs               |  4   | 0.9886 |  0.9886   |     0.9886     |   0.9886    |  0.9886  |
|            tts_angular            |  64  | 0.9884 |  0.9884   |     0.984      |   0.9884    |  0.9842  |
|              hf_GPT2              |  4   | 0.9548 |   0.887   |      nan       |     nan     |  0.9505  |
|        Background_Matting         |  4   | 1.0026 |   0.952   |      nan       |   0.9773    |  0.9139  |
|          pytorch_stargan          |  16  | 0.9975 |   1.019   |     0.2027     |   1.0085    |  0.9023  |
|        speech_transformer         |  32  | 0.9988 |  0.9152   |      nan       |     nan     |  0.896   |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9986 |  0.9194   |     0.2326     |   0.9141    |  0.8941  |
|             hf_Albert             |  8   | 0.9333 |  0.9333   |      nan       |     nan     |  0.8804  |
|           pytorch_unet            |  1   | 0.9985 |  0.8536   |      nan       |    0.851    |  0.859   |
|              hf_Bart              |  4   | 0.9617 |  0.8786   |      nan       |     nan     |  0.853   |
|              hf_Bert              |  4   | 0.9683 |  0.8952   |      nan       |     nan     |  0.8517  |
|            timm_regnet            |  32  | 1.0013 |  0.8634   |      nan       |   0.8806    |  0.8481  |
|        shufflenet_v2_x1_0         | 128  |  1.0   |  0.9163   |      nan       |   0.8868    |  0.8447  |
|           fastNLP_Bert            |  6   | 1.0012 |  0.9152   |      nan       |     nan     |  0.8343  |
| attention_is_all_you_need_pytorch | 256  | 0.9481 |  0.9241   |      nan       |     nan     |  0.8264  |
|            timm_vovnet            |  32  | 0.9933 |  0.7644   |      nan       |   0.7778    |  0.8252  |
|           BERT_pytorch            |  16  |  1.0   |  0.8995   |      nan       |     nan     |  0.825   |
|            hf_T5_large            |  2   | 0.922  |  0.8722   |      nan       |     nan     |  0.8237  |
|            hf_BigBird             |  2   | 0.9609 |  0.9609   |      nan       |     nan     |  0.8205  |
|           squeezenet1_1           |  32  | 0.9749 |  0.8159   |     0.2781     |   0.9742    |  0.8159  |
|           hf_DistilBert           |  8   | 0.9212 |  0.9053   |      nan       |     nan     |  0.7841  |
|               dcgan               |  32  |  1.0   |  0.7784   |     0.3321     |   0.7784    |  0.767   |
|               moco                |  32  | 1.0067 |  0.9701   |      nan       |     nan     |  0.767   |
|              alexnet              | 128  | 0.9998 |  0.7731   |     0.3805     |   0.7736    |  0.743   |
|            mnasnet1_0             |  32  | 0.9988 |  0.9087   |     0.1627     |   0.8348    |  0.7268  |
|             resnet50              |  32  | 1.0002 |  0.8763   |      nan       |   0.8011    |  0.7255  |
|   timm_vision_transformer_large   |  8   | 1.0022 |  0.8433   |      nan       |   0.8015    |  0.7222  |
|      timm_vision_transformer      |  8   |  1.0   |  0.8883   |      nan       |   0.8108    |  0.712   |
|        mobilenet_v3_large         |  32  | 0.9958 |  0.8655   |      nan       |   0.8773    |  0.7041  |
|               dlrm                | 2048 | 0.7282 |  0.7283   |      nan       |     nan     |  0.6973  |
|           timm_resnest            |  32  | 0.9935 |  0.8869   |      nan       |   0.8075    |  0.6861  |
|            densenet121            |  4   |  1.0   |  0.8812   |      nan       |   0.8571    |  0.6617  |
|          resnext50_32x4d          |  8   | 0.9994 |  0.8687   |      nan       |   0.8223    |  0.6614  |
|               vgg16               |  64  |  1.0   |  0.6663   |     0.2532     |   0.6664    |  0.6471  |
|          LearningToPaint          |  96  | 0.9442 |  0.7168   |      nan       |   0.6504    |  0.6444  |
|         soft_actor_critic         | 256  | 0.964  |   0.964   |     0.4356     |   0.9555    |  0.6428  |
|                drq                |  1   | 0.8541 |  0.8541   |      nan       |   0.8541    |  0.6427  |
|             resnet18              |  16  | 0.9846 |  0.7907   |      nan       |   0.7038    |  0.6163  |
|           lennard_jones           | 1000 |  1.0   |    1.0    |     0.3712     |   1.0947    |  0.5646  |
|      nvidia_deeprecommender       | 256  | 0.5598 |  0.5598   |     0.4734     |   0.5598    |  0.5598  |
|          pytorch_struct           | 200  |  1.0   |  0.5079   |     0.4824     |   0.5079    |  0.4222  |
|       functorch_dp_cifar10        |  64  | 0.9626 |  0.8251   |      nan       |   0.8254    |  0.4037  |
|            hf_Reformer            |  4   | 0.3011 |    nan    |     0.1803     |     nan     |  0.299   |
|              yolov3               |  16  | 1.0072 |  0.8533   |      nan       |   0.8915    |   nan    |
|           hf_Longformer           |  2   | 0.9603 |  0.9603   |     0.288      |     nan     |   nan    |
|             tacotron2             |  64  | 0.9922 |  1.1046   |      nan       |     nan     |   nan    |
|               hf_T5               |  8   | 0.9527 |  0.9446   |      nan       |     nan     |   nan    |
|           hf_GPT2_large           |  4   | 0.936  |  0.8771   |      nan       |     nan     |   nan    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|       MT5ForConditionalGeneration       | 2  | 1.027  |  0.9168   |      0.0       |     0.0     |  4.3687  |
|           ElectraForCausalLM            | 1  | 1.0453 |  0.9369   |      0.0       |     0.0     |  4.1923  |
|            YituTechConvBert             | 1  | 1.0289 |  0.9299   |      0.0       |     0.0     |  3.4016  |
|         MegatronBertForCausalLM         | 2  | 1.0372 |  0.9357   |      0.0       |     0.0     |  2.8899  |
|     M2M100ForConditionalGeneration      | 2  | 1.0114 |  0.9048   |      0.0       |     0.0     |  2.8587  |
|     MobileBertForQuestionAnswering      | 32 | 1.0194 |  0.9122   |      0.0       |     0.0     |  2.7823  |
|          MobileBertForMaskedLM          | 16 | 1.0195 |   0.903   |      0.0       |     0.0     |  2.6125  |
|             OPTForCausalLM              | 4  | 1.0186 |   0.897   |      0.0       |     0.0     |  2.5828  |
|           RobertaForCausalLM            | 4  | 1.0437 |  0.9334   |      0.0       |     0.0     |  2.5069  |
|             XGLMForCausalLM             | 1  | 1.0146 |  0.8742   |      0.0       |     0.0     |  2.4941  |
|                CamemBert                | 1  | 1.0435 |  0.9498   |      0.0       |     0.0     |  2.2953  |
|     PegasusForConditionalGeneration     | 4  | 1.0124 |  0.8918   |      0.0       |     0.0     |  2.0816  |
|               DistillGPT2               | 1  | 1.0299 |  0.9446   |      0.0       |     0.0     |  1.9655  |
|               GoogleFnet                | 1  | 1.0046 |  0.8137   |      0.0       |   1.1324    |  1.8265  |
|    MegatronBertForQuestionAnswering     | 8  | 1.0398 |  0.9391   |      0.0       |     0.0     |  1.7417  |
|     PLBartForConditionalGeneration      | 8  | 1.0168 |  0.9089   |      0.0       |     0.0     |  1.7175  |
|      GPT2ForSequenceClassification      | 4  | 0.9988 |  0.9775   |      0.0       |     0.0     |  1.6644  |
|      MBartForConditionalGeneration      | 8  | 1.0163 |  0.9134   |      0.0       |     0.0     |  1.4676  |
|            XLNetLMHeadModel             | 4  | 0.9998 |  0.9649   |      0.0       |     0.0     |  1.4274  |
|       T5ForConditionalGeneration        | 4  | 0.9982 |  0.9723   |      0.0       |     0.0     |  1.3487  |
|            TrOCRForCausalLM             | 8  | 1.0117 |  0.9445   |      0.0       |     0.0     |  1.3447  |
|       AlbertForQuestionAnswering        | 2  |  1.0   |  1.0001   |      0.0       |     0.0     |  1.3067  |
|            AlbertForMaskedLM            | 2  | 1.0006 |  0.9979   |      0.0       |     0.0     |   1.3    |
|       DebertaForQuestionAnswering       | 4  | 0.9388 |  0.7464   |     0.794      |     0.0     |  1.2795  |
|    LayoutLMForSequenceClassification    | 16 | 0.9994 |  0.9881   |      0.0       |     0.0     |  1.2534  |
|         Speech2Text2ForCausalLM         | 64 | 1.0101 |  0.9381   |      0.0       |     0.0     |  1.2338  |
|                 T5Small                 | 1  | 1.022  |  0.9544   |      0.0       |     0.0     |  1.2217  |
|           PegasusForCausalLM            | 8  | 1.0118 |   0.92    |      0.0       |     0.0     |  1.2173  |
|      BartForConditionalGeneration       | 1  | 1.0142 |  0.9898   |      0.0       |     0.0     |  1.2117  |
|     DistilBertForQuestionAnswering      | 32 | 1.0293 |  0.9825   |      0.0       |     0.0     |  1.1948  |
| BlenderbotSmallForConditionalGeneration | 32 | 1.0107 |  0.9413   |      0.0       |     0.0     |  1.1946  |
|          DistilBertForMaskedLM          | 16 | 1.0288 |   0.978   |      0.0       |     0.0     |  1.1572  |
|            PLBartForCausalLM            | 16 | 1.0098 |  0.9437   |      0.0       |     0.0     |  1.1312  |
|             BartForCausalLM             | 2  | 0.9998 |  0.9666   |      0.0       |     0.0     |  1.1055  |
|       RobertaForQuestionAnswering       | 64 | 0.9986 |  0.9822   |      0.0       |     0.0     |  1.0941  |
|            MBartForCausalLM             | 16 |  1.01  |  0.9621   |      0.0       |     0.0     |  1.0884  |
|                 BigBird                 | 1  | 0.9892 |  0.9347   |      0.0       |     0.0     |  1.0879  |
|        BertForQuestionAnswering         | 64 | 0.9987 |   0.981   |      0.0       |     0.0     |  1.0865  |
|             BertForMaskedLM             | 64 | 0.9988 |  0.9623   |      0.0       |     0.0     |  1.0409  |
|           DebertaForMaskedLM            | 4  | 0.9388 |  0.8149   |     0.7231     |     0.0     |  1.0161  |
|       BlenderbotSmallForCausalLM        | 64 | 1.001  |  0.9085   |      0.0       |     0.0     |  1.008   |
|          AllenaiLongformerBase          | 1  | 0.9551 |  0.8695   |     0.7833     |     0.0     |   0.0    |
|       ElectraForQuestionAnswering       | 64 | 0.999  |  0.9853   |      0.0       |     0.0     |   0.0    |
|           LayoutLMForMaskedLM           | 16 | 0.9991 |  0.9699   |      0.0       |     0.0     |   0.0    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
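For anyone who wants a single number per backend from a speedup table like the one above, here is a minimal sketch that computes a geometric mean. It assumes a `0.0` entry marks a backend that did not produce a measurement (the thread does not state this explicitly), so those entries are skipped rather than averaged in. `parse_table` and `geomean_speedup` are hypothetical helper names, not functions from the benchmark runner.

```python
import math

# Three rows copied from the table above, in the same ASCII-grid layout.
SAMPLE = """
+-----------------------+----+--------+-----------+----------------+-------------+----------+
|         name          | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------+----+--------+-----------+----------------+-------------+----------+
|      GoogleFnet       | 1  | 1.0046 |  0.8137   |      0.0       |   1.1324    |  1.8265  |
|  DebertaForMaskedLM   | 4  | 0.9388 |  0.8149   |     0.7231     |     0.0     |  1.0161  |
| AllenaiLongformerBase | 1  | 0.9551 |  0.8695   |     0.7833     |     0.0     |   0.0    |
+-----------------------+----+--------+-----------+----------------+-------------+----------+
"""


def parse_table(text):
    """Parse an ASCII grid table into a list of dicts keyed by column name."""
    header = None
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+----+' border lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geomean_speedup(rows, backend):
    """Geometric mean of positive entries; 0.0 and nan are treated as missing."""
    vals = []
    for row in rows:
        try:
            v = float(row[backend])
        except ValueError:
            continue  # non-numeric cell
        if v > 0:  # also False for nan, so nan is skipped
            vals.append(v)
    if not vals:
        return float("nan")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


rows = parse_table(SAMPLE)
for backend in ("eager", "aot_eager", "aot_cudagraphs", "aot_nvfuser", "inductor"):
    print(backend, round(geomean_speedup(rows, backend), 4))
```

Note that skipping failed models makes a backend with many failures look better than it is, so the count of models included is worth reporting next to the mean.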

Accuracy

+-----------------------------------------+----+--------+-----------+----------------+-------------+-------------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser |  inductor   |
+-----------------------------------------+----+--------+-----------+----------------+-------------+-------------+
|               GoogleFnet                | 1  |  pass  |   pass    |  fail_to_run   |    pass     |    pass     |
|             BartForCausalLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|             BertForMaskedLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|        BertForQuestionAnswering         | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|                 BigBird                 | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       BlenderbotSmallForCausalLM        | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
| BlenderbotSmallForConditionalGeneration | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|                CamemBert                | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           DebertaForMaskedLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       DebertaForQuestionAnswering       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|          DistilBertForMaskedLM          | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|     DistilBertForQuestionAnswering      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|               DistillGPT2               | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           ElectraForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       ElectraForQuestionAnswering       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|      GPT2ForSequenceClassification      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           LayoutLMForMaskedLM           | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|    LayoutLMForSequenceClassification    | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            MBartForCausalLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       MT5ForConditionalGeneration       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|         MegatronBertForCausalLM         | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|    MegatronBertForQuestionAnswering     | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|          MobileBertForMaskedLM          | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|     MobileBertForQuestionAnswering      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|             OPTForCausalLM              | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            PLBartForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           PegasusForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|     PegasusForConditionalGeneration     | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           RobertaForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       RobertaForQuestionAnswering       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|         Speech2Text2ForCausalLM         | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       T5ForConditionalGeneration        | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|                 T5Small                 | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            TrOCRForCausalLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            XLNetLMHeadModel             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            YituTechConvBert             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            AlbertForMaskedLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|       AlbertForQuestionAnswering        | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|          AllenaiLongformerBase          | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|      MBartForConditionalGeneration      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|     PLBartForConditionalGeneration      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|      BartForConditionalGeneration       | 0  | 0.0000 |  0.0000   |     0.0000     |   0.0000    |   0.0000    |
|     M2M100ForConditionalGeneration      | 0  | 0.0000 |  0.0000   |     0.0000     |   0.0000    |   0.0000    |
|             XGLMForCausalLM             | 0  | 0.0000 |  0.0000   |     0.0000     |   0.0000    |   0.0000    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+-------------+

Compilation latency (sec)

+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+
|                  name                   | bs |  eager   | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+
|            XLNetLMHeadModel             | 4  | 17.8864  |  36.3251  |      nan       |     nan     | 317.1367 |
|          MobileBertForMaskedLM          | 16 | 135.2893 | 155.4058  |      nan       |     nan     | 271.7268 |
|     MobileBertForQuestionAnswering      | 32 | 133.5434 | 156.8102  |      nan       |     nan     | 252.473  |
|     M2M100ForConditionalGeneration      | 2  | 25.5586  |  37.8759  |      nan       |     nan     | 222.3239 |
|       MT5ForConditionalGeneration       | 2  |  6.4161  |  16.6703  |      nan       |     nan     | 179.1136 |
|            YituTechConvBert             | 1  |  8.9448  |  16.5143  |      nan       |     nan     | 176.3369 |
|       T5ForConditionalGeneration        | 4  |  3.7734  |  10.937   |      nan       |     nan     | 175.8095 |
|             XGLMForCausalLM             | 1  | 15.1297  |  24.758   |      nan       |     nan     | 168.856  |
|      MBartForConditionalGeneration      | 8  | 26.0665  |  38.8293  |      nan       |     nan     | 168.5349 |
|     PegasusForConditionalGeneration     | 4  | 25.6046  |  38.5024  |      nan       |     nan     | 157.4908 |
|           DebertaForMaskedLM            | 4  |  7.1369  |  13.2473  |    49.7312     |     nan     | 149.1406 |
|      BartForConditionalGeneration       | 1  | 25.5661  |  37.9984  |      nan       |     nan     | 148.0126 |
|    MegatronBertForQuestionAnswering     | 8  | 16.2073  |  25.7688  |      nan       |     nan     | 137.4447 |
|         MegatronBertForCausalLM         | 2  |  16.236  |  26.2057  |      nan       |     nan     | 136.8413 |
| BlenderbotSmallForConditionalGeneration | 32 | 11.9456  |  19.9424  |      nan       |     nan     | 134.1319 |
|                 T5Small                 | 1  |  3.7531  |   10.71   |      nan       |     nan     | 133.5697 |
|     PLBartForConditionalGeneration      | 8  |  7.3286  |   13.74   |      nan       |     nan     | 132.5687 |
|       DebertaForQuestionAnswering       | 4  |  6.9868  |  12.9985  |    50.6366     |     nan     | 114.6722 |
|           RobertaForCausalLM            | 4  |  5.2682  |  9.8593   |      nan       |     nan     | 100.9032 |
|    LayoutLMForSequenceClassification    | 16 |  5.1824  |  9.9437   |      nan       |     nan     | 92.2545  |
|           PegasusForCausalLM            | 8  |  9.8456  |  14.438   |      nan       |     nan     | 88.2178  |
|            MBartForCausalLM             | 16 |  9.8451  |  14.2179  |      nan       |     nan     | 85.4066  |
|             OPTForCausalLM              | 4  |  4.6586  |  9.5188   |      nan       |     nan     | 77.5511  |
|             BertForMaskedLM             | 64 |  4.9281  |  9.7456   |      nan       |     nan     |  77.007  |
|      GPT2ForSequenceClassification      | 4  |  3.4782  |  8.0937   |      nan       |     nan     | 76.4033  |
|             BartForCausalLM             | 2  |  9.6334  |   14.23   |      nan       |     nan     | 76.2828  |
|           ElectraForCausalLM            | 1  |  5.0797  |  9.7233   |      nan       |     nan     | 72.6091  |
|            TrOCRForCausalLM             | 8  |  10.038  |  14.4735  |      nan       |     nan     | 70.3343  |
|       BlenderbotSmallForCausalLM        | 64 |  4.7331  |  7.8131   |      nan       |     nan     | 68.4415  |
|         Speech2Text2ForCausalLM         | 64 |  3.1545  |  5.4563   |      nan       |     nan     | 65.9358  |
|               DistillGPT2               | 1  |  1.4438  |  3.7992   |      nan       |     nan     | 63.1728  |
|            PLBartForCausalLM            | 16 |  3.2604  |  5.7169   |      nan       |     nan     | 61.9116  |
|        BertForQuestionAnswering         | 64 |  4.8664  |  9.6553   |      nan       |     nan     | 60.3864  |
|     DistilBertForQuestionAnswering      | 32 |  1.7088  |  4.0654   |      nan       |     nan     | 60.3565  |
|                CamemBert                | 1  |  5.0927  |  9.6565   |      nan       |     nan     | 59.6166  |
|       RobertaForQuestionAnswering       | 64 |  4.8469  |  9.7659   |      nan       |     nan     |  59.427  |
|                 BigBird                 | 1  | 10.8289  |  16.7412  |      nan       |     nan     | 58.8768  |
|            AlbertForMaskedLM            | 2  |  1.2433  |  5.8391   |      nan       |     nan     | 56.5995  |
|       AlbertForQuestionAnswering        | 2  |  1.2235  |  5.7785   |      nan       |     nan     | 47.9866  |
|          DistilBertForMaskedLM          | 16 |  1.7344  |   4.111   |      nan       |     nan     | 46.6191  |
|               GoogleFnet                | 1  |  1.9789  |  4.2376   |      nan       |   10.744    |  42.907  |
|          AllenaiLongformerBase          | 1  | 11.4511  |  19.2509  |     86.117     |     nan     |   nan    |
|           LayoutLMForMaskedLM           | 16 |  5.5414  |  10.3348  |      nan       |     nan     |   nan    |
|       ElectraForQuestionAnswering       | 64 |  4.8934  |  9.6669   |      nan       |     nan     |   nan    |
+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|      GPT2ForSequenceClassification      | 4  | 0.9342 |  0.9091   |      nan       |     nan     |  1.0318  |
|            XLNetLMHeadModel             | 4  | 1.0001 |  0.8976   |      nan       |     nan     |  0.9717  |
|    LayoutLMForSequenceClassification    | 16 |  1.0   |  0.9348   |      nan       |     nan     |  0.9339  |
|        BertForQuestionAnswering         | 64 |  1.0   |  0.9467   |      nan       |     nan     |  0.9145  |
|       RobertaForQuestionAnswering       | 64 |  1.0   |  0.9467   |      nan       |     nan     |  0.9145  |
|                 T5Small                 | 1  |  1.0   |  0.9325   |      nan       |     nan     |  0.8445  |
|     DistilBertForQuestionAnswering      | 32 |  1.0   |  0.9046   |      nan       |     nan     |  0.8394  |
|             BertForMaskedLM             | 64 |  1.0   |  0.9219   |      nan       |     nan     |  0.8321  |
|             BartForCausalLM             | 2  |  1.0   |  0.8847   |      nan       |     nan     |  0.8303  |
|                 BigBird                 | 1  | 1.0001 |  0.9549   |      nan       |     nan     |  0.8224  |
|          DistilBertForMaskedLM          | 16 | 0.9998 |  0.9138   |      nan       |     nan     |  0.8055  |
|            PLBartForCausalLM            | 16 | 0.9997 |  0.8802   |      nan       |     nan     |  0.8028  |
|            MBartForCausalLM             | 16 |  1.0   |  0.8629   |      nan       |     nan     |  0.8005  |
|               DistillGPT2               | 1  | 1.0003 |  0.7721   |      nan       |     nan     |  0.7997  |
|         Speech2Text2ForCausalLM         | 64 |  1.0   |   0.88    |      nan       |     nan     |  0.7767  |
|       T5ForConditionalGeneration        | 4  |  1.0   |  0.9597   |      nan       |     nan     |  0.7754  |
|             XGLMForCausalLM             | 1  | 0.9999 |  0.9999   |      nan       |     nan     |  0.7728  |
|      BartForConditionalGeneration       | 1  |  1.0   |  0.8465   |      nan       |     nan     |  0.7708  |
| BlenderbotSmallForConditionalGeneration | 32 |  1.0   |  0.9036   |      nan       |     nan     |  0.7612  |
|     PLBartForConditionalGeneration      | 8  | 0.9997 |  0.8222   |      nan       |     nan     |  0.7547  |
|                CamemBert                | 1  | 0.998  |  0.7977   |      nan       |     nan     |  0.7369  |
|            YituTechConvBert             | 1  | 0.9858 |  0.7923   |      nan       |     nan     |  0.7298  |
|            TrOCRForCausalLM             | 8  |  1.0   |  0.8048   |      nan       |     nan     |  0.7284  |
|       BlenderbotSmallForCausalLM        | 64 |  1.0   |  0.8401   |      nan       |     nan     |  0.7277  |
|      MBartForConditionalGeneration      | 8  |  1.0   |  0.8137   |      nan       |     nan     |  0.727   |
|             OPTForCausalLM              | 4  | 0.9979 |   0.75    |      nan       |     nan     |  0.714   |
|           RobertaForCausalLM            | 4  | 0.9058 |  0.7778   |      nan       |     nan     |  0.7099  |
|           PegasusForCausalLM            | 8  |  1.0   |  0.9323   |      nan       |     nan     |  0.7012  |
|    MegatronBertForQuestionAnswering     | 8  | 0.923  |  0.8265   |      nan       |     nan     |  0.6997  |
|               GoogleFnet                | 1  | 1.0003 |  0.9447   |      nan       |   1.0813    |  0.6953  |
|     M2M100ForConditionalGeneration      | 2  | 0.9795 |   0.979   |      nan       |     nan     |  0.6702  |
|         MegatronBertForCausalLM         | 2  | 0.7066 |  0.7066   |      nan       |     nan     |  0.6453  |
|     PegasusForConditionalGeneration     | 4  | 0.9721 |  0.9004   |      nan       |     nan     |  0.642   |
|       MT5ForConditionalGeneration       | 2  | 0.6173 |  0.6173   |      nan       |     nan     |  0.6173  |
|       AlbertForQuestionAnswering        | 2  |  1.0   |  0.9369   |      nan       |     nan     |  0.6126  |
|           ElectraForCausalLM            | 1  |  1.0   |  0.9107   |      nan       |     nan     |  0.6123  |
|            AlbertForMaskedLM            | 2  | 0.9999 |  0.9172   |      nan       |     nan     |  0.6027  |
|          MobileBertForMaskedLM          | 16 | 0.9997 |  0.9179   |      nan       |     nan     |  0.5861  |
|     MobileBertForQuestionAnswering      | 32 |  1.0   |  0.9716   |      nan       |     nan     |  0.4668  |
|           DebertaForMaskedLM            | 4  |  1.0   |  0.9851   |     0.352      |     nan     |  0.4265  |
|       DebertaForQuestionAnswering       | 4  | 0.9845 |  1.0525   |     0.3277     |     nan     |  0.3569  |
|          AllenaiLongformerBase          | 1  | 0.9988 |  0.9515   |     0.3144     |     nan     |   nan    |
|       ElectraForQuestionAnswering       | 64 |  1.0   |  0.9524   |      nan       |     nan     |   nan    |
|           LayoutLMForMaskedLM           | 16 |  1.0   |  0.9409   |      nan       |     nan     |   nan    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
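The table above does not define the ratio. A minimal sketch, assuming it is the eager-mode peak memory divided by the peak memory under the given backend, so that values above 1 mean the backend used less memory than eager. The function name and the byte arguments are hypothetical and only illustrate the arithmetic.

```python
def compression_ratio(eager_peak_bytes, backend_peak_bytes):
    """Assumed definition: eager peak memory / backend peak memory.

    A result above 1.0 means the backend peaked lower than eager;
    a result below 1.0 means it peaked higher.
    """
    if backend_peak_bytes <= 0:
        raise ValueError("backend peak memory must be positive")
    return eager_peak_bytes / backend_peak_bytes


# Hypothetical numbers: eager peaks at 8 GiB, the backend at 10 GiB.
print(compression_ratio(8 * 2**30, 10 * 2**30))
```

Under this assumption, a `nan` entry would correspond to a run that produced no memory measurement for that backend.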

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|        res2net50_14w_8s         |  2  | 0.9983 |   1.027   |      0.0       |   1.4439    |  4.7917  |
|            hrnet_w18            |  2  | 1.0076 |  1.0877   |      0.0       |   1.4906    |  4.6235  |
|           res2next50            |  2  | 1.0034 |  1.0445   |      0.0       |   1.3722    |  4.1476  |
|         coat_lite_mini          | 128 |  1.0   |  0.9994   |      0.0       |   1.0739    |  1.7094  |
|          ghostnet_100           | 128 | 0.9985 |  0.9939   |      0.0       |    1.249    |  1.5956  |
|        tnt_s_patch16_224        | 64  | 0.9997 |  0.9961   |      0.0       |   1.5683    |  1.5095  |
|        twins_pcpvt_base         | 32  | 1.0037 |  0.9738   |      0.0       |   1.3525    |  1.4376  |
|      xcit_large_24_p8_224       |  5  | 1.0006 |  0.9883   |      0.0       |     0.0     |  1.4149  |
|         crossvit_9_240          | 64  | 1.0049 |  0.9992   |      0.0       |   1.0961    |  1.405   |
|           volo_d1_224           | 64  | 0.9995 |  0.9952   |      0.0       |   1.1385    |  1.3979  |
|            nfnet_l0             | 64  | 0.9996 |  0.7979   |      0.0       |   1.0535    |  1.3819  |
|          gmixer_24_224          | 64  | 0.999  |  0.8428   |      0.0       |   0.9942    |  1.3536  |
|          jx_nest_base           | 32  | 0.9995 |  0.9942   |      0.0       |   1.2243    |  1.2913  |
|            lcnet_050            | 128 | 0.9564 |  0.9466   |      0.0       |   1.5001    |  1.2739  |
|           convit_base           | 32  | 0.9992 |  0.9931   |      0.0       |   1.1944    |  1.2661  |
|          convnext_base          | 32  | 0.9994 |   0.994   |      0.0       |   1.0411    |  1.2019  |
|          cait_m36_384           |  2  | 0.9981 |  0.9894   |      0.0       |   0.9966    |  1.196   |
|          gmlp_s16_224           | 64  | 0.9989 |  0.9964   |      0.0       |   0.9982    |  1.1454  |
|      beit_base_patch16_224      | 64  | 0.9998 |  0.9743   |      0.0       |   0.9541    |  1.1235  |
| deit_base_distilled_patch16_224 | 64  | 0.9997 |   0.998   |      0.0       |   1.0189    |  1.1047  |
|           regnety_002           | 128 | 0.9778 |  0.9883   |      0.0       |   1.3588    |  1.101   |
|      vit_base_patch16_224       | 64  | 0.9998 |  0.9982   |      0.0       |   0.9778    |  1.0942  |
|          mixer_b16_224          | 64  | 0.9997 |  0.9973   |      0.0       |   0.9836    |  1.0789  |
|           tf_mixnet_l           | 64  | 0.9714 |  0.8744   |      0.0       |   1.0062    |  1.0438  |
|          resmlp_12_224          | 128 | 0.9998 |  0.9997   |      0.0       |     0.0     |  1.0094  |
|            mixnet_l             | 64  | 0.9707 |  0.8727   |      0.0       |   1.0055    |  1.0017  |
|             dpn107              | 32  | 0.9584 |  0.9514   |      0.0       |    1.029    |  0.9988  |
|             dla102              | 64  | 0.9992 |  0.9967   |      0.0       |   1.2857    |  0.9897  |
|            gernet_l             | 128 | 0.9739 |   0.969   |      0.0       |   1.0979    |  0.9142  |
|           resnest101e           | 32  | 1.0011 |   1.018   |      0.0       |    1.204    |  0.9009  |
|            repvgg_a2            | 128 | 0.9634 |  0.9621   |      0.0       |   1.1211    |  0.8987  |
|           mobilevit_s           | 32  | 0.9749 |  0.7654   |      0.0       |   0.9566    |  0.8956  |
|         visformer_small         | 128 | 1.0001 |  1.0006   |      0.0       |   1.0204    |  0.8732  |
|           selecsls42b           | 128 | 0.9998 |  0.9983   |      0.0       |   1.2088    |  0.8727  |
|          cspdarknet53           | 64  | 0.9586 |  0.9504   |      0.0       |   1.1831    |  0.8635  |
|           mnasnet_100           | 128 | 0.9646 |  0.9634   |      0.0       |   1.1533    |  0.8582  |
|            fbnetv3_b            | 128 | 0.9648 |  0.9584   |      0.0       |   1.1334    |  0.8559  |
|        sebotnet33ts_256         | 64  | 0.9761 |  0.8072   |      0.0       |   1.0537    |  0.8532  |
|            tinynet_a            | 128 | 0.9662 |  0.7755   |      0.0       |   0.9712    |  0.8438  |
|      mobilenetv3_large_100      | 128 | 0.9659 |  0.9624   |      0.0       |   1.1625    |  0.793   |
|        res2net101_26w_4s        | 64  | 0.9987 |  0.9969   |      0.0       |   1.1757    |  0.7829  |
|       tf_efficientnet_b0        | 128 | 0.9763 |  0.7833   |      0.0       |   0.9849    |  0.7726  |
|          spnasnet_100           | 128 | 0.961  |  0.9581   |      0.0       |   1.1386    |  0.7679  |
|        eca_halonext26ts         | 64  | 0.9745 |  0.7769   |      0.0       |   1.0166    |  0.7612  |
|           fbnetc_100            | 128 | 0.9657 |  0.9619   |      0.0       |   1.1839    |  0.7582  |
|         mobilenetv2_100         | 128 | 0.9666 |  0.9604   |      0.0       |   1.0141    |  0.699   |
|       eca_botnext26ts_256       | 64  | 0.9736 |  0.7695   |      0.0       |   1.0172    |  0.6956  |
|           rexnet_100            | 128 | 0.9729 |  0.8138   |      0.0       |    0.983    |  0.6949  |
|        ese_vovnet19b_dw         | 128 | 0.9788 |  0.9775   |      0.0       |   1.1442    |  0.6341  |
|          botnet26t_256          | 128 | 0.9849 |   0.985   |      0.0       |   1.2249    |   0.0    |
|           dm_nfnet_f0           | 128 | 0.9998 |  0.9994   |      0.0       |   1.2112    |   0.0    |
|        adv_inception_v3         | 128 |  1.0   |  0.9987   |      0.0       |   1.1247    |   0.0    |
|          inception_v3           | 128 |  1.0   |  0.9982   |      0.0       |   1.1244    |   0.0    |
|       gluon_inception_v3        | 128 | 0.9999 |  0.9986   |      0.0       |   1.1222    |   0.0    |
|     swsl_resnext101_32x16d      | 32  | 0.9994 |  0.9989   |      0.0       |   1.1076    |   0.0    |
|          pnasnet5large          | 16  | 0.9988 |  0.9982   |      0.0       |   1.0821    |   0.0    |
|        convmixer_768_32         | 32  | 0.9998 |  0.9999   |      0.0       |    1.061    |   0.0    |
|            pit_b_224            | 64  | 0.9998 |  0.9976   |      0.0       |   1.0601    |   0.0    |
|        gluon_xception65         | 32  | 0.9992 |  0.9976   |      0.0       |   1.0409    |   0.0    |
|         poolformer_m36          | 64  | 0.9994 |  0.9985   |      0.0       |   1.0061    |   0.0    |
|  swin_base_patch4_window7_224   | 64  | 0.9998 |  0.9787   |      0.0       |   0.9982    |   0.0    |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
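To compare backends across a whole suite, the speedup table can be parsed and summarized per column. The sketch below is one way to do that, not the dashboard's own code: the helper names `parse_table` and `geomean_speedup` are hypothetical, the geometric mean is a common choice for ratios rather than something the table specifies, and treating `0.0` or `nan` as "no measurement" is an assumption based on those entries lining up with failed runs elsewhere in this comment.

```python
import math


def parse_table(text):
    """Parse a '+---+' bordered ASCII table into a list of row dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")  # skip the '+----+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean_speedup(records, backend):
    """Geometric mean over models with a positive, finite speedup.

    Assumption: 0.0 and nan mark runs without a measurement, so they
    are skipped instead of being averaged in.
    """
    values = []
    for record in records:
        value = float(record[backend])
        if math.isfinite(value) and value > 0.0:
            values.append(value)
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Three rows copied from the timm_models speedup table above.
SAMPLE = """
+------------------+-----+--------+-----------+----------------+-------------+----------+
|       name       | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+------------------+-----+--------+-----------+----------------+-------------+----------+
| res2net50_14w_8s |  2  | 0.9983 |   1.027   |      0.0       |   1.4439    |  4.7917  |
|  coat_lite_mini  | 128 |  1.0   |  0.9994   |      0.0       |   1.0739    |  1.7094  |
|  botnet26t_256   | 128 | 0.9849 |   0.985   |      0.0       |   1.2249    |   0.0    |
+------------------+-----+--------+-----------+----------------+-------------+----------+
"""

records = parse_table(SAMPLE)
for backend in ("eager", "aot_eager", "aot_cudagraphs", "aot_nvfuser", "inductor"):
    print(backend, round(geomean_speedup(records, backend), 4))
```

Skipping unmeasured entries keeps a single failed model from zeroing out the summary, but it also hides failures, so the number of skipped rows is worth reporting alongside the mean.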

Accuracy

+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|              name               | bs | eager |   aot_eager   | aot_cudagraphs |  aot_nvfuser  |   inductor    |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|          convnext_base          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          gmixer_24_224          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          gmlp_s16_224           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          mixer_b16_224          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|           mnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|            repvgg_a2            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          spnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|        adv_inception_v3         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      beit_base_patch16_224      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          botnet26t_256          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        convmixer_768_32         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         crossvit_9_240          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          cspdarknet53           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
| deit_base_distilled_patch16_224 | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dla102              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           dm_nfnet_f0           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dpn107              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       eca_botnext26ts_256       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        eca_halonext26ts         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        ese_vovnet19b_dw         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            gernet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          ghostnet_100           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       gluon_inception_v3        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            hrnet_w18            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          inception_v3           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            lcnet_050            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            mixnet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         mobilenetv2_100         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      mobilenetv3_large_100      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           mobilevit_s           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            nfnet_l0             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          pnasnet5large          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           regnety_002           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net101_26w_4s        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net50_14w_8s         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           res2next50            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           rexnet_100            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        sebotnet33ts_256         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           selecsls42b           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|  swin_base_patch4_window7_224   | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|     swsl_resnext101_32x16d      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       tf_efficientnet_b0        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           tf_mixnet_l           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            tinynet_a            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        tnt_s_patch16_224        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         visformer_small         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      vit_base_patch16_224       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           volo_d1_224           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          resmlp_12_224          | 2  | pass  |     pass      |      pass      |  fail_to_run  |     pass      |
|           convit_base           | 2  | pass  |     pass      |  fail_to_run   |  fail_to_run  |     pass      |
|      xcit_large_24_p8_224       | 2  | pass  | fail_accuracy |  fail_to_run   |  fail_to_run  |     pass      |
|        gluon_xception65         | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|         poolformer_m36          | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|         coat_lite_mini          | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|          jx_nest_base           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|            pit_b_224            | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|        twins_pcpvt_base         | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|           fbnetc_100            | 2  | pass  |     pass      |      pass      |     pass      | fail_accuracy |
|            fbnetv3_b            | 2  | pass  |     pass      |  fail_to_run   |     pass      | fail_accuracy |
|           resnest101e           | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy | fail_accuracy |
|          cait_m36_384           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy | fail_accuracy |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+
|              name               | bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor  |
+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+
|            hrnet_w18            |  2  | 97.691  | 128.2568  |      nan       |  297.4634   | 1326.4957 |
|             dpn107              | 32  | 13.3213 |  24.7932  |      nan       |   87.0993   | 1248.9361 |
|           rexnet_100            | 128 | 6.4621  |  12.2169  |      nan       |  106.2673   | 954.6179  |
|        res2net50_14w_8s         |  2  | 19.923  |  34.3528  |      nan       |   87.1183   | 931.7786  |
|           mobilevit_s           | 32  | 5.6473  |  11.2171  |      nan       |   45.1615   | 830.5169  |
|            mixnet_l             | 64  | 13.4325 |  20.6526  |      nan       |   69.3248   | 755.6167  |
|       eca_botnext26ts_256       | 64  | 2.4512  |  6.3653   |      nan       |   49.7973   |  739.881  |
|          ghostnet_100           | 128 | 8.9586  |  16.0244  |      nan       |   65.523    | 667.2918  |
|            tinynet_a            | 128 |  7.716  |  13.3784  |      nan       |   67.6541   | 645.5261  |
|           fbnetc_100            | 128 |  5.434  |  10.4927  |      nan       |   50.2826   | 612.6355  |
|           resnest101e           | 32  | 26.5974 |  40.6878  |      nan       |   100.07    | 606.1691  |
|        twins_pcpvt_base         | 32  | 26.1593 |  36.7425  |      nan       |   69.8539   | 604.2703  |
|            fbnetv3_b            | 128 | 12.668  |  20.6589  |      nan       |   85.5758   | 582.8326  |
|         coat_lite_mini          | 128 | 3.1411  |  7.0172   |      nan       |   16.5631   | 571.6242  |
|           res2next50            |  2  | 7.3453  |  14.7597  |      nan       |   47.873    | 543.6359  |
|             dla102              | 64  | 10.6899 |  18.9579  |      nan       |   71.5604   | 512.4648  |
|           mnasnet_100           | 128 | 4.0287  |  7.8445   |      nan       |   40.2794   | 476.7122  |
|           tf_mixnet_l           | 64  | 13.6793 |  21.1891  |      nan       |   69.8174   | 473.6279  |
|        sebotnet33ts_256         | 64  | 3.7753  |  8.4122   |      nan       |   53.6861   | 472.7106  |
|        eca_halonext26ts         | 64  | 2.5729  |   6.504   |      nan       |   51.8466   | 455.4176  |
|          cspdarknet53           | 64  | 5.8183  |  11.2112  |      nan       |   52.2796   | 454.9286  |
|        res2net101_26w_4s        | 64  | 25.839  |  41.9511  |      nan       |  106.9702   | 414.5372  |
|       tf_efficientnet_b0        | 128 | 5.8682  |  10.6433  |      nan       |   65.7641   | 404.0946  |
|        ese_vovnet19b_dw         | 128 | 1.8725  |  4.1691   |      nan       |   31.8386   | 401.7781  |
|         mobilenetv2_100         | 128 | 4.1951  |  8.0722   |      nan       |   39.9188   | 346.6298  |
|          convnext_base          | 32  | 11.3503 |  16.231   |      nan       |   31.7707   | 334.6469  |
|           regnety_002           | 128 | 4.7306  |  9.0104   |      nan       |   49.7585   | 326.9695  |
|      xcit_large_24_p8_224       |  5  | 36.843  |  52.7637  |      nan       |     nan     |  322.862  |
|          jx_nest_base           | 32  | 9.6403  |  17.4674  |      nan       |   66.114    | 322.1634  |
|      mobilenetv3_large_100      | 128 | 4.3189  |   8.215   |      nan       |   67.2751   | 296.0159  |
|         visformer_small         | 128 | 2.2803  |  5.4314   |      nan       |   25.6553   | 293.7596  |
|          cait_m36_384           |  2  | 48.6937 |  65.4215  |      nan       |   92.2057   | 279.3734  |
|            gernet_l             | 128 | 4.7024  |  9.9219   |      nan       |   39.0823   |  252.524  |
|         crossvit_9_240          | 64  | 7.4244  |  13.9238  |      nan       |   32.9177   | 251.2102  |
|           selecsls42b           | 128 | 2.3137  |  5.5553   |      nan       |   40.3432   | 243.8417  |
|          spnasnet_100           | 128 | 5.3442  |  10.3948  |      nan       |   46.8295   | 227.5039  |
|            lcnet_050            | 128 | 1.9178  |  4.1662   |      nan       |   31.8492   | 219.0054  |
|           volo_d1_224           | 64  |  6.695  |  12.6315  |      nan       |   32.6511   | 192.8301  |
|           convit_base           | 32  | 3.8807  |  8.8518   |      nan       |   21.3229   | 187.4577  |
|          gmlp_s16_224           | 64  | 9.0829  |  14.1574  |      nan       |   21.2325   | 149.2961  |
|        tnt_s_patch16_224        | 64  | 11.8226 |  21.1815  |      nan       |   34.8234   | 140.3073  |
|          gmixer_24_224          | 64  | 8.2047  |  14.0553  |      nan       |   23.6592   | 132.0265  |
|            repvgg_a2            | 128 |  4.598  |   8.933   |      nan       |   46.5715   | 124.4128  |
|          resmlp_12_224          | 128 | 2.6661  |  4.8475   |      nan       |     nan     |  98.1064  |
|            nfnet_l0             | 64  | 5.9174  |  11.4931  |      nan       |   30.9432   |  96.3515  |
|          mixer_b16_224          | 64  | 2.6958  |  5.1905   |      nan       |   12.7396   |  94.3682  |
| deit_base_distilled_patch16_224 | 64  | 3.0897  |   6.374   |      nan       |   12.9275   |  84.8878  |
|      beit_base_patch16_224      | 64  | 4.6591  |  9.1219   |      nan       |   17.496    |  83.1654  |
|      vit_base_patch16_224       | 64  | 2.8722  |  6.2339   |      nan       |   11.5018   |  68.0847  |
|          pnasnet5large          | 16  | 59.4832 |  80.4982  |      nan       |  183.5509   |    nan    |
|        adv_inception_v3         | 128 |  8.161  |  15.6215  |      nan       |   74.7227   |    nan    |
|       gluon_inception_v3        | 128 | 8.2187  |  15.8574  |      nan       |   74.6038   |    nan    |
|          inception_v3           | 128 | 8.1272  |  15.7713  |      nan       |   74.2407   |    nan    |
|  swin_base_patch4_window7_224   | 64  | 11.989  |  21.809   |      nan       |   68.8397   |    nan    |
|        gluon_xception65         | 32  | 14.9902 |  24.9327  |      nan       |   55.4597   |    nan    |
|     swsl_resnext101_32x16d      | 32  | 10.0119 |  18.3546  |      nan       |   49.483    |    nan    |
|          botnet26t_256          | 128 | 2.4012  |  5.6863   |      nan       |   42.0424   |    nan    |
|           dm_nfnet_f0           | 128 | 6.5043  |  11.8243  |      nan       |   34.6682   |    nan    |
|         poolformer_m36          | 64  | 13.1099 |  19.2828  |      nan       |   34.655    |    nan    |
|        convmixer_768_32         | 32  | 6.8749  |  11.8401  |      nan       |   20.2715   |    nan    |
|            pit_b_224            | 64  | 3.6016  |  7.4214   |      nan       |   15.3574   |    nan    |
+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|          gmixer_24_224          | 64  | 0.9992 |  0.9684   |      nan       |   0.9825    |  1.3808  |
|            nfnet_l0             | 64  | 1.0008 |  0.8298   |      nan       |    0.813    |  1.2558  |
|            tinynet_a            | 128 |  1.0   |  0.7831   |      nan       |   0.7845    |  1.1735  |
|           rexnet_100            | 128 | 0.9992 |  0.7879   |      nan       |    0.871    |  1.1072  |
|           convit_base           | 32  | 1.0001 |  0.8879   |      nan       |   0.9506    |  1.068   |
|         mobilenetv2_100         | 128 | 0.9998 |  0.7664   |      nan       |   0.7679    |  1.0051  |
|           mobilevit_s           | 32  | 0.9999 |  0.7692   |      nan       |   0.7431    |  1.0011  |
|             dla102              | 64  | 0.9881 |  0.9181   |      nan       |   0.9541    |  1.001   |
|        eca_halonext26ts         | 64  | 0.9999 |  0.7717   |      nan       |   0.7731    |  0.9711  |
|       eca_botnext26ts_256       | 64  |  1.0   |  0.7705   |      nan       |   0.7679    |  0.9703  |
|           tf_mixnet_l           | 64  | 1.0001 |   0.861   |      nan       |   0.8605    |  0.9698  |
|          cait_m36_384           |  2  | 1.0001 |  0.9024   |      nan       |   0.9202    |  0.9451  |
|       tf_efficientnet_b0        | 128 | 0.9998 |  0.7727   |      nan       |   0.8426    |  0.9413  |
|          mixer_b16_224          | 64  | 0.9956 |  0.9615   |      nan       |   0.8644    |  0.9357  |
|      beit_base_patch16_224      | 64  |  1.0   |  0.9575   |      nan       |   0.8606    |  0.9272  |
|          gmlp_s16_224           | 64  |  1.0   |  0.9766   |      nan       |    0.966    |  0.9267  |
|      vit_base_patch16_224       | 64  | 0.9963 |  0.9469   |      nan       |   0.8229    |  0.915   |
|        tnt_s_patch16_224        | 64  | 1.0001 |  0.9752   |      nan       |   0.8518    |  0.9131  |
|           volo_d1_224           | 64  | 0.9999 |  0.9247   |      nan       |   0.7472    |  0.9124  |
| deit_base_distilled_patch16_224 | 64  | 0.9964 |  0.9476   |      nan       |   0.8242    |  0.9095  |
|          spnasnet_100           | 128 | 1.0005 |  0.9207   |      nan       |   0.8496    |  0.9024  |
|           selecsls42b           | 128 | 0.9883 |  0.8982   |      nan       |   0.9039    |  0.8999  |
|            mixnet_l             | 64  | 0.9995 |  0.8486   |      nan       |   0.7938    |  0.8993  |
|      mobilenetv3_large_100      | 128 | 1.0002 |  0.8686   |      nan       |   0.8819    |  0.8982  |
|      xcit_large_24_p8_224       |  5  | 0.9999 |  0.9206   |      nan       |     nan     |  0.8952  |
|           resnest101e           | 32  |  1.0   |  0.9458   |      nan       |   0.9449    |  0.8922  |
|          ghostnet_100           | 128 | 0.9998 |  0.8872   |      nan       |    0.947    |  0.8888  |
|         visformer_small         | 128 | 0.9943 |  0.9442   |      nan       |   0.9475    |  0.8883  |
|            fbnetv3_b            | 128 | 0.9995 |  0.7866   |      nan       |   0.7861    |  0.8837  |
|             dpn107              | 32  | 0.9997 |  0.9285   |      nan       |   0.8949    |  0.8763  |
|          convnext_base          | 32  | 1.0001 |  0.9077   |      nan       |   0.7678    |  0.8762  |
|        twins_pcpvt_base         | 32  | 1.0002 |  0.9127   |      nan       |   0.8351    |  0.8723  |
|          cspdarknet53           | 64  |  1.0   |  0.8562   |      nan       |   0.8797    |  0.8624  |
|          jx_nest_base           | 32  | 1.0017 |   0.898   |      nan       |   0.7112    |  0.8574  |
|        ese_vovnet19b_dw         | 128 | 0.9999 |  0.8938   |      nan       |   0.9369    |  0.8467  |
|        sebotnet33ts_256         | 64  |  1.0   |  0.7109   |      nan       |   0.6852    |  0.841   |
|          resmlp_12_224          | 128 | 0.9893 |  0.9525   |      nan       |     nan     |  0.8169  |
|        res2net101_26w_4s        | 64  | 1.0001 |  0.9307   |      nan       |   0.8959    |  0.8167  |
|         crossvit_9_240          | 64  | 1.0001 |  0.8721   |      nan       |    0.729    |  0.8108  |
|           mnasnet_100           | 128 | 1.0003 |  0.9126   |      nan       |   0.8368    |  0.7984  |
|         coat_lite_mini          | 128 | 1.0049 |  0.8826   |      nan       |   0.7873    |   0.79   |
|            lcnet_050            | 128 | 1.0005 |  0.7721   |      nan       |   0.7722    |  0.7579  |
|           regnety_002           | 128 | 0.9981 |   0.829   |      nan       |   0.7759    |  0.7465  |
|            gernet_l             | 128 |  1.0   |  0.7965   |      nan       |   0.8012    |  0.727   |
|           fbnetc_100            | 128 | 0.9998 |  0.8597   |      nan       |   0.7507    |  0.7246  |
|            hrnet_w18            |  2  | 0.9986 |  0.8792   |      nan       |   0.8869    |  0.6089  |
|           res2next50            |  2  |  1.0   |  0.8353   |      nan       |   0.8404    |  0.606   |
|        res2net50_14w_8s         |  2  |  1.0   |  0.8387   |      nan       |   0.8474    |  0.5877  |
|            repvgg_a2            | 128 | 1.0003 |  0.8145   |      nan       |   0.6633    |  0.536   |
|          pnasnet5large          | 16  | 1.069  |   1.011   |      nan       |   1.2062    |   nan    |
|        convmixer_768_32         | 32  |  1.0   |  0.9868   |      nan       |   0.9807    |   nan    |
|           dm_nfnet_f0           | 128 | 0.9393 |   0.897   |      nan       |   0.9515    |   nan    |
|         poolformer_m36          | 64  | 1.0003 |  0.9533   |      nan       |   0.9368    |   nan    |
|        gluon_xception65         | 32  | 0.9999 |  0.9384   |      nan       |   0.9001    |   nan    |
|        adv_inception_v3         | 128 | 1.0002 |  0.8694   |      nan       |    0.88     |   nan    |
|       gluon_inception_v3        | 128 | 1.0002 |  0.8694   |      nan       |    0.88     |   nan    |
|          inception_v3           | 128 | 1.0002 |  0.8694   |      nan       |    0.88     |   nan    |
|     swsl_resnext101_32x16d      | 32  | 1.0003 |  0.8983   |      nan       |   0.8684    |   nan    |
|  swin_base_patch4_window7_224   | 64  | 0.9999 |  0.9309   |      nan       |    0.83     |   nan    |
|          botnet26t_256          | 128 |  1.0   |  0.8494   |      nan       |   0.7497    |   nan    |
|            pit_b_224            | 64  | 0.9992 |  0.7962   |      nan       |   0.6417    |   nan    |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+

Performance graphs

bench_logs/timm_models_float32.png :

bench_logs/torchbench_float32.png :

bench_logs/huggingface_float32.png :

@anijain2305
Contributor Author

Performance Dashboard for float32 precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. Speedup is normalized against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

Models that fail accuracy checks are excluded when measuring performance, compilation latency and memory footprint reduction.
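
As a rough sketch of the filtering described above, the snippet below drops models that fail accuracy or report a 0.0 speedup and then takes the geometric mean of the remaining speedups. The four rows are copied from the huggingface float32 tables in this comment; the `geomean_speedup` helper and the exact filtering rule are assumptions for illustration, not the dashboard's actual code.

```python
import math

# (name, inductor speedup, inductor accuracy status), copied from the
# huggingface float32 tables in this comment.
rows = [
    ("MT5ForConditionalGeneration", 4.6277, "pass"),
    ("ElectraForCausalLM", 4.1844, "pass"),
    ("MBartForConditionalGeneration", 1.4768, "fail_to_run"),
    ("AlbertForMaskedLM", 0.0, "fail_to_run"),
]


def geomean_speedup(rows):
    """Geometric mean over models that pass accuracy and report a speedup.

    Hypothetical helper: the filtering rule is an assumption based on the
    text above, not taken from the benchmark scripts.
    """
    kept = [s for _, s, status in rows if status == "pass" and s > 0]
    if not kept:
        return float("nan")
    # exp(mean(log(x))) is the geometric mean and avoids overflow on long lists.
    return math.exp(sum(math.log(s) for s in kept) / len(kept))


print(f"{geomean_speedup(rows):.2f}x")
```

A geometric mean is the usual choice for averaging ratios such as speedups, because a 2x gain and a 0.5x loss cancel out to 1x instead of averaging to 1.25x.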

Passrate

+-----------+-------------+
| Compiler  | huggingface |
+-----------+-------------+
| aot_eager | 93%, 41/44  |
| inductor  | 64%, 28/44  |
+-----------+-------------+
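
The passrate cells can be reproduced with simple arithmetic. The sketch below assumes the percentage is rounded to the nearest integer, which matches every cell in this thread; `format_passrate` is a hypothetical name. The counts are also consistent with a model counting as passing only when it passes the accuracy check and reports a nonzero speedup (for inductor, 33 nonzero speedups minus 5 models that do not pass accuracy gives 28), but that reading is inferred from the tables below rather than stated by the author.

```python
def format_passrate(passed: int, total: int) -> str:
    """Format a passrate cell as '<percent>%, <passed>/<total>'.

    Hypothetical helper; rounding to the nearest integer is an assumption
    that happens to match the tables in this thread.
    """
    return f"{round(100 * passed / total)}%, {passed}/{total}"


print(format_passrate(41, 44))  # aot_eager -> 93%, 41/44
print(format_passrate(28, 44))  # inductor  -> 64%, 28/44
```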

Geometric mean speedup

+-----------+-------------+
| Compiler  | huggingface |
+-----------+-------------+
| aot_eager |    1.00x    |
| inductor  |    1.76x    |
+-----------+-------------+

Mean compilation time (seconds)

+-----------+-------------+
| Compiler  | huggingface |
+-----------+-------------+
| aot_eager |    20.82    |
| inductor  |    80.93    |
+-----------+-------------+

Peak memory footprint compression ratio (higher is better)

+-----------+-------------+
| Compiler  | huggingface |
+-----------+-------------+
| aot_eager |    0.88x    |
| inductor  |    0.74x    |
+-----------+-------------+
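
One way to read these numbers, assuming the ratio is defined as eager peak memory divided by compiled peak memory (an assumption that fits "higher is better" but is not spelled out in this comment): a value below 1.0 means the compiled run peaked higher than eager. The byte counts in the sketch are hypothetical.

```python
def compression_ratio(eager_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Peak memory compression ratio, assumed to be eager / compiled.

    Values above 1.0 mean the compiled model used less peak memory than
    eager; values below 1.0 mean it used more.
    """
    return eager_peak_bytes / compiled_peak_bytes


# Hypothetical measurements, e.g. taken from torch.cuda.max_memory_allocated().
eager_peak = 10.0e9
compiled_peak = 12.5e9
print(f"{compression_ratio(eager_peak, compiled_peak):.2f}x")
```

Under this definition, the 0.74x reported for inductor would correspond to a peak footprint roughly 1 / 0.74 ≈ 1.35 times that of eager.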

Metrics over time

bench_logs/geomean_over_time.png :

bench_logs/passrate_over_time.png :

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+-----------+----------+
|                  name                   | bs | aot_eager | inductor |
+-----------------------------------------+----+-----------+----------+
|       MT5ForConditionalGeneration       | 2  |   0.912   |  4.6277  |
|           ElectraForCausalLM            | 1  |  0.9372   |  4.1844  |
|            YituTechConvBert             | 1  |  0.9314   |  3.7368  |
|         MegatronBertForCausalLM         | 2  |  0.9425   |  3.3657  |
|             OPTForCausalLM              | 4  |  0.9837   |  2.9643  |
|          MobileBertForMaskedLM          | 16 |  0.9291   |  2.9327  |
|           RobertaForCausalLM            | 4  |  0.9599   |  2.5863  |
|     M2M100ForConditionalGeneration      | 2  |  0.9524   |  2.544   |
|             XGLMForCausalLM             | 1  |  0.8789   |  2.4742  |
|     PegasusForConditionalGeneration     | 4  |  0.8936   |  2.4345  |
|     MobileBertForQuestionAnswering      | 32 |  0.9097   |  2.3948  |
|                CamemBert                | 1  |  0.9434   |  2.2449  |
|               GoogleFnet                | 1  |  0.8119   |  2.0603  |
|               DistillGPT2               | 1  |   0.934   |  1.9454  |
|    MegatronBertForQuestionAnswering     | 8  |   0.932   |  1.8596  |
|     PLBartForConditionalGeneration      | 8  |  0.9042   |  1.6688  |
|      MBartForConditionalGeneration      | 8  |  0.8875   |  1.4768  |
|            XLNetLMHeadModel             | 4  |  0.9655   |  1.427   |
|                 T5Small                 | 1  |  0.9592   |  1.358   |
|         Speech2Text2ForCausalLM         | 64 |  0.9438   |  1.2946  |
|     DistilBertForQuestionAnswering      | 32 |  0.9767   |  1.2753  |
|            TrOCRForCausalLM             | 8  |  0.9338   |  1.2341  |
|           PegasusForCausalLM            | 8  |  0.9351   |  1.2218  |
|      BartForConditionalGeneration       | 1  |  0.9916   |  1.2055  |
| BlenderbotSmallForConditionalGeneration | 32 |  0.9314   |  1.1764  |
|       DebertaForQuestionAnswering       | 4  |  0.7412   |  1.1722  |
|          DistilBertForMaskedLM          | 16 |   0.98    |  1.163   |
|            PLBartForCausalLM            | 16 |  0.9466   |  1.1229  |
|             BartForCausalLM             | 2  |  0.9662   |  1.1018  |
|       RobertaForQuestionAnswering       | 64 |  0.9825   |  1.0993  |
|                 BigBird                 | 1  |  0.9386   |  1.0925  |
|        BertForQuestionAnswering         | 64 |  0.9818   |  1.0919  |
|            MBartForCausalLM             | 16 |  0.9638   |  1.0433  |
|       AlbertForQuestionAnswering        | 2  |  0.9998   |   0.0    |
|            AlbertForMaskedLM            | 2  |  0.9979   |   0.0    |
|    LayoutLMForSequenceClassification    | 16 |  0.9875   |   0.0    |
|       ElectraForQuestionAnswering       | 64 |   0.984   |   0.0    |
|      GPT2ForSequenceClassification      | 4  |  0.9756   |   0.0    |
|       T5ForConditionalGeneration        | 4  |  0.9709   |   0.0    |
|           LayoutLMForMaskedLM           | 16 |  0.9701   |   0.0    |
|             BertForMaskedLM             | 64 |  0.9612   |   0.0    |
|       BlenderbotSmallForCausalLM        | 64 |  0.9085   |   0.0    |
|          AllenaiLongformerBase          | 1  |  0.8731   |   0.0    |
|           DebertaForMaskedLM            | 4  |  0.8027   |   0.0    |
+-----------------------------------------+----+-----------+----------+

Accuracy

+-----------------------------------------+----+-----------+-------------+
|                  name                   | bs | aot_eager |  inductor   |
+-----------------------------------------+----+-----------+-------------+
|             BartForCausalLM             | 1  |   pass    |    pass     |
|             BertForMaskedLM             | 1  |   pass    |    pass     |
|        BertForQuestionAnswering         | 1  |   pass    |    pass     |
|                 BigBird                 | 1  |   pass    |    pass     |
|       BlenderbotSmallForCausalLM        | 1  |   pass    |    pass     |
| BlenderbotSmallForConditionalGeneration | 1  |   pass    |    pass     |
|                CamemBert                | 1  |   pass    |    pass     |
|           DebertaForMaskedLM            | 1  |   pass    |    pass     |
|       DebertaForQuestionAnswering       | 1  |   pass    |    pass     |
|          DistilBertForMaskedLM          | 1  |   pass    |    pass     |
|     DistilBertForQuestionAnswering      | 1  |   pass    |    pass     |
|               DistillGPT2               | 1  |   pass    |    pass     |
|           ElectraForCausalLM            | 1  |   pass    |    pass     |
|       ElectraForQuestionAnswering       | 1  |   pass    |    pass     |
|      GPT2ForSequenceClassification      | 1  |   pass    |    pass     |
|               GoogleFnet                | 1  |   pass    |    pass     |
|           LayoutLMForMaskedLM           | 1  |   pass    |    pass     |
|    LayoutLMForSequenceClassification    | 1  |   pass    |    pass     |
|            MBartForCausalLM             | 1  |   pass    |    pass     |
|       MT5ForConditionalGeneration       | 1  |   pass    |    pass     |
|         MegatronBertForCausalLM         | 1  |   pass    |    pass     |
|    MegatronBertForQuestionAnswering     | 1  |   pass    |    pass     |
|          MobileBertForMaskedLM          | 1  |   pass    |    pass     |
|     MobileBertForQuestionAnswering      | 1  |   pass    |    pass     |
|             OPTForCausalLM              | 1  |   pass    |    pass     |
|            PLBartForCausalLM            | 1  |   pass    |    pass     |
|           PegasusForCausalLM            | 1  |   pass    |    pass     |
|     PegasusForConditionalGeneration     | 1  |   pass    |    pass     |
|           RobertaForCausalLM            | 1  |   pass    |    pass     |
|       RobertaForQuestionAnswering       | 1  |   pass    |    pass     |
|         Speech2Text2ForCausalLM         | 1  |   pass    |    pass     |
|       T5ForConditionalGeneration        | 1  |   pass    |    pass     |
|                 T5Small                 | 1  |   pass    |    pass     |
|            TrOCRForCausalLM             | 1  |   pass    |    pass     |
|            XLNetLMHeadModel             | 1  |   pass    |    pass     |
|            YituTechConvBert             | 1  |   pass    |    pass     |
|            AlbertForMaskedLM            | 1  |   pass    | fail_to_run |
|       AlbertForQuestionAnswering        | 1  |   pass    | fail_to_run |
|          AllenaiLongformerBase          | 1  |   pass    | fail_to_run |
|      MBartForConditionalGeneration      | 1  |   pass    | fail_to_run |
|     PLBartForConditionalGeneration      | 1  |   pass    | fail_to_run |
|      BartForConditionalGeneration       | 0  |  0.0000   |   0.0000    |
|     M2M100ForConditionalGeneration      | 0  |  0.0000   |   0.0000    |
|             XGLMForCausalLM             | 0  |  0.0000   |   0.0000    |
+-----------------------------------------+----+-----------+-------------+

Compilation latency (sec)

+-----------------------------------------+----+-----------+----------+
|                  name                   | bs | aot_eager | inductor |
+-----------------------------------------+----+-----------+----------+
|          MobileBertForMaskedLM          | 16 | 161.9975  | 230.3942 |
|     MobileBertForQuestionAnswering      | 32 | 156.5321  | 229.1364 |
|     M2M100ForConditionalGeneration      | 2  |  36.8031  | 169.1486 |
|            XLNetLMHeadModel             | 4  |  36.3955  | 140.9739 |
|      MBartForConditionalGeneration      | 8  |  39.8095  | 135.2674 |
|             XGLMForCausalLM             | 1  |  25.104   | 133.6482 |
|      BartForConditionalGeneration       | 1  |  38.6818  | 127.4739 |
|     PegasusForConditionalGeneration     | 4  |  38.2538  | 119.5229 |
|       MT5ForConditionalGeneration       | 2  |  16.9923  | 111.013  |
|       DebertaForQuestionAnswering       | 4  |  13.4467  | 110.5823 |
|         MegatronBertForCausalLM         | 2  |  26.3919  | 108.5788 |
|    MegatronBertForQuestionAnswering     | 8  |  26.2115  | 106.7605 |
|     PLBartForConditionalGeneration      | 8  |  13.4489  | 91.7812  |
| BlenderbotSmallForConditionalGeneration | 32 |  20.3056  | 86.0149  |
|                 T5Small                 | 1  |  11.1881  | 80.8945  |
|            YituTechConvBert             | 1  |  16.8696  | 78.8642  |
|            TrOCRForCausalLM             | 8  |  14.528   | 68.5267  |
|             OPTForCausalLM              | 4  |  9.7842   | 64.4264  |
|            MBartForCausalLM             | 16 |  14.6384  |  60.153  |
|           PegasusForCausalLM            | 8  |  14.4482  | 60.0701  |
|             BartForCausalLM             | 2  |  14.2281  | 57.4828  |
|           RobertaForCausalLM            | 4  |  10.1821  | 57.4763  |
|           ElectraForCausalLM            | 1  |  9.8272   | 56.1576  |
|       RobertaForQuestionAnswering       | 64 |  9.9816   | 56.0056  |
|        BertForQuestionAnswering         | 64 |  9.7034   | 55.3096  |
|                CamemBert                | 1  |  9.9611   | 52.6892  |
|                 BigBird                 | 1  |  17.3435  | 52.4819  |
|         Speech2Text2ForCausalLM         | 64 |  5.4863   | 44.3714  |
|            PLBartForCausalLM            | 16 |  5.4581   | 41.5006  |
|          DistilBertForMaskedLM          | 16 |  4.3538   | 37.1118  |
|     DistilBertForQuestionAnswering      | 32 |  4.2162   | 34.2192  |
|               GoogleFnet                | 1  |  4.3239   | 33.2724  |
|               DistillGPT2               | 1  |  3.8532   | 32.1718  |
|          AllenaiLongformerBase          | 1  |  19.6187  |   nan    |
|           DebertaForMaskedLM            | 4  |  13.4496  |   nan    |
|       T5ForConditionalGeneration        | 4  |  11.1033  |   nan    |
|    LayoutLMForSequenceClassification    | 16 |  10.3767  |   nan    |
|           LayoutLMForMaskedLM           | 16 |  10.3325  |   nan    |
|             BertForMaskedLM             | 64 |  9.9551   |   nan    |
|       ElectraForQuestionAnswering       | 64 |  9.8843   |   nan    |
|      GPT2ForSequenceClassification      | 4  |  8.4326   |   nan    |
|       BlenderbotSmallForCausalLM        | 64 |  8.0904   |   nan    |
|            AlbertForMaskedLM            | 2  |  6.3343   |   nan    |
|       AlbertForQuestionAnswering        | 2  |  5.8626   |   nan    |
+-----------------------------------------+----+-----------+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+-----------+----------+
|                  name                   | bs | aot_eager | inductor |
+-----------------------------------------+----+-----------+----------+
|            XLNetLMHeadModel             | 4  |  0.8976   |  0.9807  |
|        BertForQuestionAnswering         | 64 |  0.9467   |  0.9145  |
|       RobertaForQuestionAnswering       | 64 |  0.9467   |  0.9145  |
|                 T5Small                 | 1  |  0.9325   |  0.8445  |
|     DistilBertForQuestionAnswering      | 32 |  0.9046   |  0.8405  |
|          DistilBertForMaskedLM          | 16 |  0.9138   |  0.8391  |
|             BartForCausalLM             | 2  |  0.8847   |  0.8303  |
|           ElectraForCausalLM            | 1  |  0.9107   |  0.827   |
|                 BigBird                 | 1  |  0.9549   |  0.8224  |
|            PLBartForCausalLM            | 16 |  0.8802   |  0.8028  |
|            MBartForCausalLM             | 16 |  0.8629   |  0.8005  |
|               DistillGPT2               | 1  |  0.7721   |  0.7997  |
|         Speech2Text2ForCausalLM         | 64 |   0.88    |  0.7767  |
|     PLBartForConditionalGeneration      | 8  |  0.8222   |  0.7744  |
|             XGLMForCausalLM             | 1  |  0.9999   |  0.7728  |
|      BartForConditionalGeneration       | 1  |  0.8465   |  0.7708  |
| BlenderbotSmallForConditionalGeneration | 32 |  0.9036   |  0.7612  |
|                CamemBert                | 1  |  0.7977   |  0.7369  |
|            YituTechConvBert             | 1  |  0.7923   |  0.7298  |
|            TrOCRForCausalLM             | 8  |  0.8048   |  0.7284  |
|      MBartForConditionalGeneration      | 8  |  0.8137   |  0.727   |
|             OPTForCausalLM              | 4  |   0.75    |  0.714   |
|           RobertaForCausalLM            | 4  |  0.7778   |  0.7099  |
|           PegasusForCausalLM            | 8  |  0.9323   |  0.7012  |
|    MegatronBertForQuestionAnswering     | 8  |  0.8265   |  0.6997  |
|               GoogleFnet                | 1  |  0.9447   |  0.6953  |
|     M2M100ForConditionalGeneration      | 2  |  0.9801   |  0.6643  |
|         MegatronBertForCausalLM         | 2  |  0.7066   |  0.6453  |
|     PegasusForConditionalGeneration     | 4  |  0.9004   |  0.642   |
|       MT5ForConditionalGeneration       | 2  |  0.6173   |  0.6173  |
|          MobileBertForMaskedLM          | 16 |  0.9179   |  0.5861  |
|     MobileBertForQuestionAnswering      | 32 |  0.9716   |  0.4668  |
|       DebertaForQuestionAnswering       | 4  |  1.0525   |  0.3569  |
|           DebertaForMaskedLM            | 4  |  0.9851   |   nan    |
|       T5ForConditionalGeneration        | 4  |  0.9597   |   nan    |
|       ElectraForQuestionAnswering       | 64 |  0.9524   |   nan    |
|          AllenaiLongformerBase          | 1  |  0.9515   |   nan    |
|           LayoutLMForMaskedLM           | 16 |  0.9409   |   nan    |
|       AlbertForQuestionAnswering        | 2  |  0.9369   |   nan    |
|    LayoutLMForSequenceClassification    | 16 |  0.9348   |   nan    |
|             BertForMaskedLM             | 64 |  0.9219   |   nan    |
|            AlbertForMaskedLM            | 2  |  0.9172   |   nan    |
|      GPT2ForSequenceClassification      | 4  |  0.9091   |   nan    |
|       BlenderbotSmallForCausalLM        | 64 |  0.8401   |   nan    |
+-----------------------------------------+----+-----------+----------+

Performance graphs

bench_logs/huggingface_float32.png :

@anijain2305
Contributor Author

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. Speedup is normalized against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

Models that fail accuracy checks are excluded when measuring performance, compilation latency and memory footprint reduction.

Passrate

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      | 98%, 52/53 | 98%, 42/43  | 100%, 61/61 |
|   aot_eager    | 98%, 52/53 | 98%, 42/43  | 90%, 55/61  |
| aot_cudagraphs | 28%, 15/53 |  2%, 1/43   |  10%, 6/61  |
|  aot_nvfuser   | 60%, 32/53 |  0%, 0/43   | 75%, 46/61  |
|    inductor    | 81%, 43/53 | 86%, 37/43  | 90%, 55/61  |
+----------------+------------+-------------+-------------+

Geometric mean speedup

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   1.00x    |    1.01x    |    1.00x    |
|   aot_eager    |   1.01x    |    1.00x    |    1.00x    |
| aot_cudagraphs |   1.09x    |    1.00x    |    1.00x    |
|  aot_nvfuser   |   1.16x    |    0.0x     |    1.20x    |
|    inductor    |   1.68x    |    2.20x    |    1.31x    |
+----------------+------------+-------------+-------------+

Mean compilation time (seconds)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |    6.15    |    14.88    |    11.73    |
|   aot_eager    |   12.44    |    25.70    |    19.93    |
| aot_cudagraphs |   12.80    |    93.53    |    51.65    |
|  aot_nvfuser   |   29.54    |     0.0     |    79.13    |
|    inductor    |   258.47   |   118.80    |   452.93    |
+----------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   0.96x    |    0.98x    |    1.00x    |
|   aot_eager    |   0.85x    |    0.86x    |    0.88x    |
| aot_cudagraphs |   0.43x    |    0.38x    |    0.19x    |
|  aot_nvfuser   |   0.83x    |    0.0x     |    0.85x    |
|    inductor    |   0.77x    |    0.82x    |    0.89x    |
+----------------+------------+-------------+-------------+

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|            densenet121            |  4   | 1.0002 |  0.9102   |      0.0       |    1.397    |  5.0623  |
|       functorch_dp_cifar10        |  64  | 1.0015 |  0.9112   |      0.0       |   1.1939    |  4.737   |
|         timm_efficientdet         |  1   | 0.9848 |  0.8085   |      0.0       |     0.0     |  4.2687  |
|           BERT_pytorch            |  16  | 1.0107 |  0.8304   |      0.0       |     0.0     |  3.1041  |
|      timm_vision_transformer      |  8   | 1.0006 |   0.846   |      0.0       |   1.3541    |  3.0679  |
|                drq                |  1   | 1.0024 |  0.8093   |      0.0       |    1.106    |  2.9813  |
|             resnet18              |  16  | 1.0009 |   0.989   |      0.0       |   1.3483    |  2.6731  |
|               dcgan               |  32  | 0.9772 |  0.9046   |     1.1443     |   0.7307    |  2.6188  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.998  |  0.9303   |     1.4873     |   1.2113    |  2.5972  |
|             hf_Albert             |  8   | 1.0011 |  0.9552   |      0.0       |     0.0     |  2.3953  |
|           squeezenet1_1           |  32  | 0.9933 |  0.9562   |     1.337      |   1.1937    |  2.3116  |
|          resnext50_32x4d          |  8   | 1.0027 |  0.9499   |      0.0       |   1.3374    |  2.1943  |
|        mobilenet_v3_large         |  32  | 1.0042 |  1.0057   |      0.0       |    1.411    |  2.1611  |
|               hf_T5               |  8   | 0.9984 |  0.9446   |      0.0       |     0.0     |  2.1382  |
|            hf_T5_large            |  2   | 1.0172 |  0.8568   |      0.0       |     0.0     |  2.122   |
|          pytorch_struct           | 200  | 1.0012 |  0.7441   |     1.0266     |   0.9964    |  2.0323  |
|              hf_Bert              |  4   | 1.0336 |  0.8486   |      0.0       |     0.0     |  1.8828  |
|              hf_GPT2              |  4   | 1.017  |  0.9879   |      0.0       |     0.0     |  1.8574  |
|            mnasnet1_0             |  32  | 0.9986 |  1.0159   |     0.9193     |   1.4046    |  1.7708  |
|          LearningToPaint          |  96  | 1.0045 |  1.0023   |      0.0       |   1.3491    |  1.7422  |
|              hf_Bart              |  4   | 1.0155 |  0.8359   |      0.0       |     0.0     |  1.7211  |
|           lennard_jones           | 1000 | 0.9786 |  0.7278   |     1.2952     |   1.0447    |  1.5978  |
|         timm_efficientnet         |  32  | 0.9608 |  0.8133   |      0.0       |   1.1851    |  1.5685  |
| attention_is_all_you_need_pytorch | 256  | 1.0029 |  0.9032   |      0.0       |     0.0     |  1.5158  |
|         soft_actor_critic         | 256  | 1.011  |   0.707   |     1.2513     |   1.0703    |  1.4902  |
|           hf_DistilBert           |  8   | 1.0015 |   0.969   |      0.0       |     0.0     |  1.4765  |
|           fastNLP_Bert            |  6   | 1.0004 |  0.8861   |      0.0       |     0.0     |  1.4585  |
|        shufflenet_v2_x1_0         | 128  | 1.0011 |  1.0157   |      0.0       |   1.3391    |  1.3717  |
|           pytorch_unet            |  1   | 0.9999 |  0.9926   |      0.0       |   1.1552    |  1.3528  |
|            timm_nfnet             | 128  | 0.9997 |  0.9985   |      0.0       |   1.1712    |  1.3388  |
|          pytorch_stargan          |  16  | 0.9984 |  1.0165   |     0.8265     |   1.1173    |  1.3192  |
|            Super_SloMo            |  6   | 0.9997 |   0.996   |      0.0       |     0.0     |  1.2905  |
|               vgg16               |  64  | 0.9997 |  0.9978   |     0.7975     |   0.9952    |  1.2744  |
|        Background_Matting         |  4   | 0.9993 |  1.0175   |      0.0       |   1.1152    |  1.2167  |
|              alexnet              | 128  | 0.9993 |  0.9971   |     0.788      |   1.0029    |  1.2085  |
|           timm_resnest            |  32  | 0.9995 |  1.0217   |      0.0       |   1.3245    |  1.2011  |
|            hf_Reformer            |  4   | 0.9924 |  0.9996   |     0.9192     |     0.0     |  1.1589  |
|   timm_vision_transformer_large   |  8   | 0.9991 |  0.9895   |      0.0       |   0.9926    |  1.1581  |
|            hf_BigBird             |  2   | 0.9986 |  0.9103   |      0.0       |     0.0     |  1.1491  |
|            timm_vovnet            |  32  | 0.9212 |  0.8868   |      0.0       |   1.1273    |  1.1101  |
|               moco                |  32  | 0.9968 |    0.0    |      0.0       |     0.0     |  1.0487  |
|            tts_angular            |  64  | 0.9963 |  0.9382   |     0.9949     |   0.9984    |  1.0118  |
|              demucs               |  4   | 0.9985 |  1.0008   |     0.9996     |   0.9991    |  1.0012  |
|      nvidia_deeprecommender       | 256  | 0.9985 |  0.9955   |     0.6966     |   0.9787    |  0.9905  |
|           mobilenet_v2            |  96  | 0.9988 |  0.9875   |      0.0       |   0.9305    |  0.9033  |
|             resnet50              |  32  | 1.0012 |  1.0086   |      0.0       |   1.3687    |  0.8978  |
|            timm_regnet            |  32  | 0.9812 |  0.9369   |      0.0       |   1.2152    |  0.7564  |
|              yolov3               |  16  | 0.9986 |  0.9886   |      0.0       |   0.9097    |   0.0    |
|           hf_Longformer           |  2   | 0.9639 |  0.8829   |     0.8871     |     0.0     |   0.0    |
|               dlrm                | 2048 |  0.0   |  1.2025   |      0.0       |     0.0     |   0.0    |
|           hf_GPT2_large           |  4   | 0.9996 |  0.9898   |      0.0       |     0.0     |   0.0    |
|        speech_transformer         |  32  | 1.0047 |  0.8518   |      0.0       |     0.0     |   0.0    |
|             tacotron2             |  64  | 0.9796 |  0.7578   |      0.0       |     0.0     |   0.0    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|               name                | bs  |      eager       |    aot_eager     |  aot_cudagraphs  |   aot_nvfuser    |     inductor     |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|           hf_GPT2_large           |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|            hf_T5_large            |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|              alexnet              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               dcgan               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|              demucs               |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |
|           lennard_jones           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|            mnasnet1_0             |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|      nvidia_deeprecommender       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |       pass       |       pass       |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |       pass       |       pass       |
|           squeezenet1_1           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               vgg16               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|        Background_Matting         |  4  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|          LearningToPaint          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            densenet121            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|                drq                |  1  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|       functorch_dp_cifar10        |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           mobilenet_v2            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           pytorch_unet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet18              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet50              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|          resnext50_32x4d          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|        shufflenet_v2_x1_0         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|         timm_efficientnet         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_nfnet             |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_regnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           timm_resnest            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|      timm_vision_transformer      |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_vovnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            hf_Reformer            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |
|           BERT_pytorch            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            Super_SloMo            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
| attention_is_all_you_need_pytorch |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               dlrm                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           fastNLP_Bert            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|             hf_Albert             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bart              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bert              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            hf_BigBird             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           hf_DistilBert           |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_GPT2              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               hf_T5               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            hf_T5_base             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           hf_Longformer           |  2  |       pass       |       pass       |       pass       |   fail_to_run    |   fail_to_run    |
|        speech_transformer         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|             tacotron2             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|         timm_efficientdet         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|          vision_maskrcnn          |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|               moco                |  2  |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|        mobilenet_v3_large         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |  fail_accuracy   |
|            tts_angular            |  2  |       pass       |       pass       |       pass       |       pass       |      0.0000      |
|              yolov3               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |      0.0000      |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+
|               name                |  bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor  |
+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+
|         timm_efficientdet         |  1   | 52.6602 |  79.1055  |      nan       |     nan     | 1818.1052 |
|            hf_T5_large            |  2   | 36.3097 |  76.3336  |      nan       |     nan     | 1734.768  |
|            densenet121            |  4   | 13.5051 |  28.8872  |      nan       |   138.834   | 1386.3113 |
|            mnasnet1_0             |  32  | 3.3962  |  8.6081   |    43.4979     |   46.1998   | 826.0046  |
|        mobilenet_v3_large         |  32  | 3.7847  |  9.2689   |      nan       |   75.0785   | 748.7895  |
|          resnext50_32x4d          |  8   | 3.6093  |  9.3411   |      nan       |   39.3272   | 707.1618  |
|               moco                |  32  |  11.37  |    nan    |      nan       |     nan     |  683.29   |
|           mobilenet_v2            |  96  | 3.3085  |  8.3878   |      nan       |   43.4906   | 635.5209  |
|         timm_efficientnet         |  32  | 5.9808  |  12.6187  |      nan       |   73.2075   | 583.9074  |
|        shufflenet_v2_x1_0         | 128  | 3.8001  |  9.9697   |      nan       |   40.5941   | 398.4956  |
|           squeezenet1_1           |  32  | 0.6697  |  1.7814   |     6.7516     |   6.8886    | 385.1688  |
|            timm_nfnet             | 128  | 6.6658  |  13.5281  |      nan       |   42.693    |  365.85   |
|           timm_resnest            |  32  | 1.4486  |  4.3564   |      nan       |   43.3691   | 349.0304  |
|            timm_regnet            |  32  |  8.429  |  16.9052  |      nan       |   67.531    | 317.4367  |
| attention_is_all_you_need_pytorch | 256  | 4.4349  |  12.7302  |      nan       |     nan     | 251.7143  |
|            timm_vovnet            |  32  |  3.013  |  7.3585   |      nan       |   32.2961   | 228.4544  |
|   timm_vision_transformer_large   |  8   | 22.8303 |  40.7575  |      nan       |   59.3202   | 203.1062  |
|          LearningToPaint          |  96  | 1.0884  |  3.1644   |      nan       |   30.8535   | 196.1728  |
|       functorch_dp_cifar10        |  64  | 0.8334  |  2.5709   |      nan       |   6.5105    | 186.6153  |
|      timm_vision_transformer      |  8   | 3.1965  |  8.1056   |      nan       |   16.1745   | 185.5642  |
|           BERT_pytorch            |  16  | 5.1553  |  13.6741  |      nan       |     nan     | 183.0714  |
|             resnet18              |  16  | 0.9908  |  3.0796   |      nan       |   23.6609   | 178.4714  |
|             resnet50              |  32  | 3.4622  |  9.1981   |      nan       |   44.2773   | 167.5082  |
|           fastNLP_Bert            |  6   | 5.3662  |  12.8031  |      nan       |     nan     | 155.3766  |
|               hf_T5               |  8   | 3.9527  |  12.7903  |      nan       |     nan     | 152.7908  |
|        Background_Matting         |  4   | 4.0586  |  9.8569   |      nan       |   45.5208   |  137.705  |
|          pytorch_stargan          |  16  | 0.8563  |  3.2896   |     11.618     |   7.5638    | 137.6237  |
|              hf_Bart              |  4   | 7.5098  |  17.1193  |      nan       |     nan     | 136.6578  |
|              hf_GPT2              |  4   | 3.6623  |  9.9582   |      nan       |     nan     | 128.4494  |
|          pytorch_struct           | 200  | 0.4445  |  1.2679   |     1.8613     |   5.4827    | 121.6668  |
|            Super_SloMo            |  6   | 2.3013  |  7.0704   |      nan       |     nan     |  91.5593  |
|             hf_Albert             |  8   | 1.5093  |  8.5484   |      nan       |     nan     |  81.4949  |
|            hf_Reformer            |  4   |  3.183  |  5.8886   |    13.7829     |     nan     |  80.2307  |
|              hf_Bert              |  4   | 5.2581  |  12.5739  |      nan       |     nan     |  72.6783  |
|            hf_BigBird             |  2   | 11.8968 |  20.2146  |      nan       |     nan     |  66.169   |
|           pytorch_unet            |  1   | 1.1321  |  3.4759   |      nan       |   26.7798   |  61.8894  |
|           hf_DistilBert           |  8   | 1.7875  |   5.408   |      nan       |     nan     |  47.2585  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.8066  |   3.255   |    11.9549     |   5.2307    |  33.7224  |
|               vgg16               |  64  | 0.3752  |  1.1111   |     4.1704     |   3.7078    |  29.9942  |
|              alexnet              | 128  | 0.2813  |  0.6904   |     1.9896     |   3.2561    |  29.2225  |
|                drq                |  1   | 0.2865  |  0.7521   |      nan       |   4.4815    |  22.5066  |
|               dcgan               |  32  |  0.268  |  0.6336   |     1.8825     |   4.3205    |  17.1734  |
|      nvidia_deeprecommender       | 256  | 0.2933  |  0.6746   |     1.0105     |   2.9894    |  15.6936  |
|         soft_actor_critic         | 256  | 0.2749  |  0.4931   |     0.715      |   2.1025    |  14.6417  |
|           lennard_jones           | 1000 | 0.2403  |  0.5118   |     0.6931     |   1.5472    |  8.5416   |
|            tts_angular            |  64  | 0.3366  |  0.3937   |     0.5196     |   1.1651    |  4.0383   |
|              demucs               |  4   | 0.9022  |  0.8912   |     0.8836     |   0.8907    |   0.789   |
|              yolov3               |  16  | 7.4472  |  15.7484  |      nan       |   45.481    |    nan    |
|           hf_Longformer           |  2   | 11.7858 |  21.3262  |    90.6374     |     nan     |    nan    |
|           hf_GPT2_large           |  4   | 21.8976 |  41.7361  |      nan       |     nan     |    nan    |
|             tacotron2             |  64  | 13.9023 |  30.239   |      nan       |     nan     |    nan    |
|        speech_transformer         |  32  | 7.6548  |   17.41   |      nan       |     nan     |    nan    |
|               dlrm                | 2048 |   nan   |  1.2125   |      nan       |     nan     |    nan    |
+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|             hf_Albert             |  8   | 0.9814 |   0.936   |      nan       |     nan     |  1.1576  |
|            Super_SloMo            |  6   | 1.0024 |  0.9697   |      nan       |     nan     |  1.1385  |
|            timm_nfnet             | 128  | 0.9761 |  0.9043   |      nan       |   0.9504    |  1.0242  |
|            tts_angular            |  64  | 1.0015 |  1.0015   |     0.9866     |   1.0015    |  0.9908  |
| attention_is_all_you_need_pytorch | 256  | 0.9976 |  0.9403   |      nan       |     nan     |  0.9875  |
|              demucs               |  4   | 0.987  |   0.987   |     0.987      |    0.987    |  0.987   |
|         timm_efficientdet         |  1   | 1.0316 |  0.8425   |      nan       |     nan     |  0.9857  |
|           BERT_pytorch            |  16  | 0.9998 |  0.8819   |      nan       |     nan     |  0.9728  |
|         timm_efficientnet         |  32  | 0.9982 |  0.7762   |      nan       |   0.7936    |  0.9689  |
|              hf_GPT2              |  4   | 0.971  |  0.8627   |      nan       |     nan     |  0.9645  |
|        Background_Matting         |  4   | 1.0201 |  0.9679   |      nan       |    0.987    |  0.9244  |
|           mobilenet_v2            |  96  | 1.0001 |  0.7725   |      nan       |   0.9235    |  0.8856  |
|           pytorch_unet            |  1   | 0.9968 |  0.8677   |      nan       |   0.8518    |  0.8681  |
|           fastNLP_Bert            |  6   | 1.0013 |  0.8966   |      nan       |     nan     |  0.8661  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.0   |  0.8751   |     0.2642     |   0.8432    |  0.8602  |
|            hf_T5_large            |  2   | 0.8541 |  0.8541   |      nan       |     nan     |  0.8535  |
|           hf_DistilBert           |  8   | 0.9505 |  0.8806   |      nan       |     nan     |  0.8387  |
|              hf_Bert              |  4   | 0.9844 |  0.8677   |      nan       |     nan     |  0.8383  |
|            timm_regnet            |  32  | 0.9999 |  0.8483   |      nan       |    0.85     |  0.8361  |
|              hf_Bart              |  4   | 0.9099 |  0.8321   |      nan       |     nan     |  0.8151  |
|            hf_BigBird             |  2   | 0.9852 |  0.9787   |      nan       |     nan     |   0.81   |
|            timm_vovnet            |  32  | 0.9903 |  0.7754   |      nan       |   0.7817    |  0.7861  |
|               moco                |  32  | 0.9667 |    nan    |      nan       |     nan     |  0.782   |
|        shufflenet_v2_x1_0         | 128  | 1.0002 |   0.874   |      nan       |   0.8652    |  0.7812  |
|          pytorch_stargan          |  16  | 0.9929 |  0.9799   |     0.2149     |   0.8882    |  0.7783  |
|               dcgan               |  32  |  1.0   |  0.7949   |     0.343      |   0.7073    |  0.7527  |
|               vgg16               |  64  | 0.9998 |  0.7378   |     0.2978     |   0.7172    |  0.7491  |
|   timm_vision_transformer_large   |  8   | 0.9987 |  0.8365   |      nan       |   0.8491    |  0.7487  |
|              alexnet              | 128  | 1.0003 |  0.8082   |     0.4354     |    0.805    |  0.7352  |
|               hf_T5               |  8   | 0.9678 |  0.9371   |      nan       |     nan     |  0.7266  |
|           timm_resnest            |  32  | 0.9868 |  0.8809   |      nan       |   0.8726    |  0.7218  |
|      timm_vision_transformer      |  8   | 1.0001 |  0.8868   |      nan       |   0.8871    |  0.7151  |
|             resnet50              |  32  | 1.0004 |  0.8678   |      nan       |   0.8041    |  0.7143  |
|            mnasnet1_0             |  32  | 0.9994 |  0.8793   |     0.173      |   0.8217    |  0.6596  |
|           squeezenet1_1           |  32  | 0.9604 |  0.7958   |     0.2951     |   0.7589    |  0.6595  |
|        mobilenet_v3_large         |  32  | 0.999  |  0.8661   |      nan       |    0.874    |  0.6573  |
|          resnext50_32x4d          |  8   |  1.0   |  0.8591   |      nan       |    0.823    |  0.6514  |
|                drq                |  1   | 0.9125 |  0.8399   |      nan       |   0.8395    |  0.6406  |
|         soft_actor_critic         | 256  | 0.964  |  0.9151   |     0.4737     |   0.9151    |  0.6279  |
|          LearningToPaint          |  96  | 0.9252 |  0.7196   |      nan       |    0.71     |  0.605   |
|            densenet121            |  4   |  1.0   |  0.8696   |      nan       |   0.8376    |  0.5739  |
|             resnet18              |  16  | 0.9782 |  0.7852   |      nan       |   0.7268    |  0.5644  |
|           lennard_jones           | 1000 |  1.0   |  1.0002   |     0.3735     |   1.0967    |  0.564   |
|      nvidia_deeprecommender       | 256  | 0.5596 |  0.5596   |     0.5262     |   0.5596    |  0.5596  |
|       functorch_dp_cifar10        |  64  | 0.9964 |  0.8131   |      nan       |    0.846    |  0.4465  |
|          pytorch_struct           | 200  |  1.0   |  0.5081   |     0.4858     |   0.5082    |  0.4235  |
|            hf_Reformer            |  4   | 0.3764 |    1.0    |     0.2539     |     nan     |  0.3629  |
|              yolov3               |  16  | 1.0054 |  0.8488   |      nan       |   0.8244    |   nan    |
|           hf_Longformer           |  2   | 0.9734 |   0.967   |     0.3374     |     nan     |   nan    |
|        speech_transformer         |  32  | 1.0015 |  0.9177   |      nan       |     nan     |   nan    |
|           hf_GPT2_large           |  4   | 0.9586 |  0.8649   |      nan       |     nan     |   nan    |
|               dlrm                | 2048 |  nan   |  0.7282   |      nan       |     nan     |   nan    |
|             tacotron2             |  64  | 0.9879 |  0.4069   |      nan       |     nan     |   nan    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|     MobileBertForQuestionAnswering      | 32 | 1.0156 |  0.8169   |      0.0       |     0.0     |  5.8307  |
|          MobileBertForMaskedLM          | 16 | 1.0187 |  0.8248   |      0.0       |     0.0     |  5.7089  |
|       MT5ForConditionalGeneration       | 2  | 1.0224 |  0.8508   |      0.0       |     0.0     |  5.4709  |
|           ElectraForCausalLM            | 1  | 1.0362 |  0.8465   |      0.0       |     0.0     |  5.4366  |
|            YituTechConvBert             | 1  | 1.0208 |  0.8384   |      0.0       |     0.0     |  4.6274  |
|         MegatronBertForCausalLM         | 2  | 1.0325 |  0.8502   |      0.0       |     0.0     |  4.1375  |
|     M2M100ForConditionalGeneration      | 2  | 1.0103 |  0.8308   |      0.0       |     0.0     |  4.0046  |
|           RobertaForCausalLM            | 4  | 1.0395 |  0.8381   |      0.0       |     0.0     |  3.9484  |
|             OPTForCausalLM              | 4  | 1.0159 |  0.8275   |      0.0       |     0.0     |  3.9047  |
|                CamemBert                | 1  | 1.0396 |  0.8447   |      0.0       |     0.0     |  3.4434  |
|     PegasusForConditionalGeneration     | 4  | 1.0105 |  0.8149   |      0.0       |     0.0     |  3.2421  |
|             XGLMForCausalLM             | 1  | 1.0117 |  0.8168   |      0.0       |     0.0     |  3.1117  |
|     PLBartForConditionalGeneration      | 8  | 1.0154 |  0.8245   |      0.0       |     0.0     |  2.8361  |
|    MegatronBertForQuestionAnswering     | 8  | 1.0376 |   0.859   |      0.0       |     0.0     |  2.688   |
|               DistillGPT2               | 1  | 1.024  |  0.8702   |      0.0       |     0.0     |  2.6104  |
|      MBartForConditionalGeneration      | 8  | 1.0136 |  0.8357   |      0.0       |     0.0     |  2.3857  |
|         Speech2Text2ForCausalLM         | 64 | 1.0051 |  0.8348   |      0.0       |     0.0     |  2.2561  |
|      GPT2ForSequenceClassification      | 4  | 0.9993 |  0.9755   |      0.0       |     0.0     |  2.1462  |
|       ElectraForQuestionAnswering       | 64 | 0.9999 |  0.9776   |      0.0       |     0.0     |  1.9724  |
| BlenderbotSmallForConditionalGeneration | 32 | 1.0098 |  0.8688   |      0.0       |     0.0     |  1.9514  |
|            TrOCRForCausalLM             | 8  | 1.0113 |   0.829   |      0.0       |     0.0     |  1.9288  |
|           PegasusForCausalLM            | 8  | 1.0103 |  0.8014   |      0.0       |     0.0     |  1.8377  |
|          DistilBertForMaskedLM          | 16 | 1.0299 |  0.8455   |      0.0       |     0.0     |  1.8339  |
|      BartForConditionalGeneration       | 1  | 1.0151 |  0.8364   |      0.0       |     0.0     |  1.7748  |
|     DistilBertForQuestionAnswering      | 32 | 1.034  |  0.8491   |      0.0       |     0.0     |  1.7693  |
|    LayoutLMForSequenceClassification    | 16 | 0.9972 |  0.9671   |      0.0       |     0.0     |  1.7319  |
|       T5ForConditionalGeneration        | 4  | 1.0002 |  0.9362   |      0.0       |     0.0     |  1.7017  |
|       AlbertForQuestionAnswering        | 2  | 1.0011 |   0.808   |      0.0       |     0.0     |  1.6617  |
|            AlbertForMaskedLM            | 2  | 1.0004 |   0.808   |      0.0       |     0.0     |  1.6509  |
|            PLBartForCausalLM            | 16 | 1.0101 |  0.9365   |      0.0       |     0.0     |  1.6438  |
|                 T5Small                 | 1  | 1.0281 |  0.8763   |      0.0       |     0.0     |  1.6266  |
|            XLNetLMHeadModel             | 4  | 1.0006 |  0.9605   |      0.0       |     0.0     |  1.5968  |
|           LayoutLMForMaskedLM           | 16 | 0.9985 |   0.969   |      0.0       |     0.0     |  1.5917  |
|             BartForCausalLM             | 2  | 1.0003 |  0.9618   |      0.0       |     0.0     |  1.4597  |
|       DebertaForQuestionAnswering       | 4  | 0.9344 |  0.7279   |     0.9349     |     0.0     |  1.4504  |
|        BertForQuestionAnswering         | 64 | 0.9972 |  0.9677   |      0.0       |     0.0     |  1.446   |
|       RobertaForQuestionAnswering       | 64 | 0.9979 |  0.9686   |      0.0       |     0.0     |  1.4407  |
|           DebertaForMaskedLM            | 4  | 0.9334 |  0.7268   |     0.7967     |     0.0     |  1.4123  |
|            MBartForCausalLM             | 16 | 1.0091 |   0.823   |      0.0       |     0.0     |  1.3982  |
|             BertForMaskedLM             | 64 | 0.9973 |   0.956   |      0.0       |     0.0     |  1.3317  |
|       BlenderbotSmallForCausalLM        | 64 | 1.0004 |   0.927   |      0.0       |     0.0     |  1.3061  |
|                 BigBird                 | 1  | 0.9924 |  0.9078   |      0.0       |     0.0     |  1.1488  |
|          AllenaiLongformerBase          | 1  | 0.9546 |  0.7324   |     0.854      |     0.0     |   0.0    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+

Accuracy

+-----------------------------------------+----+--------+-----------+----------------+-------------+-------------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser |  inductor   |
+-----------------------------------------+----+--------+-----------+----------------+-------------+-------------+
|            AlbertForMaskedLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       AlbertForQuestionAnswering        | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|             BartForCausalLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|             BertForMaskedLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|        BertForQuestionAnswering         | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|                 BigBird                 | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       BlenderbotSmallForCausalLM        | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
| BlenderbotSmallForConditionalGeneration | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|                CamemBert                | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           DebertaForMaskedLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|          DistilBertForMaskedLM          | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|     DistilBertForQuestionAnswering      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|               DistillGPT2               | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           ElectraForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       ElectraForQuestionAnswering       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|      GPT2ForSequenceClassification      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           LayoutLMForMaskedLM           | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|    LayoutLMForSequenceClassification    | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            MBartForCausalLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       MT5ForConditionalGeneration       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|         MegatronBertForCausalLM         | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|    MegatronBertForQuestionAnswering     | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|          MobileBertForMaskedLM          | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|     MobileBertForQuestionAnswering      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|             OPTForCausalLM              | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            PLBartForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           PegasusForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|     PegasusForConditionalGeneration     | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|           RobertaForCausalLM            | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       RobertaForQuestionAnswering       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|         Speech2Text2ForCausalLM         | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       T5ForConditionalGeneration        | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|                 T5Small                 | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            TrOCRForCausalLM             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            XLNetLMHeadModel             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|            YituTechConvBert             | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
|       DebertaForQuestionAnswering       | 1  |  pass  |   pass    | fail_accuracy  | fail_to_run |    pass     |
|          AllenaiLongformerBase          | 1  |  pass  |   pass    |      pass      | fail_to_run | fail_to_run |
|      BartForConditionalGeneration       | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|      MBartForConditionalGeneration      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|     PLBartForConditionalGeneration      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
|     M2M100ForConditionalGeneration      | 1  |  pass  |   pass    |  fail_to_run   | fail_to_run |   0.0000    |
|             XGLMForCausalLM             | 0  | 0.0000 |  0.0000   |     0.0000     |   0.0000    |   0.0000    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+-------------+

Compilation latency (sec)

+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+
|                  name                   | bs |  eager   | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+
|            XLNetLMHeadModel             | 4  | 17.5696  |  42.9354  |      nan       |     nan     | 327.9079 |
|          MobileBertForMaskedLM          | 16 | 135.8255 | 173.7536  |      nan       |     nan     | 308.991  |
|     MobileBertForQuestionAnswering      | 32 | 132.8536 | 171.7633  |      nan       |     nan     | 293.2592 |
|       T5ForConditionalGeneration        | 4  |  3.922   |  12.943   |      nan       |     nan     | 248.2661 |
|     M2M100ForConditionalGeneration      | 2  | 26.2972  |  44.8597  |      nan       |     nan     | 223.5051 |
|       MT5ForConditionalGeneration       | 2  |  6.6742  |  19.818   |      nan       |     nan     | 203.911  |
|            YituTechConvBert             | 1  |  9.2675  |  20.8203  |      nan       |     nan     | 195.7945 |
|      MBartForConditionalGeneration      | 8  | 26.6741  |  47.2119  |      nan       |     nan     | 173.1357 |
|             XGLMForCausalLM             | 1  | 15.5248  |  30.2576  |      nan       |     nan     | 170.6504 |
|     PegasusForConditionalGeneration     | 4  | 26.1067  |  45.261   |      nan       |     nan     | 167.2376 |
|           DebertaForMaskedLM            | 4  |  7.4344  |  14.5652  |    53.2994     |     nan     | 164.457  |
|      BartForConditionalGeneration       | 1  | 26.3695  |  45.964   |      nan       |     nan     | 155.1766 |
|         MegatronBertForCausalLM         | 2  | 16.6556  |  31.6041  |      nan       |     nan     | 151.6307 |
|    MegatronBertForQuestionAnswering     | 8  | 16.8169  |  31.9636  |      nan       |     nan     | 149.1209 |
|                 T5Small                 | 1  |  3.949   |  12.7818  |      nan       |     nan     | 148.9024 |
|     PLBartForConditionalGeneration      | 8  |  7.4476  |  17.4072  |      nan       |     nan     | 135.8023 |
| BlenderbotSmallForConditionalGeneration | 32 | 12.7961  |  25.3891  |      nan       |     nan     | 127.313  |
|       DebertaForQuestionAnswering       | 4  |  7.1998  |  14.5172  |    53.5221     |     nan     | 122.565  |
|           RobertaForCausalLM            | 4  |  5.2604  |  13.0452  |      nan       |     nan     | 104.3819 |
|    LayoutLMForSequenceClassification    | 16 |  5.4858  |  12.9984  |      nan       |     nan     | 93.8895  |
|           PegasusForCausalLM            | 8  |  9.9544  |  17.0666  |      nan       |     nan     | 92.9066  |
|       ElectraForQuestionAnswering       | 64 |  5.2746  |  12.8697  |      nan       |     nan     | 92.0522  |
|            MBartForCausalLM             | 16 |  10.398  |  17.1343  |      nan       |     nan     | 85.7616  |
|             OPTForCausalLM              | 4  |  4.9946  |  12.1428  |      nan       |     nan     | 84.3122  |
|             BertForMaskedLM             | 64 |  5.1354  |  12.5731  |      nan       |     nan     |  84.208  |
|           LayoutLMForMaskedLM           | 16 |  5.6794  |  13.827   |      nan       |     nan     | 82.2904  |
|             BartForCausalLM             | 2  | 10.0323  |  17.1562  |      nan       |     nan     | 81.4293  |
|      GPT2ForSequenceClassification      | 4  |  3.6793  |  10.0043  |      nan       |     nan     | 78.5398  |
|            TrOCRForCausalLM             | 8  |  9.9941  |  17.2084  |      nan       |     nan     | 73.7194  |
|       BlenderbotSmallForCausalLM        | 64 |  4.8936  |  9.6694   |      nan       |     nan     | 73.5185  |
|           ElectraForCausalLM            | 1  |  5.385   |  12.7792  |      nan       |     nan     | 70.1919  |
|                 BigBird                 | 1  | 11.6176  |  20.1569  |      nan       |     nan     | 67.9323  |
|     DistilBertForQuestionAnswering      | 32 |  1.921   |  5.3883   |      nan       |     nan     |  67.798  |
|         Speech2Text2ForCausalLM         | 64 |  3.2046  |  6.8548   |      nan       |     nan     | 67.6319  |
|            AlbertForMaskedLM            | 2  |  1.5751  |  8.7428   |      nan       |     nan     | 67.5946  |
|               DistillGPT2               | 1  |  1.5417  |  4.7305   |      nan       |     nan     | 66.8776  |
|            PLBartForCausalLM            | 16 |  3.3208  |  7.2321   |      nan       |     nan     | 65.4376  |
|                CamemBert                | 1  |  5.225   |  12.6121  |      nan       |     nan     | 64.2281  |
|       RobertaForQuestionAnswering       | 64 |  5.5016  |  12.5104  |      nan       |     nan     | 62.6216  |
|        BertForQuestionAnswering         | 64 |  5.228   |  12.4546  |      nan       |     nan     |  61.755  |
|          DistilBertForMaskedLM          | 16 |  1.9566  |  5.6066   |      nan       |     nan     | 51.4273  |
|       AlbertForQuestionAnswering        | 2  |  1.6953  |  8.6595   |      nan       |     nan     | 45.7502  |
|          AllenaiLongformerBase          | 1  | 12.2056  |  22.4955  |    93.5262     |     nan     |   nan    |
+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|      GPT2ForSequenceClassification      | 4  | 0.9675 |  0.9163   |      nan       |     nan     |   1.07   |
|            XLNetLMHeadModel             | 4  | 0.9912 |  0.8791   |      nan       |     nan     |  1.0109  |
|       ElectraForQuestionAnswering       | 64 | 1.0016 |  0.9539   |      nan       |     nan     |  1.0002  |
|                 T5Small                 | 1  |  1.0   |  0.9124   |      nan       |     nan     |  0.9876  |
|           LayoutLMForMaskedLM           | 16 | 0.9999 |  0.9238   |      nan       |     nan     |  0.9871  |
|             BertForMaskedLM             | 64 | 0.9996 |   0.899   |      nan       |     nan     |  0.9811  |
|    LayoutLMForSequenceClassification    | 16 | 1.004  |  0.9325   |      nan       |     nan     |  0.9712  |
| BlenderbotSmallForConditionalGeneration | 32 | 0.9998 |  0.8996   |      nan       |     nan     |  0.9557  |
|             BartForCausalLM             | 2  |  1.0   |  0.8769   |      nan       |     nan     |  0.9545  |
|       T5ForConditionalGeneration        | 4  | 0.9996 |  0.9594   |      nan       |     nan     |  0.9525  |
|         Speech2Text2ForCausalLM         | 64 | 0.9954 |  0.8489   |      nan       |     nan     |  0.9452  |
|            PLBartForCausalLM            | 16 | 1.0006 |  0.8667   |      nan       |     nan     |  0.9395  |
|       BlenderbotSmallForCausalLM        | 64 | 0.9996 |  0.8172   |      nan       |     nan     |  0.9269  |
|        BertForQuestionAnswering         | 64 | 0.9995 |  0.9315   |      nan       |     nan     |  0.9256  |
|       RobertaForQuestionAnswering       | 64 | 0.9996 |  0.9315   |      nan       |     nan     |  0.9254  |
|          DistilBertForMaskedLM          | 16 | 0.9991 |  0.8698   |      nan       |     nan     |  0.9167  |
|      BartForConditionalGeneration       | 1  |  1.0   |  0.8619   |      nan       |     nan     |  0.881   |
|       AlbertForQuestionAnswering        | 2  |  1.0   |  0.6451   |      nan       |     nan     |  0.8636  |
|            MBartForCausalLM             | 16 |  1.0   |  0.8398   |      nan       |     nan     |  0.8565  |
|            AlbertForMaskedLM            | 2  |  1.0   |  0.6364   |      nan       |     nan     |  0.8515  |
|                 BigBird                 | 1  | 1.0024 |  0.9513   |      nan       |     nan     |  0.8349  |
|     DistilBertForQuestionAnswering      | 32 | 0.9987 |  0.8967   |      nan       |     nan     |  0.834   |
|     PLBartForConditionalGeneration      | 8  | 0.9999 |  0.8307   |      nan       |     nan     |  0.8252  |
|               DistillGPT2               | 1  | 1.0006 |  0.7548   |      nan       |     nan     |  0.812   |
|      MBartForConditionalGeneration      | 8  | 0.9999 |  0.8187   |      nan       |     nan     |  0.7699  |
|            TrOCRForCausalLM             | 8  |  1.0   |  0.7955   |      nan       |     nan     |  0.7566  |
|                CamemBert                | 1  | 0.9989 |  0.7872   |      nan       |     nan     |  0.7482  |
|             OPTForCausalLM              | 4  | 0.9975 |  0.7501   |      nan       |     nan     |  0.7473  |
|            YituTechConvBert             | 1  | 0.9718 |  0.7819   |      nan       |     nan     |  0.7407  |
|           PegasusForCausalLM            | 8  | 0.999  |  0.9444   |      nan       |     nan     |  0.7324  |
|           RobertaForCausalLM            | 4  | 0.9237 |  0.7741   |      nan       |     nan     |  0.7309  |
|             XGLMForCausalLM             | 1  | 0.9999 |  0.9992   |      nan       |     nan     |  0.7214  |
|    MegatronBertForQuestionAnswering     | 8  | 0.9051 |  0.8218   |      nan       |     nan     |  0.7107  |
|          MobileBertForMaskedLM          | 16 | 0.9985 |  0.8983   |      nan       |     nan     |  0.6948  |
|     PegasusForConditionalGeneration     | 4  | 0.9996 |  0.9196   |      nan       |     nan     |  0.6769  |
|           ElectraForCausalLM            | 1  | 0.9993 |  0.8955   |      nan       |     nan     |  0.6701  |
|         MegatronBertForCausalLM         | 2  | 0.7726 |  0.7726   |      nan       |     nan     |  0.6697  |
|     M2M100ForConditionalGeneration      | 2  | 0.9999 |   0.954   |      nan       |     nan     |  0.6523  |
|     MobileBertForQuestionAnswering      | 32 | 1.0142 |  0.9796   |      nan       |     nan     |  0.6265  |
|       MT5ForConditionalGeneration       | 2  | 0.6019 |  0.6019   |      nan       |     nan     |  0.6019  |
|           DebertaForMaskedLM            | 4  | 0.9982 |  0.9826   |     0.3599     |     nan     |  0.4498  |
|       DebertaForQuestionAnswering       | 4  | 0.979  |  1.0568   |     0.3576     |     nan     |  0.3761  |
|          AllenaiLongformerBase          | 1  | 0.9996 |  0.9477   |     0.3752     |     nan     |   nan    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
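The post does not define the compression ratio. A minimal sketch for reading the table, assuming the ratio is eager peak memory divided by the backend's peak memory (so a value below 1 means the backend used more memory than eager). The function name and the byte counts are hypothetical and not taken from the dashboard.

```python
def compression_ratio(eager_peak_bytes: float, backend_peak_bytes: float) -> float:
    """Assumed definition: eager peak memory / backend peak memory."""
    if backend_peak_bytes <= 0:
        # No usable measurement; the tables show such cells as nan.
        return float("nan")
    return eager_peak_bytes / backend_peak_bytes


# Hypothetical numbers, for illustration only.
ratio = compression_ratio(8.0e9, 10.0e9)
print(f"{ratio:.4f}")
```

Under that assumption, an entry such as 0.6265 would mean the compiled run peaked at roughly 1.6x the eager memory footprint.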

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|        res2net50_14w_8s         |  2  | 0.9966 |  0.8973   |      0.0       |   1.3904    |  5.5892  |
|            hrnet_w18            |  2  | 1.004  |  0.9636   |      0.0       |   1.3727    |  4.9403  |
|           res2next50            |  2  | 1.0004 |  0.9702   |      0.0       |    1.363    |  4.678   |
|        twins_pcpvt_base         | 32  | 1.0036 |  0.8938   |      0.0       |   1.3592    |  2.5448  |
|      xcit_large_24_p8_224       |  5  | 1.0012 |    0.0    |      0.0       |     0.0     |  2.0556  |
|          cait_m36_384           |  2  | 1.0024 |  0.8465   |      0.0       |   1.3421    |  2.0541  |
|        tnt_s_patch16_224        | 64  | 0.9997 |  0.9927   |      0.0       |   1.8446    |  2.0203  |
|          ghostnet_100           | 128 | 1.0043 |  0.9984   |      0.0       |   1.5386    |  1.8112  |
|          gmixer_24_224          | 64  | 1.0008 |  0.8843   |      0.0       |   1.0368    |  1.6807  |
|           volo_d1_224           | 64  | 0.9994 |  0.9943   |      0.0       |   1.1498    |  1.6678  |
|         crossvit_9_240          | 64  | 1.0032 |  0.9572   |      0.0       |   1.1315    |  1.5867  |
|            nfnet_l0             | 64  | 1.0066 |  0.8388   |      0.0       |   1.1434    |  1.5833  |
|  swin_base_patch4_window7_224   | 64  | 0.9993 |   0.961   |      0.0       |   1.0563    |  1.5723  |
|            lcnet_050            | 128 | 0.9684 |  0.9499   |      0.0       |   1.5746    |  1.5519  |
|         coat_lite_mini          | 128 | 1.0002 |  0.9947   |      0.0       |   1.2658    |  1.5316  |
|           regnety_002           | 128 | 0.9786 |  0.9364   |      0.0       |   1.3847    |  1.5049  |
|           resnest101e           | 32  | 1.0032 |  0.9843   |      0.0       |   1.4186    |  1.4787  |
|          resmlp_12_224          | 128 |  1.0   |  0.9975   |     0.7819     |     0.0     |  1.4644  |
|          jx_nest_base           | 32  | 0.9992 |  0.9909   |      0.0       |    1.238    |  1.4634  |
|           convit_base           | 32  | 0.9995 |  0.9916   |      0.0       |     0.0     |  1.3992  |
|          gmlp_s16_224           | 64  | 0.9989 |  0.9827   |      0.0       |    1.051    |  1.3904  |
|            pit_b_224            | 64  | 0.9997 |  0.9939   |      0.0       |   1.0687    |  1.3644  |
|           dm_nfnet_f0           | 128 | 0.9993 |  0.9976   |      0.0       |   1.1757    |  1.326   |
|          mixer_b16_224          | 64  | 0.9992 |  0.9907   |     0.7171     |   0.9682    |  1.3168  |
| deit_base_distilled_patch16_224 | 64  | 0.9994 |  0.9911   |      0.0       |    1.071    |  1.2892  |
|      beit_base_patch16_224      | 64  | 0.9997 |  0.9783   |      0.0       |   1.0509    |  1.2862  |
|        adv_inception_v3         | 128 | 0.9998 |  0.9953   |      0.0       |   1.1938    |  1.2801  |
|       gluon_inception_v3        | 128 |  1.0   |  0.9948   |      0.0       |   1.1944    |  1.2254  |
|         poolformer_m36          | 64  | 0.999  |  0.9974   |      0.0       |     0.0     |  1.209   |
|          inception_v3           | 128 | 0.9999 |   0.995   |      0.0       |   1.1944    |  1.2078  |
|           mobilevit_s           | 32  | 0.9736 |  0.7981   |      0.0       |   1.2122    |  1.2009  |
|      vit_base_patch16_224       | 64  | 0.9995 |  0.9934   |      0.0       |   1.0006    |  1.1978  |
|            mixnet_l             | 64  | 0.9791 |  0.8892   |      0.0       |   1.0867    |  1.178   |
|           tf_mixnet_l           | 64  | 0.9808 |   0.894   |      0.0       |   1.1188    |  1.1177  |
|         visformer_small         | 128 | 0.9999 |  1.0005   |      0.0       |   1.0857    |  1.0997  |
|          pnasnet5large          | 16  | 1.0052 |  1.0336   |      0.0       |   1.1349    |  1.052   |
|            fbnetv3_b            | 128 | 0.9596 |  0.9445   |      0.0       |   1.2915    |  1.0325  |
|             dla102              | 64  | 1.0033 |  0.9902   |      0.0       |   1.3766    |  1.0242  |
|             dpn107              | 32  | 0.9389 |  0.9299   |      0.0       |   0.9938    |  0.9342  |
|            repvgg_a2            | 128 | 0.9422 |  0.9332   |     0.6563     |   1.1301    |  0.9011  |
|           fbnetc_100            | 128 | 0.952  |  0.9423   |     0.6644     |   1.3738    |  0.8982  |
|           selecsls42b           | 128 | 0.9998 |  0.9936   |      0.0       |   1.3554    |  0.8981  |
|          cspdarknet53           | 64  | 0.9431 |  0.9323   |      0.0       |   0.9006    |  0.8892  |
|        convmixer_768_32         | 32  | 0.9998 |  0.9979   |      0.0       |   1.0527    |  0.8866  |
|            tinynet_a            | 128 | 0.9575 |  0.8062   |      0.0       |   1.0907    |  0.8775  |
|           mnasnet_100           | 128 | 0.9523 |  0.9433   |     0.6613     |   1.3688    |  0.8396  |
|          convnext_base          | 32  | 1.0041 |  0.9226   |      0.0       |   1.3138    |  0.8371  |
|      mobilenetv3_large_100      | 128 | 0.9548 |  0.9437   |      0.0       |   1.3436    |  0.8349  |
|        res2net101_26w_4s        | 64  |  1.0   |  0.9969   |      0.0       |   1.3864    |  0.8124  |
|            gernet_l             | 128 | 0.9461 |  0.9361   |      0.0       |   1.1391    |  0.8112  |
|          spnasnet_100           | 128 | 0.9468 |  0.9375   |     0.6531     |   1.3174    |  0.7948  |
|         mobilenetv2_100         | 128 | 0.9504 |  0.9396   |      0.0       |   0.8657    |  0.7434  |
|        sebotnet33ts_256         | 64  | 0.9669 |  0.8365   |      0.0       |   1.1144    |  0.734   |
|       tf_efficientnet_b0        | 128 | 0.9647 |  0.8063   |      0.0       |   1.0946    |  0.7245  |
|          botnet26t_256          | 128 | 0.9792 |  0.9756   |      0.0       |   1.3411    |  0.7229  |
|        eca_halonext26ts         | 64  | 0.9636 |  0.8061   |      0.0       |   1.0992    |  0.705   |
|       eca_botnext26ts_256       | 64  | 0.9616 |  0.8005   |      0.0       |   1.1086    |  0.6749  |
|           rexnet_100            | 128 | 0.9646 |  0.8483   |      0.0       |   1.0366    |  0.6448  |
|        ese_vovnet19b_dw         | 128 | 0.9691 |  0.9642   |      0.0       |   1.2435    |  0.6419  |
|     swsl_resnext101_32x16d      | 32  | 0.9989 |  0.9801   |      0.0       |   1.0755    |  0.6057  |
|        gluon_xception65         | 32  | 0.9985 |  0.9876   |      0.0       |   1.0635    |  0.5872  |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
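To summarize a speedup table like the one above, one option is to parse the ASCII rows and take a geometric mean per backend. This is only a sketch and not part of the dashboard tooling: `parse_table` and `geomean_speedup` are hypothetical names, and it assumes that a `0.0` or `nan` entry marks a run that did not complete rather than a measured speedup.

```python
import math


def parse_table(text):
    """Parse a '+---+' bordered ASCII table into a list of row dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, r)) for r in body]


def geomean_speedup(rows, backend):
    """Geometric mean over positive, finite entries of one backend column.

    Assumption: 0.0 and nan mean 'did not run' and are skipped.
    """
    vals = []
    for r in rows:
        try:
            v = float(r[backend])
        except ValueError:
            continue  # non-numeric cell such as 'pass'
        if math.isfinite(v) and v > 0:
            vals.append(v)
    if not vals:
        return float("nan")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


sample = """
+------------------+----+--------+-----------+----------------+-------------+----------+
|       name       | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+------------------+----+--------+-----------+----------------+-------------+----------+
| res2net50_14w_8s | 2  | 0.9966 |  0.8973   |      0.0       |   1.3904    |  5.5892  |
|    hrnet_w18     | 2  | 1.004  |  0.9636   |      0.0       |   1.3727    |  4.9403  |
+------------------+----+--------+-----------+----------------+-------------+----------+
"""
rows = parse_table(sample)
for backend in ("aot_eager", "aot_nvfuser", "inductor"):
    print(backend, round(geomean_speedup(rows, backend), 4))
```

Skipping failed runs makes the mean optimistic, so it is worth reporting the number of models that contributed alongside it.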

Accuracy

+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|              name               | bs | eager |   aot_eager   | aot_cudagraphs |  aot_nvfuser  |   inductor    |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|           fbnetc_100            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|           mnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|            repvgg_a2            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|        adv_inception_v3         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      beit_base_patch16_224      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          botnet26t_256          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        convmixer_768_32         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          convnext_base          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         crossvit_9_240          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          cspdarknet53           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
| deit_base_distilled_patch16_224 | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dla102              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           dm_nfnet_f0           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dpn107              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       eca_botnext26ts_256       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        eca_halonext26ts         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            gernet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          ghostnet_100           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       gluon_inception_v3        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          inception_v3           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            lcnet_050            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            mixnet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         mobilenetv2_100         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      mobilenetv3_large_100      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           mobilevit_s           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            nfnet_l0             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          pnasnet5large          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           regnety_002           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net101_26w_4s        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net50_14w_8s         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           res2next50            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           rexnet_100            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        sebotnet33ts_256         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           selecsls42b           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|  swin_base_patch4_window7_224   | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|     swsl_resnext101_32x16d      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       tf_efficientnet_b0        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           tf_mixnet_l           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            tinynet_a            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        tnt_s_patch16_224        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         visformer_small         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      vit_base_patch16_224       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           volo_d1_224           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          resmlp_12_224          | 2  | pass  |     pass      |      pass      |  fail_to_run  |     pass      |
|           convit_base           | 2  | pass  |     pass      |  fail_to_run   |  fail_to_run  |     pass      |
|      xcit_large_24_p8_224       | 2  | pass  |  fail_to_run  |  fail_to_run   |  fail_to_run  |     pass      |
|          gmixer_24_224          | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |
|          gmlp_s16_224           | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |
|          mixer_b16_224          | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |
|         poolformer_m36          | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|           resnest101e           | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|         coat_lite_mini          | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|          jx_nest_base           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|            pit_b_224            | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|        twins_pcpvt_base         | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|        ese_vovnet19b_dw         | 2  | pass  |     pass      |  fail_to_run   |     pass      | fail_accuracy |
|        gluon_xception65         | 2  | pass  |     pass      |  fail_to_run   |     pass      | fail_accuracy |
|            hrnet_w18            | 2  | pass  |     pass      |  fail_to_run   |     pass      | fail_accuracy |
|          spnasnet_100           | 2  | pass  |     pass      |      pass      | fail_accuracy | fail_accuracy |
|            fbnetv3_b            | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy | fail_accuracy |
|          cait_m36_384           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy | fail_accuracy |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
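A quick way to turn an accuracy table like the one above into per-backend counts is to tally the status strings in each column. The helper below is a hypothetical sketch, not the script that produced these tables; it only assumes the `|`-separated layout shown here.

```python
from collections import Counter


def tally_status(table_text, backend):
    """Count pass / fail_to_run / fail_accuracy cells in one backend column."""
    def split(line):
        return [c.strip() for c in line.strip().strip("|").split("|")]

    # Keep only header and data rows; border rows start with '+'.
    lines = [l for l in table_text.splitlines() if l.strip().startswith("|")]
    col = split(lines[0]).index(backend)
    return Counter(split(l)[col] for l in lines[1:])


sample = """
+------------------+----+-------+---------------+----------------+---------------+---------------+
|       name       | bs | eager |   aot_eager   | aot_cudagraphs |  aot_nvfuser  |   inductor    |
+------------------+----+-------+---------------+----------------+---------------+---------------+
|    fbnetc_100    | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
| adv_inception_v3 | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|   cait_m36_384   | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy | fail_accuracy |
+------------------+----+-------+---------------+----------------+---------------+---------------+
"""
for backend in ("aot_cudagraphs", "inductor"):
    print(backend, dict(tally_status(sample, backend)))
```

Keeping `fail_to_run` and `fail_accuracy` as separate counts preserves the distinction the table draws between a backend that crashes and one that runs but produces mismatched results.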

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+
|              name               | bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor  |
+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+
|            hrnet_w18            |  2  | 99.5471 | 142.1533  |      nan       |  471.0836   | 1399.7866 |
|             dpn107              | 32  | 13.8554 |  28.7483  |      nan       |  112.6905   | 1352.9515 |
|          pnasnet5large          | 16  | 60.4879 |  88.5364  |      nan       |  251.1834   | 1340.2779 |
|           rexnet_100            | 128 | 6.8038  |  14.4292  |      nan       |  120.8599   | 1069.0229 |
|        res2net50_14w_8s         |  2  | 20.6816 |  38.8763  |      nan       |  121.3153   | 987.9243  |
|          ghostnet_100           | 128 | 9.5437  |  19.3356  |      nan       |   96.7996   |  882.697  |
|           mobilevit_s           | 32  | 5.9457  |  13.7479  |      nan       |   61.5465   | 879.8939  |
|        twins_pcpvt_base         | 32  | 26.7453 |  43.9793  |      nan       |   95.4658   | 843.4319  |
|       eca_botnext26ts_256       | 64  | 2.6274  |   7.452   |      nan       |   63.6443   | 839.8298  |
|            mixnet_l             | 64  | 13.5416 |  23.0219  |      nan       |   88.4897   |  835.172  |
|            fbnetv3_b            | 128 | 13.3888 |  24.1748  |      nan       |  109.7611   | 772.9081  |
|            tinynet_a            | 128 | 7.7469  |  15.6905  |      nan       |   83.963    | 743.7648  |
|           resnest101e           | 32  | 27.3378 |  47.7018  |      nan       |  125.8207   | 700.9297  |
|        sebotnet33ts_256         | 64  |  3.961  |  10.0038  |      nan       |   69.1966   | 648.9216  |
|           fbnetc_100            | 128 | 5.7278  |  12.479   |    85.5777     |   63.2472   | 638.7629  |
|         coat_lite_mini          | 128 |  3.266  |  9.1575   |      nan       |   34.2188   | 636.5705  |
|          botnet26t_256          | 128 | 2.4678  |  6.7299   |      nan       |   51.027    | 588.0481  |
|           tf_mixnet_l           | 64  | 13.7087 |  23.521   |      nan       |   89.4409   | 565.3166  |
|             dla102              | 64  | 10.8136 |  22.7803  |      nan       |   96.3515   | 540.0435  |
|        eca_halonext26ts         | 64  | 2.7442  |  7.8035   |      nan       |   67.5506   | 524.7978  |
|          cspdarknet53           | 64  | 6.1577  |  13.819   |      nan       |   44.4818   | 516.8901  |
|           res2next50            |  2  | 7.5752  |  17.4257  |      nan       |   64.6808   | 508.6052  |
|           mnasnet_100           | 128 | 4.2555  |  9.7943   |    61.6838     |   53.8161   | 460.7911  |
|       tf_efficientnet_b0        | 128 | 5.9847  |  12.8606  |      nan       |   81.5061   | 453.8244  |
|          convnext_base          | 32  | 11.9958 |  19.1913  |      nan       |   46.8047   | 447.0484  |
|        res2net101_26w_4s        | 64  | 26.2277 |  46.854   |      nan       |  142.2874   | 442.9257  |
|  swin_base_patch4_window7_224   | 64  | 12.9152 |  26.0704  |      nan       |   83.2757   | 431.9341  |
|        adv_inception_v3         | 128 | 8.7126  |  19.1453  |      nan       |   105.887   | 431.2125  |
|            nfnet_l0             | 64  | 6.0435  |  13.1945  |      nan       |   38.7061   | 399.3298  |
|      mobilenetv3_large_100      | 128 | 4.5585  |  10.0823  |      nan       |   83.9393   | 397.9304  |
|         mobilenetv2_100         | 128 | 4.1958  |  9.4317   |      nan       |   43.0148   | 392.8332  |
|           regnety_002           | 128 | 4.9475  |  10.8999  |      nan       |   60.0576   | 392.0666  |
|        ese_vovnet19b_dw         | 128 | 2.0528  |  5.1975   |      nan       |   39.6613   | 388.7989  |
|         visformer_small         | 128 | 2.5656  |  6.5413   |      nan       |   31.5689   | 381.6193  |
|      xcit_large_24_p8_224       |  5  | 37.375  |    nan    |      nan       |     nan     | 352.6873  |
|        gluon_xception65         | 32  | 15.5262 |  29.4819  |      nan       |    78.65    | 347.3151  |
|          jx_nest_base           | 32  | 9.7428  |  20.6265  |      nan       |    58.19    | 320.1488  |
|          cait_m36_384           |  2  | 47.7696 |  73.0288  |      nan       |  109.6265   | 306.2536  |
|         poolformer_m36          | 64  | 13.3601 |  21.2852  |      nan       |     nan     | 303.6845  |
|            gernet_l             | 128 | 4.9595  |  11.1233  |      nan       |   47.9657   | 292.4524  |
|         crossvit_9_240          | 64  |  7.76   |  17.4312  |      nan       |   42.3485   | 281.7218  |
|           selecsls42b           | 128 | 2.5011  |  6.9082   |      nan       |   51.4771   | 280.0613  |
|       gluon_inception_v3        | 128 | 8.5703  |  18.794   |      nan       |  105.7343   | 276.2485  |
|          spnasnet_100           | 128 | 5.5887  |  12.1725  |    81.9197     |   60.788    | 274.2042  |
|            lcnet_050            | 128 | 2.0128  |   5.131   |      nan       |   39.9489   | 244.1516  |
|          inception_v3           | 128 | 8.5271  |  18.8796  |      nan       |  106.0345   | 223.4377  |
|     swsl_resnext101_32x16d      | 32  | 10.3836 |  22.0073  |      nan       |   61.957    | 221.0566  |
|           volo_d1_224           | 64  | 6.7957  |  15.1805  |      nan       |   44.0256   | 211.2936  |
|           convit_base           | 32  | 4.3998  |  10.9888  |      nan       |     nan     | 182.1969  |
|            pit_b_224            | 64  |  3.947  |  10.0387  |      nan       |   27.9232   | 182.0427  |
|        tnt_s_patch16_224        | 64  | 12.8349 |  25.0047  |      nan       |   48.7462   | 166.9154  |
|          gmlp_s16_224           | 64  | 9.5498  |  17.7147  |      nan       |   30.2121   | 151.1543  |
|          gmixer_24_224          | 64  | 8.8075  |  17.6468  |      nan       |   35.1994   | 141.2526  |
|            repvgg_a2            | 128 | 4.8536  |  10.7212  |    53.3933     |   65.0058   | 138.1295  |
|           dm_nfnet_f0           | 128 | 6.6351  |  13.4685  |      nan       |   42.0166   | 133.3581  |
|          resmlp_12_224          | 128 | 2.9325  |  6.0916   |    10.2012     |     nan     | 100.1161  |
|          mixer_b16_224          | 64  | 2.9455  |  7.0706   |     17.11      |   18.0814   |  96.6917  |
|      beit_base_patch16_224      | 64  | 4.9563  |  10.3961  |      nan       |   21.2115   |  91.1698  |
|        convmixer_768_32         | 32  | 7.1212  |  14.8516  |      nan       |   23.9174   |  89.8166  |
| deit_base_distilled_patch16_224 | 64  | 3.2165  |  8.1922   |      nan       |   16.7035   |  84.6066  |
|      vit_base_patch16_224       | 64  | 3.0903  |  7.8695   |      nan       |   16.0406   |  71.5226  |
+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|          gmixer_24_224          | 64  | 1.0001 |  0.9563   |      nan       |   0.8998    |  1.2577  |
|          gmlp_s16_224           | 64  |  1.0   |  0.9679   |      nan       |    0.92     |  1.2405  |
|            tinynet_a            | 128 | 1.0001 |  0.7955   |      nan       |   0.7958    |  1.1632  |
|          pnasnet5large          | 16  | 1.0583 |  0.9923   |      nan       |   1.1741    |  1.1265  |
|        eca_halonext26ts         | 64  | 0.999  |  0.7814   |      nan       |    0.786    |  1.0887  |
|           dm_nfnet_f0           | 128 | 0.9758 |  0.9039   |      nan       |    0.95     |  1.0616  |
|        tnt_s_patch16_224        | 64  |  1.0   |  0.9718   |      nan       |   0.9431    |  1.0587  |
|           volo_d1_224           | 64  | 1.0015 |  0.9518   |      nan       |   0.8587    |  1.0378  |
|           convit_base           | 32  | 0.9991 |   0.86    |      nan       |     nan     |  1.0309  |
|      beit_base_patch16_224      | 64  | 0.9999 |  0.9367   |      nan       |   0.9298    |  1.0097  |
|           mobilevit_s           | 32  |  1.0   |  0.7722   |      nan       |    0.787    |  1.0078  |
|           rexnet_100            | 128 | 0.9988 |  0.7919   |      nan       |   0.8648    |  1.0009  |
|             dla102              | 64  | 0.9998 |  0.9549   |      nan       |   0.9751    |  0.997   |
|            pit_b_224            | 64  | 1.0021 |  0.8074   |      nan       |   0.8179    |  0.9856  |
|         poolformer_m36          | 64  | 1.0015 |  0.9462   |      nan       |     nan     |  0.9797  |
|          convnext_base          | 32  | 1.0065 |   0.908   |      nan       |   0.7521    |  0.9564  |
|        twins_pcpvt_base         | 32  | 0.9963 |  0.9079   |      nan       |   0.8007    |  0.9553  |
|        convmixer_768_32         | 32  | 0.9992 |  0.9807   |      nan       |   0.9715    |  0.9508  |
|         visformer_small         | 128 | 0.9899 |  0.9353   |      nan       |   0.8884    |  0.9342  |
|           resnest101e           | 32  | 1.0002 |  0.9762   |      nan       |   0.9535    |  0.9292  |
|           tf_mixnet_l           | 64  | 0.9995 |  0.8624   |      nan       |   0.8426    |  0.9291  |
|          mixer_b16_224          | 64  | 0.9929 |  0.9425   |     0.2532     |   0.7726    |  0.9225  |
|       tf_efficientnet_b0        | 128 | 1.0006 |  0.7769   |      nan       |    0.846    |  0.9189  |
|            nfnet_l0             | 64  | 0.9993 |   0.824   |      nan       |   0.8257    |  0.913   |
|         mobilenetv2_100         | 128 | 0.9992 |  0.7716   |      nan       |   0.9249    |  0.8963  |
|      vit_base_patch16_224       | 64  | 0.9955 |  0.9384   |      nan       |   0.8801    |  0.8916  |
| deit_base_distilled_patch16_224 | 64  | 0.9944 |  0.9376   |      nan       |   0.8794    |  0.8911  |
|      mobilenetv3_large_100      | 128 | 0.9987 |  0.8562   |      nan       |   0.8673    |  0.8886  |
|        adv_inception_v3         | 128 | 1.0003 |  0.8759   |      nan       |   0.8538    |  0.8829  |
|       gluon_inception_v3        | 128 | 1.0003 |  0.8759   |      nan       |   0.8538    |  0.8829  |
|          inception_v3           | 128 | 1.0003 |  0.8759   |      nan       |   0.8538    |  0.8829  |
|        gluon_xception65         | 32  |  1.0   |  0.8895   |      nan       |   0.8854    |  0.8713  |
|             dpn107              | 32  | 0.9981 |  0.9115   |      nan       |   0.8834    |  0.8701  |
|           selecsls42b           | 128 | 0.9789 |  0.8913   |      nan       |   0.8811    |  0.8659  |
|            fbnetv3_b            | 128 | 1.0003 |  0.7918   |      nan       |   0.7903    |  0.8645  |
|            mixnet_l             | 64  | 0.9989 |  0.8507   |      nan       |   0.7796    |  0.8601  |
|          spnasnet_100           | 128 | 0.9988 |  0.8961   |     0.1651     |   0.8371    |  0.8599  |
|       eca_botnext26ts_256       | 64  | 0.9998 |  0.7776   |      nan       |   0.7813    |  0.8532  |
|     swsl_resnext101_32x16d      | 32  | 1.0009 |  0.8805   |      nan       |   0.8487    |  0.8523  |
|      xcit_large_24_p8_224       |  5  | 0.9987 |    nan    |      nan       |     nan     |  0.8489  |
|          resmlp_12_224          | 128 | 0.9827 |  0.9667   |     0.2637     |     nan     |  0.845   |
|          ghostnet_100           | 128 | 1.0013 |  0.8903   |      nan       |   0.9244    |  0.833   |
|         coat_lite_mini          | 128 | 1.0338 |   0.929   |      nan       |   0.6593    |  0.8328  |
|        ese_vovnet19b_dw         | 128 |  1.0   |   0.867   |      nan       |   0.9146    |  0.8269  |
|          cspdarknet53           | 64  |  1.0   |  0.8469   |      nan       |   0.7906    |  0.813   |
|          cait_m36_384           |  2  | 0.9998 |  0.8806   |      nan       |   0.9023    |  0.8081  |
|          jx_nest_base           | 32  |  1.0   |  0.8945   |      nan       |    0.86     |   0.8    |
|         crossvit_9_240          | 64  | 1.0008 |  0.8801   |      nan       |   0.8854    |  0.7933  |
|        res2net101_26w_4s        | 64  | 0.9999 |  0.9202   |      nan       |   0.8569    |  0.7834  |
|           mnasnet_100           | 128 | 0.9993 |  0.8882   |     0.1669     |   0.8253    |  0.773   |
|  swin_base_patch4_window7_224   | 64  | 0.9998 |  0.9234   |      nan       |   0.8451    |  0.7676  |
|        sebotnet33ts_256         | 64  | 0.9999 |  0.7108   |      nan       |   0.7354    |  0.7449  |
|            gernet_l             | 128 | 0.9998 |  0.8655   |      nan       |   0.8299    |  0.7238  |
|           fbnetc_100            | 128 | 0.9984 |  0.8631   |     0.1626     |   0.7352    |  0.7104  |
|            lcnet_050            | 128 | 0.9992 |  0.7927   |      nan       |   0.7885    |  0.705   |
|           regnety_002           | 128 | 0.9994 |  0.8284   |      nan       |   0.7819    |  0.6975  |
|          botnet26t_256          | 128 |  1.0   |  0.8755   |      nan       |    0.78     |  0.6616  |
|           res2next50            |  2  |  1.0   |  0.8301   |      nan       |   0.8198    |  0.6012  |
|        res2net50_14w_8s         |  2  |  1.0   |  0.8275   |      nan       |   0.8169    |  0.5927  |
|            hrnet_w18            |  2  |  1.0   |  0.8383   |      nan       |   0.8363    |  0.5746  |
|            repvgg_a2            | 128 | 1.0003 |  0.7971   |     0.1444     |   0.6902    |  0.5572  |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+

Performance graphs


bench_logs/timm_models_amp.png :

bench_logs/huggingface_amp.png :

bench_logs/torchbench_amp.png :

@anijain2305
Contributor Author

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and the gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. The experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
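As a rough illustration of the filtering and aggregation described above, the sketch below drops models that did not pass accuracy and takes the geometric mean of the remaining speedups. The function name, the dictionary layout, and the rule that any status starting with "pass" counts as passing are assumptions made for this example; the dashboard's actual scripts may differ.

```python
import math

def geomean_speedup(speedups, accuracy):
    """Geometric mean of speedups over models that passed accuracy.

    speedups: model name -> speedup over native PyTorch (0.0 marks a failed run)
    accuracy: model name -> status string such as "pass" or "fail_to_run"
    Both layouts are hypothetical, chosen only for this sketch.
    """
    kept = [
        s
        for name, s in speedups.items()
        # Assumption: statuses beginning with "pass" count as passing.
        if accuracy.get(name, "").startswith("pass") and s > 0
    ]
    if not kept:
        return 0.0
    # exp(mean(log(x))) is the geometric mean.
    return math.exp(sum(math.log(s) for s in kept) / len(kept))

# Made-up numbers, not taken from the tables in this thread.
speedups = {"model_a": 2.0, "model_b": 8.0, "model_c": 0.0}
accuracy = {"model_a": "pass", "model_b": "pass", "model_c": "fail_to_run"}
result = geomean_speedup(speedups, accuracy)
print(f"{result:.2f}x")
```

A geometric mean is the usual choice for averaging ratios, since a 2x speedup and a 0.5x slowdown then cancel to 1.0x.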

Passrate

+-----------+-------------+
| Compiler  | huggingface |
+-----------+-------------+
| aot_eager | 98%, 42/43  |
| inductor  | 84%, 36/43  |
+-----------+-------------+

Geometric mean speedup

+-----------+-------------+
| Compiler  | huggingface |
+-----------+-------------+
| aot_eager |    1.00x    |
| inductor  |    2.25x    |
+-----------+-------------+

Mean compilation time (seconds)

+-----------+-------------+
| Compiler  | huggingface |
+-----------+-------------+
| aot_eager |    25.89    |
| inductor  |    87.60    |
+-----------+-------------+

Peak memory footprint compression ratio (higher is better)

+-----------+-------------+
| Compiler  | huggingface |
+-----------+-------------+
| aot_eager |    0.86x    |
| inductor  |    0.83x    |
+-----------+-------------+
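
Since the table is labeled "higher is better", one plausible reading is that the ratio divides native PyTorch's peak memory by the backend's peak memory, so a value below 1.0x means the compiled run peaked higher than the baseline. The sketch below encodes that reading; the definition and the function name are assumptions, not confirmed by this post.

```python
def compression_ratio(baseline_peak_bytes, compiled_peak_bytes):
    """Assumed definition: baseline peak memory / compiled peak memory."""
    if compiled_peak_bytes <= 0:
        # No measurement (for example, the run failed): report nan.
        return float("nan")
    return baseline_peak_bytes / compiled_peak_bytes

# Made-up example: baseline peaks at 10 GiB, compiled run at 12 GiB.
ratio = compression_ratio(10 * 2**30, 12 * 2**30)
print(f"{ratio:.2f}x")
```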

Metrics over time


bench_logs/passrate_over_time.png :

bench_logs/geomean_over_time.png :

huggingface suite with amp precision


Performance speedup

+-----------------------------------------+----+-----------+----------+
|                  name                   | bs | aot_eager | inductor |
+-----------------------------------------+----+-----------+----------+
|           ElectraForCausalLM            | 1  |  0.8457   |  6.3238  |
|          MobileBertForMaskedLM          | 16 |  0.8464   |  6.2119  |
|       MT5ForConditionalGeneration       | 2  |  0.8564   |  5.4336  |
|     MobileBertForQuestionAnswering      | 32 |  0.8237   |  5.0662  |
|            YituTechConvBert             | 1  |  0.8386   |  4.728   |
|         MegatronBertForCausalLM         | 2  |  0.8526   |  4.216   |
|             OPTForCausalLM              | 4  |  0.8238   |  3.9855  |
|           RobertaForCausalLM            | 4  |  0.8428   |  3.9209  |
|     PegasusForConditionalGeneration     | 4  |  0.8341   |  3.8129  |
|     M2M100ForConditionalGeneration      | 2  |   0.815   |  3.7545  |
|             XGLMForCausalLM             | 1  |  0.8097   |  3.6427  |
|                CamemBert                | 1  |  0.8502   |  3.4717  |
|     PLBartForConditionalGeneration      | 8  |  0.8242   |  3.2557  |
|    MegatronBertForQuestionAnswering     | 8  |  0.8588   |  3.1214  |
|               DistillGPT2               | 1  |  0.8651   |  2.624   |
|      MBartForConditionalGeneration      | 8  |  0.8457   |  2.4031  |
|      GPT2ForSequenceClassification      | 4  |  0.9749   |  2.1444  |
|         Speech2Text2ForCausalLM         | 64 |  0.8355   |  2.0987  |
|       ElectraForQuestionAnswering       | 64 |  0.9657   |  1.9725  |
|            TrOCRForCausalLM             | 8  |  0.8338   |  1.9162  |
|                 T5Small                 | 1  |  0.8839   |  1.8796  |
|          DistilBertForMaskedLM          | 16 |  0.8498   |  1.8745  |
|           PegasusForCausalLM            | 8  |  0.8044   |  1.8467  |
| BlenderbotSmallForConditionalGeneration | 32 |  0.8895   |  1.7928  |
|      BartForConditionalGeneration       | 1  |  0.8342   |  1.7886  |
|     DistilBertForQuestionAnswering      | 32 |  0.8476   |  1.7631  |
|    LayoutLMForSequenceClassification    | 16 |  0.9786   |  1.7266  |
|       T5ForConditionalGeneration        | 4  |  0.9354   |  1.6896  |
|       AlbertForQuestionAnswering        | 2  |  0.8084   |  1.6586  |
|            AlbertForMaskedLM            | 2  |  0.8084   |  1.6458  |
|            XLNetLMHeadModel             | 4  |  0.9599   |  1.5933  |
|            PLBartForCausalLM            | 16 |  0.9311   |  1.5454  |
|       DebertaForQuestionAnswering       | 4  |  0.7242   |  1.4752  |
|             BartForCausalLM             | 2  |   0.963   |  1.4663  |
|       RobertaForQuestionAnswering       | 64 |  0.9577   |  1.4464  |
|        BertForQuestionAnswering         | 64 |  0.9665   |  1.4415  |
|            MBartForCausalLM             | 16 |  0.8988   |  1.4018  |
|           DebertaForMaskedLM            | 4  |   0.729   |  1.3922  |
|             BertForMaskedLM             | 64 |  0.9562   |  1.3337  |
|       BlenderbotSmallForCausalLM        | 64 |  0.9249   |  1.303   |
|                 BigBird                 | 1  |  0.9124   |  1.1505  |
|           LayoutLMForMaskedLM           | 16 |  0.9693   |   0.0    |
|          AllenaiLongformerBase          | 1  |  0.7271   |   0.0    |
+-----------------------------------------+----+-----------+----------+

Accuracy

+-----------------------------------------+----+-----------+-------------+
|                  name                   | bs | aot_eager |  inductor   |
+-----------------------------------------+----+-----------+-------------+
|            AlbertForMaskedLM            | 1  |   pass    |    pass     |
|       AlbertForQuestionAnswering        | 1  |   pass    |    pass     |
|             BartForCausalLM             | 1  |   pass    |    pass     |
|             BertForMaskedLM             | 1  |   pass    |    pass     |
|        BertForQuestionAnswering         | 1  |   pass    |    pass     |
|                 BigBird                 | 1  |   pass    |    pass     |
|       BlenderbotSmallForCausalLM        | 1  |   pass    |    pass     |
| BlenderbotSmallForConditionalGeneration | 1  |   pass    |    pass     |
|                CamemBert                | 1  |   pass    |    pass     |
|           DebertaForMaskedLM            | 1  |   pass    |    pass     |
|       DebertaForQuestionAnswering       | 1  |   pass    |    pass     |
|          DistilBertForMaskedLM          | 1  |   pass    |    pass     |
|     DistilBertForQuestionAnswering      | 1  |   pass    |    pass     |
|               DistillGPT2               | 1  |   pass    |    pass     |
|           ElectraForCausalLM            | 1  |   pass    |    pass     |
|       ElectraForQuestionAnswering       | 1  |   pass    |    pass     |
|      GPT2ForSequenceClassification      | 1  |   pass    |    pass     |
|           LayoutLMForMaskedLM           | 1  |   pass    |    pass     |
|    LayoutLMForSequenceClassification    | 1  |   pass    |    pass     |
|            MBartForCausalLM             | 1  |   pass    |    pass     |
|       MT5ForConditionalGeneration       | 1  |   pass    |    pass     |
|         MegatronBertForCausalLM         | 1  |   pass    |    pass     |
|    MegatronBertForQuestionAnswering     | 1  |   pass    |    pass     |
|          MobileBertForMaskedLM          | 1  |   pass    |    pass     |
|     MobileBertForQuestionAnswering      | 1  |   pass    |    pass     |
|             OPTForCausalLM              | 1  |   pass    |    pass     |
|            PLBartForCausalLM            | 1  |   pass    |    pass     |
|           PegasusForCausalLM            | 1  |   pass    |    pass     |
|     PegasusForConditionalGeneration     | 1  |   pass    |    pass     |
|           RobertaForCausalLM            | 1  |   pass    |    pass     |
|       RobertaForQuestionAnswering       | 1  |   pass    |    pass     |
|         Speech2Text2ForCausalLM         | 1  |   pass    |    pass     |
|       T5ForConditionalGeneration        | 1  |   pass    |    pass     |
|                 T5Small                 | 1  |   pass    |    pass     |
|            TrOCRForCausalLM             | 1  |   pass    |    pass     |
|            XLNetLMHeadModel             | 1  |   pass    |    pass     |
|            YituTechConvBert             | 1  |   pass    |    pass     |
|          AllenaiLongformerBase          | 1  |   pass    | fail_to_run |
|      BartForConditionalGeneration       | 1  |   pass    | fail_to_run |
|      MBartForConditionalGeneration      | 1  |   pass    | fail_to_run |
|     PLBartForConditionalGeneration      | 1  |   pass    | fail_to_run |
|     M2M100ForConditionalGeneration      | 1  |   pass    |   0.0000    |
|             XGLMForCausalLM             | 0  |  0.0000   |   0.0000    |
+-----------------------------------------+----+-----------+-------------+

Compilation latency (sec)

+-----------------------------------------+----+-----------+----------+
|                  name                   | bs | aot_eager | inductor |
+-----------------------------------------+----+-----------+----------+
|          MobileBertForMaskedLM          | 16 | 177.6258  | 256.3088 |
|     MobileBertForQuestionAnswering      | 32 |  173.307  | 244.0373 |
|     M2M100ForConditionalGeneration      | 2  |  46.0615  | 178.9848 |
|       T5ForConditionalGeneration        | 4  |  12.8008  | 154.8311 |
|      MBartForConditionalGeneration      | 8  |  47.4506  | 151.3125 |
|             XGLMForCausalLM             | 1  |  30.2354  | 144.8421 |
|      BartForConditionalGeneration       | 1  |  46.2761  | 140.5452 |
|       MT5ForConditionalGeneration       | 2  |  19.9685  | 140.3639 |
|            XLNetLMHeadModel             | 4  |  41.7665  | 137.2853 |
|     PegasusForConditionalGeneration     | 4  |  46.8928  | 136.7294 |
|       DebertaForQuestionAnswering       | 4  |  14.3774  | 130.8288 |
|           DebertaForMaskedLM            | 4  |  14.5131  | 120.3766 |
|         MegatronBertForCausalLM         | 2  |  31.7314  | 118.0195 |
|    MegatronBertForQuestionAnswering     | 8  |  31.7457  | 117.8187 |
|            YituTechConvBert             | 1  |  20.5728  | 101.3281 |
| BlenderbotSmallForConditionalGeneration | 32 |  25.0994  | 96.1492  |
|     PLBartForConditionalGeneration      | 8  |  17.4257  |  94.093  |
|                 T5Small                 | 1  |  12.6393  | 91.9498  |
|             OPTForCausalLM              | 4  |  12.2074  | 71.8761  |
|            MBartForCausalLM             | 16 |  17.439   | 68.2201  |
|       ElectraForQuestionAnswering       | 64 |  12.8934  | 67.2964  |
|            TrOCRForCausalLM             | 8  |  17.0497  | 65.3169  |
|    LayoutLMForSequenceClassification    | 16 |  12.9564  | 65.1504  |
|           ElectraForCausalLM            | 1  |  12.7619  | 64.8722  |
|             BartForCausalLM             | 2  |  16.9692  | 64.1101  |
|       RobertaForQuestionAnswering       | 64 |  13.3347  | 63.5298  |
|           PegasusForCausalLM            | 8  |  17.0779  | 62.4789  |
|             BertForMaskedLM             | 64 |  12.6206  | 61.9757  |
|                 BigBird                 | 1  |  20.3027  | 61.6895  |
|        BertForQuestionAnswering         | 64 |  12.604   |  61.571  |
|      GPT2ForSequenceClassification      | 4  |  10.1868  | 60.4961  |
|                CamemBert                | 1  |  12.6605  |  59.846  |
|           RobertaForCausalLM            | 4  |  12.8908  | 58.2581  |
|       BlenderbotSmallForCausalLM        | 64 |  9.5692   | 52.4237  |
|            PLBartForCausalLM            | 16 |  6.8495   | 48.1822  |
|            AlbertForMaskedLM            | 2  |  9.2523   | 45.7043  |
|       AlbertForQuestionAnswering        | 2  |  8.8283   | 45.5348  |
|               DistillGPT2               | 1  |  4.7016   | 41.7968  |
|          DistilBertForMaskedLM          | 16 |  5.5164   | 39.5652  |
|     DistilBertForQuestionAnswering      | 32 |  5.4656   | 39.5651  |
|         Speech2Text2ForCausalLM         | 64 |  6.9795   |  38.258  |
|          AllenaiLongformerBase          | 1  |  22.9042  |   nan    |
|           LayoutLMForMaskedLM           | 16 |   13.22   |   nan    |
+-----------------------------------------+----+-----------+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+-----------+----------+
|                  name                   | bs | aot_eager | inductor |
+-----------------------------------------+----+-----------+----------+
|      GPT2ForSequenceClassification      | 4  |  0.9163   |   1.07   |
|       ElectraForQuestionAnswering       | 64 |  0.9539   |  1.0237  |
|            XLNetLMHeadModel             | 4  |  0.8791   |  1.0109  |
|                 T5Small                 | 1  |  0.9124   |  0.9876  |
|             BertForMaskedLM             | 64 |   0.899   |  0.9811  |
|    LayoutLMForSequenceClassification    | 16 |  0.9325   |  0.9712  |
| BlenderbotSmallForConditionalGeneration | 32 |  0.8996   |  0.9557  |
|             BartForCausalLM             | 2  |  0.8769   |  0.9545  |
|       T5ForConditionalGeneration        | 4  |  0.9594   |  0.9525  |
|         Speech2Text2ForCausalLM         | 64 |  0.8489   |  0.9452  |
|          DistilBertForMaskedLM          | 16 |  0.8698   |  0.9448  |
|           ElectraForCausalLM            | 1  |  0.8955   |  0.941   |
|            PLBartForCausalLM            | 16 |  0.8667   |  0.9395  |
|       BlenderbotSmallForCausalLM        | 64 |  0.8172   |  0.9269  |
|        BertForQuestionAnswering         | 64 |  0.9315   |  0.9256  |
|       RobertaForQuestionAnswering       | 64 |  0.9315   |  0.9254  |
|      BartForConditionalGeneration       | 1  |  0.8619   |  0.881   |
|       AlbertForQuestionAnswering        | 2  |  0.6451   |  0.8636  |
|            MBartForCausalLM             | 16 |  0.8398   |  0.8565  |
|            AlbertForMaskedLM            | 2  |  0.6364   |  0.8515  |
|                 BigBird                 | 1  |  0.9513   |  0.8349  |
|     DistilBertForQuestionAnswering      | 32 |  0.8967   |  0.8334  |
|     PLBartForConditionalGeneration      | 8  |  0.8307   |  0.8251  |
|               DistillGPT2               | 1  |  0.7548   |  0.812   |
|          MobileBertForMaskedLM          | 16 |  0.8983   |  0.7803  |
|      MBartForConditionalGeneration      | 8  |  0.8187   |  0.7699  |
|            TrOCRForCausalLM             | 8  |  0.7955   |  0.7566  |
|                CamemBert                | 1  |  0.7872   |  0.7482  |
|             OPTForCausalLM              | 4  |  0.7501   |  0.7473  |
|            YituTechConvBert             | 1  |  0.7819   |  0.7407  |
|           PegasusForCausalLM            | 8  |  0.9444   |  0.7324  |
|           RobertaForCausalLM            | 4  |  0.7741   |  0.7309  |
|             XGLMForCausalLM             | 1  |  0.9992   |  0.7214  |
|    MegatronBertForQuestionAnswering     | 8  |  0.8218   |  0.7107  |
|     PegasusForConditionalGeneration     | 4  |  0.9196   |  0.6769  |
|         MegatronBertForCausalLM         | 2  |  0.7726   |  0.6697  |
|     M2M100ForConditionalGeneration      | 2  |  0.9497   |  0.6568  |
|     MobileBertForQuestionAnswering      | 32 |  0.9796   |  0.6265  |
|       MT5ForConditionalGeneration       | 2  |  0.6019   |  0.6019  |
|           DebertaForMaskedLM            | 4  |  0.9826   |  0.4498  |
|       DebertaForQuestionAnswering       | 4  |  1.0568   |  0.3761  |
|          AllenaiLongformerBase          | 1  |  0.9477   |   nan    |
|           LayoutLMForMaskedLM           | 16 |  0.9238   |   nan    |
+-----------------------------------------+----+-----------+----------+

Performance graphs


bench_logs/huggingface_amp.png :

@anijain2305
Contributor Author

Performance Dashboard for float32 precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and the gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. The experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
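The accuracy check described in the summary compares forward outputs and gradients against native PyTorch. A minimal pure-Python sketch of that comparison follows, with flat lists of floats standing in for tensors. The tolerances, the helper names, and the "fail" label are hypothetical; only the "pass" label appears in the tables here.

```python
import math

def same(reference, candidate, rel_tol=1e-4, abs_tol=1e-4):
    """Element-wise closeness check; the tolerances are illustrative only."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )

def accuracy_status(ref_outputs, ref_grads, new_outputs, new_grads):
    """Return "pass" only if both outputs and gradients match the reference."""
    ok = same(ref_outputs, new_outputs) and same(ref_grads, new_grads)
    return "pass" if ok else "fail"  # "fail" is a placeholder label

# Made-up values: the outputs match, but one gradient entry is off.
status_ok = accuracy_status([1.0, 2.0], [0.5], [1.00001, 2.0], [0.5])
status_bad = accuracy_status([1.0, 2.0], [0.5], [1.0, 2.0], [0.6])
print(status_ok, status_bad)
```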

Passrate

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      | 91%, 50/55 | 98%, 43/44  | 100%, 61/61 |
|   aot_eager    | 89%, 49/55 | 98%, 43/44  | 90%, 55/61  |
| aot_cudagraphs | 25%, 14/55 |  0%, 0/44   |  2%, 1/61   |
|  aot_nvfuser   | 58%, 32/55 |  2%, 1/44   | 82%, 50/61  |
|    inductor    | 84%, 46/55 | 93%, 41/44  | 95%, 58/61  |
+----------------+------------+-------------+-------------+
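
Each passrate cell pairs a rounded percentage with a passed/total count. The sketch below reproduces that formatting. It assumes that any status beginning with "pass" (including "pass_due_to_skip") counts toward the numerator, which this post does not confirm.

```python
def passrate(statuses):
    """Format a cell like "91%, 50/55" from a list of per-model statuses."""
    total = len(statuses)
    # Assumption: "pass" and "pass_due_to_skip" both count as passing.
    passed = sum(1 for s in statuses if s.startswith("pass"))
    pct = 100.0 * passed / total if total else 0.0
    return f"{pct:.0f}%, {passed}/{total}"

# 50 passing models out of 55, matching the shape of the eager/torchbench cell.
statuses = ["pass"] * 50 + ["fail_to_run"] * 5
cell = passrate(statuses)
print(cell)
```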

Geometric mean speedup

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   1.00x    |    1.01x    |    1.00x    |
|   aot_eager    |   1.01x    |    1.00x    |    1.00x    |
| aot_cudagraphs |   1.02x    |    0.0x     |    1.00x    |
|  aot_nvfuser   |   1.13x    |    1.12x    |    1.12x    |
|    inductor    |   1.39x    |    1.60x    |    1.21x    |
+----------------+------------+-------------+-------------+

Mean compilation time (seconds)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |    5.42    |    14.22    |    11.34    |
|   aot_eager    |    9.77    |    21.16    |    16.79    |
| aot_cudagraphs |    4.86    |     0.0     |    7.42     |
|  aot_nvfuser   |   22.48    |    10.56    |    57.73    |
|    inductor    |   238.15   |   109.27    |   366.65    |
+----------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   0.95x    |    0.98x    |    1.00x    |
|   aot_eager    |   0.86x    |    0.89x    |    0.88x    |
| aot_cudagraphs |   0.41x    |    0.0x     |    0.25x    |
|  aot_nvfuser   |   0.83x    |    1.08x    |    0.85x    |
|    inductor    |   0.78x    |    0.74x    |    0.90x    |
+----------------+------------+-------------+-------------+

torchbench suite with float32 precision


Performance speedup

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|            densenet121            |  4   | 0.9991 |  1.0087   |      0.0       |   1.4479    |  4.9052  |
|         timm_efficientdet         |  1   | 0.9824 |  0.8787   |      0.0       |     0.0     |  3.9475  |
|       functorch_dp_cifar10        |  64  | 0.9963 |  0.9772   |      0.0       |    1.197    |  3.6233  |
|      timm_vision_transformer      |  8   | 1.0025 |  0.9173   |      0.0       |   1.3464    |  2.5509  |
|                drq                |  1   | 1.0043 |  0.8529   |      0.0       |   1.0585    |  2.4584  |
|           BERT_pytorch            |  16  | 1.0078 |  0.8843   |      0.0       |     0.0     |  1.8656  |
|             resnet18              |  16  | 1.0033 |   1.104   |      0.0       |   1.3915    |  1.8125  |
|               dcgan               |  32  | 0.9844 |  1.0223   |     1.0738     |   1.1668    |  1.7591  |
|           lennard_jones           | 1000 | 0.9793 |  0.8541   |     1.062      |    1.027    |  1.7573  |
|          pytorch_struct           | 200  | 0.9964 |  0.7439   |     0.8929     |   0.8905    |  1.7547  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.0004 |  0.9318   |     1.1166     |   1.2026    |  1.7117  |
|             hf_Albert             |  8   | 1.0013 |  0.9975   |      0.0       |     0.0     |  1.6656  |
|           squeezenet1_1           |  32  | 1.0075 |  1.0042   |     0.9826     |   1.1641    |  1.6018  |
|          resnext50_32x4d          |  8   | 1.0038 |  1.0848   |      0.0       |    1.36     |  1.513   |
|        mobilenet_v3_large         |  32  | 1.0035 |  1.1165   |      0.0       |    1.397    |  1.4827  |
|            timm_nfnet             | 128  | 0.9995 |  0.9997   |      0.0       |    1.211    |  1.4715  |
|              hf_GPT2              |  4   | 1.0071 |  0.9753   |      0.0       |     0.0     |  1.4298  |
|            hf_T5_large            |  2   | 1.0245 |  0.8903   |      0.0       |     0.0     |  1.4073  |
|         soft_actor_critic         | 256  | 1.0007 |  0.7819   |     1.0121     |    1.045    |  1.377   |
|           fastNLP_Bert            |  6   | 0.9988 |  0.9748   |      0.0       |     0.0     |  1.3639  |
|              hf_Bart              |  4   | 1.0125 |  0.9707   |      0.0       |     0.0     |  1.2501  |
|          LearningToPaint          |  96  | 1.004  |  1.0612   |      0.0       |   1.2169    |  1.2114  |
|           pytorch_unet            |  1   |  1.0   |  0.9966   |      0.0       |   1.0756    |  1.2054  |
|            Super_SloMo            |  6   |  1.0   |  0.9973   |      0.0       |     0.0     |  1.1763  |
|               vgg16               |  64  | 0.9999 |  0.9982   |     0.7928     |   0.9965    |  1.1707  |
|              alexnet              | 128  | 0.9999 |  0.9985   |     0.7786     |    1.001    |  1.1615  |
|           hf_DistilBert           |  8   | 0.9997 |  0.9551   |      0.0       |     0.0     |  1.1572  |
|              hf_Bert              |  4   | 1.0291 |  0.9963   |      0.0       |     0.0     |  1.1565  |
|            mnasnet1_0             |  32  | 0.9998 |  1.1013   |     0.7453     |   1.3026    |  1.1524  |
|          pytorch_stargan          |  16  | 0.9992 |  0.9827   |     0.7293     |   1.0246    |  1.1189  |
|        Background_Matting         |  4   | 0.9997 |  1.0225   |      0.0       |   1.0825    |  1.1138  |
|            hf_Reformer            |  4   | 0.9965 |    0.0    |     0.8945     |     0.0     |  1.1108  |
|            hf_BigBird             |  2   | 0.9941 |  0.9398   |      0.0       |     0.0     |  1.0989  |
|         timm_efficientnet         |  32  | 0.961  |  0.8183   |      0.0       |   1.0739    |  1.0815  |
|        shufflenet_v2_x1_0         | 128  | 1.0008 |  1.0519   |      0.0       |   1.1884    |  1.0746  |
|   timm_vision_transformer_large   |  8   | 0.9999 |  0.9935   |      0.0       |    0.982    |  1.0531  |
| attention_is_all_you_need_pytorch | 256  | 0.9973 |  0.9715   |      0.0       |     0.0     |  1.0492  |
|           timm_resnest            |  32  |  1.0   |  1.0019   |      0.0       |   1.1832    |  1.0416  |
|            tts_angular            |  64  | 0.9854 |  0.9625   |     0.9851     |   1.0031    |  1.0091  |
|              demucs               |  4   | 1.0004 |  1.0005   |      1.0       |   1.0003    |  1.0002  |
|               dlrm                | 2048 | 0.904  |  0.8836   |      0.0       |     0.0     |  0.9304  |
|            timm_vovnet            |  32  | 0.9122 |  0.9055   |      0.0       |   0.9776    |  0.9152  |
|      nvidia_deeprecommender       | 256  | 0.9991 |  0.9632   |     0.5842     |   0.9441    |  0.9044  |
|           mobilenet_v2            |  96  | 0.9996 |  0.9986   |      0.0       |   1.0422    |  0.8514  |
|            timm_regnet            |  32  | 0.9654 |   0.964   |      0.0       |   1.0934    |  0.7595  |
|             resnet50              |  32  | 0.9986 |  0.9934   |      0.0       |   1.1608    |  0.7378  |
|              yolov3               |  16  | 0.9995 |  0.9944   |      0.0       |   1.1831    |   0.0    |
|               hf_T5               |  8   | 1.0018 |  0.9897   |      0.0       |     0.0     |   0.0    |
|           hf_GPT2_large           |  4   | 0.9999 |  0.9805   |      0.0       |     0.0     |   0.0    |
|        speech_transformer         |  32  | 1.0016 |  0.9211   |      0.0       |     0.0     |   0.0    |
|           hf_Longformer           |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
|    mobilenet_v2_quantized_qat     |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
|               moco                |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
|      resnet50_quantized_qat       |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
|             tacotron2             |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|               name                | bs  |      eager       |    aot_eager     |  aot_cudagraphs  |   aot_nvfuser    |     inductor     |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|           hf_GPT2_large           |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|            hf_T5_large            |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|              alexnet              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               dcgan               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|              demucs               |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |
|           lennard_jones           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|            mnasnet1_0             |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|      nvidia_deeprecommender       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |       pass       |       pass       |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |       pass       |       pass       |
|           squeezenet1_1           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|            tts_angular            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               vgg16               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|        Background_Matting         |  4  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|          LearningToPaint          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            densenet121            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|                drq                |  1  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|       functorch_dp_cifar10        |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           mobilenet_v2            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|        mobilenet_v3_large         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           pytorch_unet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet18              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet50              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|          resnext50_32x4d          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|        shufflenet_v2_x1_0         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|         timm_efficientnet         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_nfnet             |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_regnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           timm_resnest            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|      timm_vision_transformer      |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_vovnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            hf_Reformer            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |
|           BERT_pytorch            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            Super_SloMo            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
| attention_is_all_you_need_pytorch |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               dlrm                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           fastNLP_Bert            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|             hf_Albert             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bart              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bert              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            hf_BigBird             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           hf_DistilBert           |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_GPT2              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               hf_T5               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            hf_T5_base             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|         timm_efficientdet         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|        speech_transformer         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|          vision_maskrcnn          |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|           hf_Longformer           |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|    mobilenet_v2_quantized_qat     |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|               moco                |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|      resnet50_quantized_qat       |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|             tacotron2             |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|              yolov3               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |      0.0000      |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
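
To tally the accuracy table per backend, a minimal sketch along these lines may help. It assumes the table is available as plain text in the `+---+` / `|` layout shown above; `parse_table` and `status_counts` are hypothetical helper names, not part of the benchmark scripts, and the sample rows are a small excerpt of the table.

```python
from collections import Counter

# A few rows copied from the accuracy table above (excerpt only).
SAMPLE = """
+--------------------+----+-------+-----------+----------------+-------------+-------------+
|        name        | bs | eager | aot_eager | aot_cudagraphs | aot_nvfuser |  inductor   |
+--------------------+----+-------+-----------+----------------+-------------+-------------+
|      alexnet       | 2  | pass  |   pass    |      pass      |    pass     |    pass     |
|      resnet50      | 2  | pass  |   pass    |  fail_to_run   |    pass     |    pass     |
|      hf_Bert       | 2  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |
| speech_transformer | 2  | pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |
+--------------------+----+-------+-----------+----------------+-------------+-------------+
"""


def parse_table(text):
    """Parse a '+---+'-bordered ASCII table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip border lines and blanks
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def status_counts(records, backend):
    """Count how often each status string occurs in one backend column."""
    return Counter(record[backend] for record in records)


records = parse_table(SAMPLE)
for backend in ("eager", "aot_eager", "aot_cudagraphs", "aot_nvfuser", "inductor"):
    print(backend, dict(status_counts(records, backend)))
```

Counting raw status strings rather than mapping them to booleans keeps unexpected cell values visible as their own bucket instead of silently dropping them.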

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+
|               name                |  bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor  |
+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+
|         timm_efficientdet         |  1   | 51.5105 |  69.1348  |      nan       |     nan     | 1570.859  |
|            densenet121            |  4   | 13.3562 |  24.7318  |      nan       |   99.5096   | 1347.1298 |
|            hf_T5_large            |  2   | 35.8088 |  65.6235  |      nan       |     nan     |  809.381  |
|            mnasnet1_0             |  32  | 3.1228  |  6.7735   |    25.4378     |   33.2738   | 734.4542  |
|        mobilenet_v3_large         |  32  | 3.5899  |  7.2011   |      nan       |   56.0625   | 684.1648  |
|           mobilenet_v2            |  96  | 3.0708  |  6.5693   |      nan       |   38.7425   | 544.2103  |
|          resnext50_32x4d          |  8   |  3.279  |  7.2643   |      nan       |   31.0494   |  528.878  |
|         timm_efficientnet         |  32  |  5.789  |  10.1753  |      nan       |   56.0369   |  448.183  |
|        shufflenet_v2_x1_0         | 128  | 3.5612  |  7.8244   |      nan       |   29.147    | 348.9814  |
|           squeezenet1_1           |  32  | 0.6208  |  1.2845   |     3.0094     |   4.8813    | 338.1927  |
|           timm_resnest            |  32  | 1.3141  |  3.3041   |      nan       |   35.9312   | 308.5375  |
|            timm_regnet            |  32  | 8.0883  |  13.8651  |      nan       |   53.6162   | 267.5171  |
|             resnet50              |  32  | 3.2783  |   7.13    |      nan       |   34.6079   | 248.2819  |
| attention_is_all_you_need_pytorch | 256  | 4.1896  |  9.9733   |      nan       |     nan     | 230.1729  |
|            timm_vovnet            |  32  | 2.8535  |  5.8937   |      nan       |   24.9685   | 195.5114  |
|      timm_vision_transformer      |  8   | 2.9706  |  6.1813   |      nan       |   11.1085   | 183.2258  |
|       functorch_dp_cifar10        |  64  | 0.7946  |  2.0622   |      nan       |   5.4202    | 182.6834  |
|             resnet18              |  16  | 0.9067  |  2.3716   |      nan       |   17.9107   | 168.1567  |
|   timm_vision_transformer_large   |  8   | 22.7519 |  33.4327  |      nan       |   44.0679   |  167.013  |
|           BERT_pytorch            |  16  | 4.7978  |  10.3795  |      nan       |     nan     |  158.981  |
|          LearningToPaint          |  96  |  0.954  |   2.385   |      nan       |   24.3582   | 138.1686  |
|              hf_Bart              |  4   | 7.1398  |  13.1698  |      nan       |     nan     | 127.2181  |
|          pytorch_stargan          |  16  | 0.7827  |  2.6368   |     9.467      |   4.3549    |  127.126  |
|           fastNLP_Bert            |  6   | 5.0298  |  9.6183   |      nan       |     nan     | 125.7024  |
|              hf_GPT2              |  4   | 3.4078  |  7.8094   |      nan       |     nan     | 123.6637  |
|        Background_Matting         |  4   |  3.704  |  7.2381   |      nan       |   32.3312   | 113.9399  |
|            timm_nfnet             | 128  |  6.556  |  11.458   |      nan       |   34.1762   | 101.3946  |
|          pytorch_struct           | 200  | 0.3956  |  0.9066   |     1.4291     |   4.2106    |  96.611   |
|             hf_Albert             |  8   | 1.0817  |  5.5058   |      nan       |     nan     |  70.3122  |
|              hf_Bert              |  4   | 5.0448  |  9.3782   |      nan       |     nan     |  66.6859  |
|            hf_Reformer            |  4   | 2.9659  |    nan    |     13.115     |     nan     |  65.8278  |
|            Super_SloMo            |  6   | 2.1407  |  5.7243   |      nan       |     nan     |  64.7196  |
|           pytorch_unet            |  1   | 1.0384  |  2.6731   |      nan       |   20.1145   |  55.6862  |
|            hf_BigBird             |  2   | 10.7708 |  16.2465  |      nan       |     nan     |  50.6759  |
|           hf_DistilBert           |  8   | 1.5586  |  3.8873   |      nan       |     nan     |  42.2735  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.7449  |  2.4894   |     7.836      |   4.0526    |  25.7952  |
|               vgg16               |  64  | 0.2968  |  0.7589   |     2.2805     |   2.5941    |  17.3894  |
|                drq                |  1   | 0.2544  |  0.5167   |      nan       |   3.5535    |  14.848   |
|               dcgan               |  32  | 0.2533  |  0.4967   |     1.1971     |   3.7861    |  13.547   |
|               dlrm                | 2048 | 0.6134  |  0.9344   |      nan       |     nan     |  13.3158  |
|              alexnet              | 128  | 0.2488  |  0.4892   |     1.1756     |   2.4427    |  11.1009  |
|      nvidia_deeprecommender       | 256  | 0.2717  |  0.4553   |     0.7334     |   2.4597    |  9.1377   |
|         soft_actor_critic         | 256  | 0.2537  |  0.3757   |     0.5877     |    1.579    |  7.5755   |
|           lennard_jones           | 1000 | 0.2162  |  0.3509   |     0.4982     |   1.1228    |  3.7301   |
|            tts_angular            |  64  | 0.3084  |  0.3561   |     0.481      |   1.0792    |  3.0467   |
|              demucs               |  4   |  0.808  |  0.8143   |     0.7953     |   0.7936    |  0.6963   |
|              yolov3               |  16  | 7.2561  |  12.5914  |      nan       |   47.2978   |    nan    |
|           hf_GPT2_large           |  4   | 20.5619 |  34.5505  |      nan       |     nan     |    nan    |
|        speech_transformer         |  32  | 7.0946  |  13.3612  |      nan       |     nan     |    nan    |
|               hf_T5               |  8   | 3.7283  |  10.4884  |      nan       |     nan     |    nan    |
|           hf_Longformer           |  0   |   nan   |    nan    |      nan       |     nan     |    nan    |
|    mobilenet_v2_quantized_qat     |  0   |   nan   |    nan    |      nan       |     nan     |    nan    |
|               moco                |  0   |   nan   |    nan    |      nan       |     nan     |    nan    |
|      resnet50_quantized_qat       |  0   |   nan   |    nan    |      nan       |     nan     |    nan    |
|             tacotron2             |  0   |   nan   |    nan    |      nan       |     nan     |    nan    |
+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+
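
The latency table above appears to be ordered by the inductor column, largest first, with `nan` rows at the bottom. A small sketch of reproducing that ordering, assuming the values have already been read into `(name, latency)` pairs (the four rows are copied from the table; `sort_key` is a hypothetical helper):

```python
import math

# (name, inductor compilation latency in seconds), excerpted from the table above.
rows = [
    ("yolov3", float("nan")),
    ("demucs", 0.6963),
    ("timm_efficientdet", 1570.859),
    ("densenet121", 1347.1298),
]


def sort_key(row):
    """Sort descending by latency, pushing nan (no measurement) to the end."""
    _, latency = row
    missing = math.isnan(latency)
    # nan cannot be compared reliably, so replace it with a fixed placeholder.
    return (missing, 0.0 if missing else -latency)


ordered = sorted(rows, key=sort_key)
for name, latency in ordered:
    print(f"{name:20s} {latency}")
```

Sorting on a `(missing, -latency)` tuple avoids comparing `nan` directly, which would otherwise leave the order dependent on the input order.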

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|            Super_SloMo            |  6   | 1.0024 |   0.956   |      nan       |     nan     |  1.1855  |
|         timm_efficientnet         |  32  | 0.9998 |  0.7704   |      nan       |   0.7845    |  1.0652  |
|            timm_nfnet             | 128  | 0.9393 |   0.897   |      nan       |   0.9515    |  1.022   |
|         timm_efficientdet         |  1   | 1.0142 |  0.8251   |      nan       |     nan     |  1.0218  |
|           mobilenet_v2            |  96  | 0.9993 |  0.7661   |      nan       |   0.7676    |  0.9975  |
|              demucs               |  4   | 0.9886 |  0.9886   |     0.9886     |   0.9886    |  0.9886  |
|            tts_angular            |  64  | 0.9884 |  0.9884   |     0.984      |   0.9884    |  0.9842  |
|              hf_GPT2              |  4   | 0.9548 |   0.887   |      nan       |     nan     |  0.9505  |
|        Background_Matting         |  4   | 1.0026 |   0.952   |      nan       |   0.9773    |  0.9139  |
|          pytorch_stargan          |  16  | 0.9975 |   1.019   |     0.2027     |   1.0085    |  0.9023  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9986 |  0.9173   |     0.2326     |   0.9114    |  0.8941  |
|             hf_Albert             |  8   | 0.9333 |  0.9333   |      nan       |     nan     |  0.8804  |
|           pytorch_unet            |  1   | 0.9985 |  0.8536   |      nan       |    0.851    |  0.859   |
|              hf_Bart              |  4   | 0.9618 |  0.8786   |      nan       |     nan     |  0.8533  |
|              hf_Bert              |  4   | 0.9683 |  0.8952   |      nan       |     nan     |  0.8517  |
|            timm_regnet            |  32  | 1.0013 |  0.8634   |      nan       |   0.8806    |  0.8481  |
|        shufflenet_v2_x1_0         | 128  |  1.0   |  0.9163   |      nan       |   0.8868    |  0.8447  |
|           fastNLP_Bert            |  6   | 1.0012 |  0.9152   |      nan       |     nan     |  0.8343  |
| attention_is_all_you_need_pytorch | 256  | 0.9481 |  0.9241   |      nan       |     nan     |  0.8264  |
|            timm_vovnet            |  32  | 0.9933 |  0.7644   |      nan       |   0.7778    |  0.8252  |
|           BERT_pytorch            |  16  |  1.0   |  0.8995   |      nan       |     nan     |  0.825   |
|            hf_T5_large            |  2   | 0.922  |  0.8722   |      nan       |     nan     |  0.8237  |
|            hf_BigBird             |  2   | 0.9609 |  0.9609   |      nan       |     nan     |  0.8205  |
|           squeezenet1_1           |  32  | 0.9749 |  0.8159   |     0.2781     |   0.9742    |  0.8159  |
|           hf_DistilBert           |  8   | 0.9212 |  0.9053   |      nan       |     nan     |  0.7841  |
|               dcgan               |  32  |  1.0   |  0.7784   |     0.3321     |   0.7784    |  0.767   |
|              alexnet              | 128  | 0.9998 |  0.7731   |     0.3805     |   0.7736    |  0.743   |
|            mnasnet1_0             |  32  | 0.9988 |  0.9087   |     0.1627     |   0.8348    |  0.7268  |
|   timm_vision_transformer_large   |  8   | 1.0022 |  0.8433   |      nan       |   0.8015    |  0.7222  |
|      timm_vision_transformer      |  8   |  1.0   |  0.8883   |      nan       |   0.8108    |  0.712   |
|        mobilenet_v3_large         |  32  | 0.9958 |  0.8655   |      nan       |   0.8773    |  0.7041  |
|               dlrm                | 2048 | 0.7282 |  0.7283   |      nan       |     nan     |  0.6974  |
|           timm_resnest            |  32  | 0.9935 |  0.8862   |      nan       |   0.8075    |  0.6861  |
|             resnet50              |  32  | 1.0002 |  0.8763   |      nan       |   0.8011    |  0.6779  |
|            densenet121            |  4   |  1.0   |  0.8812   |      nan       |   0.8571    |  0.6618  |
|          resnext50_32x4d          |  8   | 0.9994 |  0.8687   |      nan       |   0.8223    |  0.6615  |
|               vgg16               |  64  |  1.0   |  0.6663   |     0.2532     |   0.6664    |  0.6471  |
|          LearningToPaint          |  96  | 0.9442 |  0.6896   |      nan       |   0.6279    |  0.6444  |
|         soft_actor_critic         | 256  | 0.964  |   0.964   |     0.4356     |   0.9555    |  0.6428  |
|                drq                |  1   | 0.8541 |  0.8541   |      nan       |   0.8541    |  0.6427  |
|             resnet18              |  16  | 0.9846 |  0.7907   |      nan       |   0.7038    |  0.6163  |
|           lennard_jones           | 1000 |  1.0   |    1.0    |     0.3712     |   1.0947    |  0.5646  |
|      nvidia_deeprecommender       | 256  | 0.5598 |  0.5598   |     0.4734     |   0.5598    |  0.5598  |
|          pytorch_struct           | 200  |  1.0   |  0.5079   |     0.4824     |   0.5079    |  0.4222  |
|       functorch_dp_cifar10        |  64  | 0.9626 |  0.8251   |      nan       |   0.8254    |  0.4037  |
|            hf_Reformer            |  4   | 0.3011 |    nan    |     0.1803     |     nan     |  0.299   |
|              yolov3               |  16  | 1.0072 |  0.8533   |      nan       |   0.8915    |   nan    |
|               hf_T5               |  8   | 0.9527 |  0.9446   |      nan       |     nan     |   nan    |
|        speech_transformer         |  32  | 0.9988 |  0.9152   |      nan       |     nan     |   nan    |
|           hf_GPT2_large           |  4   | 0.936  |  0.8771   |      nan       |     nan     |   nan    |
|           hf_Longformer           |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
|    mobilenet_v2_quantized_qat     |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
|               moco                |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
|      resnet50_quantized_qat       |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
|             tacotron2             |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|           ElectraForCausalLM            | 1  | 1.0455 |  0.9436   |      0.0       |     0.0     |  4.4554  |
|       MT5ForConditionalGeneration       | 2  | 1.0242 |  0.9142   |      0.0       |     0.0     |  4.3332  |
|            YituTechConvBert             | 1  | 1.0303 |  0.9285   |      0.0       |     0.0     |  3.1989  |
|         MegatronBertForCausalLM         | 2  | 1.0434 |  0.9421   |      0.0       |     0.0     |  2.8533  |
|          MobileBertForMaskedLM          | 16 | 1.0189 |  0.8925   |      0.0       |     0.0     |  2.7208  |
|           RobertaForCausalLM            | 4  | 1.0444 |  0.9303   |      0.0       |     0.0     |  2.6843  |
|     M2M100ForConditionalGeneration      | 2  | 1.0415 |  0.8586   |      0.0       |     0.0     |  2.5929  |
|             OPTForCausalLM              | 4  | 1.0128 |   0.899   |      0.0       |     0.0     |  2.5804  |
|             XGLMForCausalLM             | 1  | 1.0142 |  0.8666   |      0.0       |     0.0     |  2.4853  |
|     MobileBertForQuestionAnswering      | 32 | 1.0192 |  0.9104   |      0.0       |     0.0     |  2.4298  |
|                CamemBert                | 1  | 1.0496 |   0.947   |      0.0       |     0.0     |  2.2877  |
|               DistillGPT2               | 1  | 1.0314 |  0.9338   |      0.0       |     0.0     |  2.1603  |
|               GoogleFnet                | 1  | 1.0024 |  0.8114   |      0.0       |   1.1191    |  2.1218  |
|     PegasusForConditionalGeneration     | 4  | 1.0119 |  0.8932   |      0.0       |     0.0     |  2.036   |
|     PLBartForConditionalGeneration      | 8  |  1.02  |  0.8992   |      0.0       |     0.0     |  1.7185  |
|      GPT2ForSequenceClassification      | 4  | 0.9989 |  0.9774   |      0.0       |     0.0     |  1.6752  |
|    MegatronBertForQuestionAnswering     | 8  | 1.044  |  0.9294   |      0.0       |     0.0     |  1.5913  |
|      MBartForConditionalGeneration      | 8  | 1.0148 |   0.907   |      0.0       |     0.0     |  1.4775  |
|            XLNetLMHeadModel             | 4  | 1.001  |  0.9642   |      0.0       |     0.0     |  1.4305  |
|       ElectraForQuestionAnswering       | 64 | 0.9991 |  0.9848   |      0.0       |     0.0     |  1.3727  |
|       T5ForConditionalGeneration        | 4  | 0.9985 |  0.9576   |      0.0       |     0.0     |  1.3585  |
|       AlbertForQuestionAnswering        | 2  | 1.0007 |  1.0019   |      0.0       |     0.0     |  1.306   |
|            AlbertForMaskedLM            | 2  | 1.0006 |  1.0012   |      0.0       |     0.0     |  1.3027  |
|    LayoutLMForSequenceClassification    | 16 | 0.9991 |  0.9881   |      0.0       |     0.0     |  1.2595  |
|       DebertaForQuestionAnswering       | 4  | 0.9311 |  0.7532   |     0.8024     |     0.0     |  1.2423  |
|            TrOCRForCausalLM             | 8  | 1.0121 |  0.9476   |      0.0       |     0.0     |  1.2385  |
| BlenderbotSmallForConditionalGeneration | 32 | 1.0111 |  0.9334   |      0.0       |     0.0     |   1.23   |
|                 T5Small                 | 1  | 1.0232 |  0.9484   |      0.0       |     0.0     |  1.2282  |
|      BartForConditionalGeneration       | 1  | 1.0119 |  0.9907   |      0.0       |     0.0     |  1.2215  |
|         Speech2Text2ForCausalLM         | 64 | 0.9987 |  0.9377   |      0.0       |     0.0     |  1.2182  |
|     DistilBertForQuestionAnswering      | 32 | 1.0257 |  0.9798   |      0.0       |     0.0     |  1.1967  |
|           PegasusForCausalLM            | 8  | 1.0098 |  0.9249   |      0.0       |     0.0     |  1.1947  |
|          DistilBertForMaskedLM          | 16 | 1.0274 |  0.9737   |      0.0       |     0.0     |  1.1673  |
|           LayoutLMForMaskedLM           | 16 | 0.9992 |  0.9693   |      0.0       |     0.0     |  1.1651  |
|             BartForCausalLM             | 2  | 0.9993 |  0.9663   |      0.0       |     0.0     |  1.1046  |
|            PLBartForCausalLM            | 16 | 1.0048 |  0.9436   |      0.0       |     0.0     |  1.1015  |
|       RobertaForQuestionAnswering       | 64 | 0.999  |  0.9832   |      0.0       |     0.0     |  1.0977  |
|        BertForQuestionAnswering         | 64 | 0.9989 |  0.9827   |      0.0       |     0.0     |  1.0964  |
|                 BigBird                 | 1  | 0.9913 |  0.9393   |      0.0       |     0.0     |  1.0871  |
|           DebertaForMaskedLM            | 4  | 0.9348 |  0.8095   |     0.7234     |     0.0     |  1.0837  |
|            MBartForCausalLM             | 16 | 1.0067 |  0.9635   |      0.0       |     0.0     |  1.053   |
|             BertForMaskedLM             | 64 | 0.9992 |  0.9608   |      0.0       |     0.0     |  1.042   |
|       BlenderbotSmallForCausalLM        | 64 | 1.0011 |   0.909   |      0.0       |     0.0     |  1.0115  |
|          AllenaiLongformerBase          | 0  |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
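
One way to summarize a speedup column is a geometric mean over the models that produced a number. The sketch below assumes that a `0.0` entry marks a run with no measurement (for example AllenaiLongformerBase, which is `fail_to_run` for every backend in the accuracy table) and should be skipped. That reading is an assumption, and `geomean` is a hypothetical helper rather than the dashboard's own aggregation.

```python
import math


def geomean(values):
    """Geometric mean of the strictly positive entries; nan if there are none.

    Entries that are 0.0 or nan are treated as 'no measurement' and skipped
    (an assumption about how the table encodes failed runs).
    """
    valid = [v for v in values if v > 0]  # nan > 0 is False, so nan is dropped too
    if not valid:
        return float("nan")
    return math.exp(sum(math.log(v) for v in valid) / len(valid))


# Inductor speedups for three models, copied from the table above.
inductor = {
    "ElectraForCausalLM": 4.4554,
    "BertForMaskedLM": 1.042,
    "AllenaiLongformerBase": 0.0,
}

print("models with a measurement:", sum(v > 0 for v in inductor.values()))
print("geomean speedup:", round(geomean(inductor.values()), 4))
```

A geometric mean is used here because speedups are ratios; an arithmetic mean would let a few large values dominate the summary.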

Accuracy

+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+
|                  name                   | bs |    eager    |  aot_eager  | aot_cudagraphs | aot_nvfuser |  inductor   |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+
|               GoogleFnet                | 1  |    pass     |    pass     |  fail_to_run   |    pass     |    pass     |
|            AlbertForMaskedLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       AlbertForQuestionAnswering        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             BartForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|      BartForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             BertForMaskedLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|        BertForQuestionAnswering         | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|                 BigBird                 | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       BlenderbotSmallForCausalLM        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
| BlenderbotSmallForConditionalGeneration | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|                CamemBert                | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           DebertaForMaskedLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|          DistilBertForMaskedLM          | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     DistilBertForQuestionAnswering      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|               DistillGPT2               | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           ElectraForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       ElectraForQuestionAnswering       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|      GPT2ForSequenceClassification      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           LayoutLMForMaskedLM           | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|    LayoutLMForSequenceClassification    | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     M2M100ForConditionalGeneration      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            MBartForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       MT5ForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|         MegatronBertForCausalLM         | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|    MegatronBertForQuestionAnswering     | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|          MobileBertForMaskedLM          | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     MobileBertForQuestionAnswering      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             OPTForCausalLM              | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            PLBartForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           PegasusForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     PegasusForConditionalGeneration     | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           RobertaForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       RobertaForQuestionAnswering       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|         Speech2Text2ForCausalLM         | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       T5ForConditionalGeneration        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|                 T5Small                 | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            TrOCRForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             XGLMForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            XLNetLMHeadModel             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            YituTechConvBert             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       DebertaForQuestionAnswering       | 1  |    pass     |    pass     | fail_accuracy  | fail_to_run |    pass     |
|      MBartForConditionalGeneration      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run | fail_to_run |
|     PLBartForConditionalGeneration      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run | fail_to_run |
|          AllenaiLongformerBase          | 1  | fail_to_run | fail_to_run |  fail_to_run   | fail_to_run | fail_to_run |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+

Compilation latency (sec)

+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+
|                  name                   | bs |  eager   | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+
|            XLNetLMHeadModel             | 4  | 17.5197  |  35.0716  |      nan       |     nan     | 305.9464 |
|          MobileBertForMaskedLM          | 16 | 135.2468 | 157.6587  |      nan       |     nan     | 274.7763 |
|     MobileBertForQuestionAnswering      | 32 | 129.6905 | 153.7345  |      nan       |     nan     | 255.9839 |
|     M2M100ForConditionalGeneration      | 2  | 25.8249  |  38.1287  |      nan       |     nan     | 214.2779 |
|       MT5ForConditionalGeneration       | 2  |  6.3337  |  16.2124  |      nan       |     nan     | 187.0089 |
|            YituTechConvBert             | 1  |  8.8852  |  16.1528  |      nan       |     nan     | 178.8949 |
|       T5ForConditionalGeneration        | 4  |  3.6747  |  10.5429  |      nan       |     nan     | 166.5406 |
|             XGLMForCausalLM             | 1  | 15.0955  |  24.4969  |      nan       |     nan     | 161.4696 |
|      MBartForConditionalGeneration      | 8  | 26.2231  |  38.8571  |      nan       |     nan     | 159.4234 |
|      BartForConditionalGeneration       | 1  | 25.4253  |  38.3882  |      nan       |     nan     | 158.0863 |
|     PegasusForConditionalGeneration     | 4  | 25.7842  |  37.4911  |      nan       |     nan     | 155.1584 |
|         MegatronBertForCausalLM         | 2  | 15.9751  |  25.4865  |      nan       |     nan     | 138.1979 |
|           DebertaForMaskedLM            | 4  |  6.9867  |  13.1786  |    49.9544     |     nan     | 137.1263 |
|    MegatronBertForQuestionAnswering     | 8  |  16.159  |  25.568   |      nan       |     nan     | 134.4903 |
|                 T5Small                 | 1  |  3.6926  |  10.3463  |      nan       |     nan     | 133.6301 |
|     PLBartForConditionalGeneration      | 8  |  7.2968  |  13.2318  |      nan       |     nan     | 131.7485 |
| BlenderbotSmallForConditionalGeneration | 32 | 11.8403  |  19.7241  |      nan       |     nan     | 118.8435 |
|       DebertaForQuestionAnswering       | 4  |  6.9042  |  13.4129  |    50.1618     |     nan     | 107.7468 |
|           RobertaForCausalLM            | 4  |  4.9535  |  9.7845   |      nan       |     nan     | 100.682  |
|    LayoutLMForSequenceClassification    | 16 |  5.1283  |  9.9782   |      nan       |     nan     | 92.4368  |
|           PegasusForCausalLM            | 8  |  9.7938  |  14.2359  |      nan       |     nan     |  86.443  |
|      GPT2ForSequenceClassification      | 4  |  3.5369  |  7.9757   |      nan       |     nan     | 79.3826  |
|             OPTForCausalLM              | 4  |  4.6514  |  9.1108   |      nan       |     nan     | 77.8106  |
|            MBartForCausalLM             | 16 |  9.8837  |  14.3594  |      nan       |     nan     | 77.6976  |
|       ElectraForQuestionAnswering       | 64 |  4.8714  |  9.4723   |      nan       |     nan     | 77.1258  |
|             BertForMaskedLM             | 64 |  4.9013  |  9.4034   |      nan       |     nan     | 75.8649  |
|             BartForCausalLM             | 2  |  9.7793  |  13.9953  |      nan       |     nan     | 74.4351  |
|           LayoutLMForMaskedLM           | 16 |  5.2202  |  9.9984   |      nan       |     nan     | 74.0493  |
|            TrOCRForCausalLM             | 8  |  9.6551  |  14.0138  |      nan       |     nan     | 67.6627  |
|       RobertaForQuestionAnswering       | 64 |  4.8562  |  9.3848   |      nan       |     nan     | 62.7951  |
|           ElectraForCausalLM            | 1  |  5.0751  |  9.6382   |      nan       |     nan     | 62.4936  |
|               DistillGPT2               | 1  |  1.4093  |  3.6037   |      nan       |     nan     |  62.393  |
|            PLBartForCausalLM            | 16 |  3.0927  |  5.4299   |      nan       |     nan     | 59.3231  |
|     DistilBertForQuestionAnswering      | 32 |  1.6682  |  3.9686   |      nan       |     nan     | 59.0738  |
|                CamemBert                | 1  |  4.9241  |   9.44    |      nan       |     nan     | 58.7884  |
|        BertForQuestionAnswering         | 64 |  4.9153  |  9.4155   |      nan       |     nan     | 58.1473  |
|       BlenderbotSmallForCausalLM        | 64 |  4.6979  |  7.7801   |      nan       |     nan     | 57.8001  |
|         Speech2Text2ForCausalLM         | 64 |  3.0594  |  5.3309   |      nan       |     nan     |  57.62   |
|            AlbertForMaskedLM            | 2  |  1.1999  |  5.7374   |      nan       |     nan     |  56.499  |
|                 BigBird                 | 1  | 10.7378  |  16.4461  |      nan       |     nan     | 53.5552  |
|          DistilBertForMaskedLM          | 16 |  1.7303  |   3.942   |      nan       |     nan     | 45.7724  |
|       AlbertForQuestionAnswering        | 2  |  1.1873  |  5.6661   |      nan       |     nan     | 38.8054  |
|               GoogleFnet                | 1  |  2.0873  |  4.2615   |      nan       |   10.5603   | 35.0482  |
|          AllenaiLongformerBase          | 0  |   nan    |    nan    |      nan       |     nan     |   nan    |
+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|      GPT2ForSequenceClassification      | 4  | 0.9342 |  0.9091   |      nan       |     nan     |  1.0318  |
|            XLNetLMHeadModel             | 4  | 1.0001 |  0.8976   |      nan       |     nan     |  0.9717  |
|       ElectraForQuestionAnswering       | 64 |  1.0   |  0.9524   |      nan       |     nan     |  0.9361  |
|    LayoutLMForSequenceClassification    | 16 |  1.0   |  0.9348   |      nan       |     nan     |  0.9339  |
|        BertForQuestionAnswering         | 64 |  1.0   |  0.9467   |      nan       |     nan     |  0.9145  |
|       RobertaForQuestionAnswering       | 64 |  1.0   |  0.9467   |      nan       |     nan     |  0.9145  |
|           LayoutLMForMaskedLM           | 16 |  1.0   |  0.9409   |      nan       |     nan     |  0.888   |
|                 T5Small                 | 1  |  1.0   |  0.9325   |      nan       |     nan     |  0.8445  |
|     DistilBertForQuestionAnswering      | 32 |  1.0   |  0.9046   |      nan       |     nan     |  0.8394  |
|             BertForMaskedLM             | 64 |  1.0   |  0.9219   |      nan       |     nan     |  0.8321  |
|             BartForCausalLM             | 2  |  1.0   |  0.8847   |      nan       |     nan     |  0.8303  |
|                 BigBird                 | 1  | 1.0001 |  0.9549   |      nan       |     nan     |  0.8224  |
|          DistilBertForMaskedLM          | 16 | 0.9998 |  0.9138   |      nan       |     nan     |  0.8055  |
|            PLBartForCausalLM            | 16 | 0.9997 |  0.8802   |      nan       |     nan     |  0.8028  |
|            MBartForCausalLM             | 16 |  1.0   |  0.8629   |      nan       |     nan     |  0.8005  |
|               DistillGPT2               | 1  | 1.0003 |  0.7721   |      nan       |     nan     |  0.7997  |
|         Speech2Text2ForCausalLM         | 64 |  1.0   |   0.88    |      nan       |     nan     |  0.7768  |
|       T5ForConditionalGeneration        | 4  |  1.0   |  0.9597   |      nan       |     nan     |  0.7754  |
|             XGLMForCausalLM             | 1  | 0.9999 |  0.9999   |      nan       |     nan     |  0.7728  |
|      BartForConditionalGeneration       | 1  |  1.0   |  0.8465   |      nan       |     nan     |  0.7708  |
| BlenderbotSmallForConditionalGeneration | 32 |  1.0   |  0.9036   |      nan       |     nan     |  0.7612  |
|     PLBartForConditionalGeneration      | 8  | 0.9997 |  0.8222   |      nan       |     nan     |  0.7547  |
|                CamemBert                | 1  | 0.998  |  0.7977   |      nan       |     nan     |  0.7369  |
|            YituTechConvBert             | 1  | 0.9858 |  0.7923   |      nan       |     nan     |  0.7298  |
|            TrOCRForCausalLM             | 8  |  1.0   |  0.8048   |      nan       |     nan     |  0.7284  |
|       BlenderbotSmallForCausalLM        | 64 |  1.0   |  0.8401   |      nan       |     nan     |  0.7277  |
|      MBartForConditionalGeneration      | 8  |  1.0   |  0.8137   |      nan       |     nan     |  0.727   |
|             OPTForCausalLM              | 4  | 0.9979 |   0.75    |      nan       |     nan     |  0.714   |
|           RobertaForCausalLM            | 4  | 0.9058 |  0.7778   |      nan       |     nan     |  0.7099  |
|           PegasusForCausalLM            | 8  |  1.0   |  0.9323   |      nan       |     nan     |  0.7012  |
|    MegatronBertForQuestionAnswering     | 8  | 0.923  |  0.8265   |      nan       |     nan     |  0.6997  |
|               GoogleFnet                | 1  | 1.0003 |  0.9447   |      nan       |   1.0813    |  0.6953  |
|     M2M100ForConditionalGeneration      | 2  | 0.9797 |  0.9795   |      nan       |     nan     |  0.669   |
|         MegatronBertForCausalLM         | 2  | 0.7066 |  0.7066   |      nan       |     nan     |  0.6453  |
|     PegasusForConditionalGeneration     | 4  | 0.9721 |  0.9004   |      nan       |     nan     |  0.642   |
|       MT5ForConditionalGeneration       | 2  | 0.6173 |  0.6173   |      nan       |     nan     |  0.6173  |
|       AlbertForQuestionAnswering        | 2  |  1.0   |  0.9369   |      nan       |     nan     |  0.6126  |
|           ElectraForCausalLM            | 1  |  1.0   |  0.9107   |      nan       |     nan     |  0.6123  |
|            AlbertForMaskedLM            | 2  | 0.9999 |  0.9172   |      nan       |     nan     |  0.6027  |
|          MobileBertForMaskedLM          | 16 | 0.9997 |  0.9179   |      nan       |     nan     |  0.5861  |
|     MobileBertForQuestionAnswering      | 32 |  1.0   |  0.9716   |      nan       |     nan     |  0.4668  |
|           DebertaForMaskedLM            | 4  |  1.0   |  0.9851   |     0.352      |     nan     |  0.4265  |
|       DebertaForQuestionAnswering       | 4  | 0.9845 |  1.0525   |     0.3277     |     nan     |  0.3569  |
|          AllenaiLongformerBase          | 0  |  nan   |    nan    |      nan       |     nan     |   nan    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|        res2net50_14w_8s         |  2  | 0.9996 |  1.0234   |      0.0       |   1.4428    |  4.931   |
|            hrnet_w18            |  2  | 1.0055 |  1.0607   |      0.0       |    1.445    |  4.5409  |
|           res2next50            |  2  | 1.0047 |  1.0349   |      0.0       |    1.371    |  4.3401  |
|         coat_lite_mini          | 128 | 0.9998 |  0.9998   |      0.0       |    1.075    |  1.7154  |
|          ghostnet_100           | 128 | 0.9985 |  0.9942   |      0.0       |   1.2476    |  1.6158  |
|        tnt_s_patch16_224        | 64  | 0.9998 |  0.9984   |      0.0       |   1.5639    |  1.4959  |
|           dm_nfnet_f0           | 128 | 0.9997 |  1.0002   |      0.0       |   1.2118    |  1.4722  |
|      xcit_large_24_p8_224       |  5  |  1.0   |  0.9889   |      0.0       |     0.0     |  1.4539  |
|        twins_pcpvt_base         | 32  | 1.0032 |  0.9692   |      0.0       |    1.347    |  1.439   |
|           volo_d1_224           | 64  | 0.9993 |  0.9943   |      0.0       |   1.1388    |  1.3985  |
|         crossvit_9_240          | 64  | 1.0059 |  0.9954   |      0.0       |   1.1391    |  1.3976  |
|            nfnet_l0             | 64  | 0.9978 |  0.7971   |      0.0       |   1.0534    |  1.3847  |
|          gmixer_24_224          | 64  | 0.9993 |  0.8424   |      0.0       |   0.9925    |  1.3623  |
|          jx_nest_base           | 32  | 0.999  |  0.9938   |      0.0       |    1.227    |  1.297   |
|            lcnet_050            | 128 | 0.9564 |  0.9487   |      0.0       |   1.4997    |  1.2591  |
|           convit_base           | 32  | 0.9991 |  0.9953   |      0.0       |   1.1951    |  1.2525  |
|            pit_b_224            | 64  | 0.9998 |  0.9985   |      0.0       |   1.0608    |  1.2184  |
|          cait_m36_384           |  2  | 0.998  |  0.8939   |      0.0       |   1.1053    |  1.1998  |
|          convnext_base          | 32  | 0.9992 |  0.9973   |      0.0       |   1.0444    |  1.1774  |
|  swin_base_patch4_window7_224   | 64  | 0.9995 |  0.9727   |      0.0       |   1.0031    |  1.1644  |
|          gmlp_s16_224           | 64  | 0.999  |  0.9963   |      0.0       |   0.9996    |  1.1519  |
|          inception_v3           | 128 | 0.9997 |  0.9979   |      0.0       |   1.1252    |  1.1406  |
|      beit_base_patch16_224      | 64  | 0.9997 |  0.9814   |      0.0       |   0.9546    |  1.1257  |
|        adv_inception_v3         | 128 | 1.0001 |  0.9962   |      0.0       |   1.1254    |  1.1106  |
| deit_base_distilled_patch16_224 | 64  | 0.9999 |  0.9992   |      0.0       |   1.0189    |  1.1089  |
|       gluon_inception_v3        | 128 | 0.9999 |  0.9975   |      0.0       |   1.1257    |  1.1079  |
|      vit_base_patch16_224       | 64  | 0.9998 |  0.9991   |      0.0       |   0.9792    |  1.1004  |
|         poolformer_m36          | 64  | 0.9994 |  0.9991   |      0.0       |    1.007    |  1.083   |
|           regnety_002           | 128 | 0.9801 |  0.9915   |      0.0       |   1.3543    |  1.0721  |
|          mixer_b16_224          | 64  | 0.9997 |  0.9974   |      0.0       |   0.9859    |  1.0501  |
|            mixnet_l             | 64  | 0.9707 |  0.8722   |      0.0       |   1.0056    |  1.0437  |
|          resmlp_12_224          | 128 | 0.9996 |  0.9993   |     0.6954     |     0.0     |  1.0218  |
|          pnasnet5large          | 16  | 0.9991 |   0.997   |      0.0       |   1.0824    |  1.0157  |
|             dla102              | 64  | 0.9992 |  0.9957   |      0.0       |   1.2867    |  1.0124  |
|           tf_mixnet_l           | 64  | 0.9718 |  0.8744   |      0.0       |   1.0055    |  1.0077  |
|            repvgg_a2            | 128 | 0.9637 |  0.9627   |      0.0       |   1.1205    |  1.0054  |
|           resnest101e           | 32  | 1.0021 |  1.0145   |      0.0       |   1.2074    |  0.9828  |
|             dpn107              | 32  | 0.9586 |  0.9507   |      0.0       |   1.0282    |  0.9726  |
|        convmixer_768_32         | 32  | 0.9998 |  0.9998   |      0.0       |   1.0604    |  0.9278  |
|        sebotnet33ts_256         | 64  | 0.9758 |  0.8075   |      0.0       |   1.0532    |  0.9189  |
|         visformer_small         | 128 |  1.0   |  1.0021   |      0.0       |   1.0216    |  0.901   |
|            gernet_l             | 128 | 0.9734 |  0.9723   |      0.0       |   1.0963    |  0.8829  |
|          cspdarknet53           | 64  | 0.9585 |   0.951   |      0.0       |    1.184    |  0.8741  |
|            fbnetv3_b            | 128 | 0.9649 |  0.9608   |      0.0       |   1.1338    |  0.8738  |
|           selecsls42b           | 128 | 0.9995 |  0.9983   |      0.0       |   1.2054    |  0.8712  |
|         mobilenetv2_100         | 128 | 0.9664 |   0.963   |      0.0       |   1.0139    |  0.8526  |
|           rexnet_100            | 128 | 0.9731 |  0.8155   |      0.0       |   0.9836    |  0.8392  |
|           mnasnet_100           | 128 | 0.9666 |   0.962   |      0.0       |   1.1567    |  0.8375  |
|            tinynet_a            | 128 | 0.966  |  0.7754   |      0.0       |   0.9707    |  0.8263  |
|           mobilevit_s           | 32  | 0.9762 |  0.7637   |      0.0       |   0.9654    |  0.8201  |
|      mobilenetv3_large_100      | 128 | 0.9652 |  0.9625   |      0.0       |   1.1648    |  0.8065  |
|        res2net101_26w_4s        | 64  | 0.9989 |  0.9976   |      0.0       |   1.1769    |  0.7997  |
|          spnasnet_100           | 128 | 0.9612 |  0.9562   |      0.0       |    1.138    |  0.7958  |
|           fbnetc_100            | 128 | 0.9661 |  0.9627   |      0.0       |   1.1865    |  0.7788  |
|     swsl_resnext101_32x16d      | 32  | 0.9994 |  0.9997   |      0.0       |   1.1076    |  0.7656  |
|        ese_vovnet19b_dw         | 128 | 0.9789 |  0.9774   |      0.0       |   1.1452    |  0.7575  |
|        eca_halonext26ts         | 64  | 0.9743 |  0.7768   |      0.0       |   1.0161    |  0.7388  |
|       tf_efficientnet_b0        | 128 | 0.9767 |  0.7832   |      0.0       |    0.985    |  0.7312  |
|        gluon_xception65         | 32  | 0.9994 |  0.9964   |      0.0       |   1.0393    |  0.6983  |
|       eca_botnext26ts_256       | 64  | 0.9738 |   0.77    |      0.0       |    1.018    |  0.689   |
|          botnet26t_256          | 128 | 0.9853 |  0.9848   |      0.0       |   1.2257    |  0.6747  |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
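
The tables in this comment are plain-text grids, so they can be post-processed with a few lines of Python. The following is a minimal sketch, assuming the layout shown above (`+---+` rules and `|`-delimited cells). The helper names `parse_grid` and `geomean_speedup` are hypothetical and not part of the benchmark tooling. Skipping `0.0` and `nan` entries as failed runs is an assumption, and the geometric mean is only one common way to summarize speedups; this section does not state how the dashboard aggregates them.

```python
import math


def parse_grid(text):
    """Parse a '+---+' / '|'-delimited text table into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Keep only data/header rows; rule lines start with '+'.
        if line.startswith("|") and line.endswith("|"):
            rows.append([cell.strip() for cell in line[1:-1].split("|")])
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean_speedup(records, column):
    """Geometric mean of a numeric column.

    Assumption: nan and non-positive entries mark runs that did not
    produce a measurement, so they are skipped.
    """
    values = []
    for rec in records:
        try:
            value = float(rec[column])
        except (KeyError, ValueError):
            continue
        if math.isnan(value) or value <= 0.0:
            continue
        values.append(value)
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Two rows copied from the speedup table above, with fewer columns.
sample = """
+------------------+----+--------+----------+
|       name       | bs | eager  | inductor |
+------------------+----+--------+----------+
| res2net50_14w_8s | 2  | 0.9996 |  4.931   |
|    hrnet_w18     | 2  | 1.0055 |  4.5409  |
+------------------+----+--------+----------+
"""

records = parse_grid(sample)
print(records[0]["name"], geomean_speedup(records, "inductor"))
```

The same `parse_grid` call works on the accuracy and compilation-latency tables, since they share the grid layout; only the numeric summary would need to change.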

Accuracy

+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|              name               | bs | eager |   aot_eager   | aot_cudagraphs |  aot_nvfuser  |   inductor    |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|          convnext_base          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          gmixer_24_224          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          gmlp_s16_224           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          mixer_b16_224          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|           mnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|            repvgg_a2            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          spnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|        adv_inception_v3         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      beit_base_patch16_224      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          botnet26t_256          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        convmixer_768_32         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         crossvit_9_240          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          cspdarknet53           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
| deit_base_distilled_patch16_224 | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dla102              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           dm_nfnet_f0           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dpn107              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       eca_botnext26ts_256       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        eca_halonext26ts         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        ese_vovnet19b_dw         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            gernet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          ghostnet_100           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       gluon_inception_v3        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            hrnet_w18            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          inception_v3           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            lcnet_050            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            mixnet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         mobilenetv2_100         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      mobilenetv3_large_100      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           mobilevit_s           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            nfnet_l0             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          pnasnet5large          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           regnety_002           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net101_26w_4s        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net50_14w_8s         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           res2next50            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           rexnet_100            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        sebotnet33ts_256         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           selecsls42b           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|  swin_base_patch4_window7_224   | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|     swsl_resnext101_32x16d      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       tf_efficientnet_b0        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           tf_mixnet_l           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            tinynet_a            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        tnt_s_patch16_224        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         visformer_small         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      vit_base_patch16_224       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           volo_d1_224           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          resmlp_12_224          | 2  | pass  |     pass      |      pass      |  fail_to_run  |     pass      |
|           convit_base           | 2  | pass  |     pass      |  fail_to_run   |  fail_to_run  |     pass      |
|      xcit_large_24_p8_224       | 2  | pass  | fail_accuracy |  fail_to_run   |  fail_to_run  |     pass      |
|        gluon_xception65         | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|         poolformer_m36          | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|          cait_m36_384           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|         coat_lite_mini          | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|          jx_nest_base           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|            pit_b_224            | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|        twins_pcpvt_base         | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|           fbnetc_100            | 2  | pass  |     pass      |      pass      |     pass      | fail_accuracy |
|            fbnetv3_b            | 2  | pass  |     pass      |  fail_to_run   |     pass      | fail_accuracy |
|           resnest101e           | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy | fail_accuracy |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+
|              name               | bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor  |
+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+
|            hrnet_w18            |  2  | 98.6712 | 129.1724  |      nan       |  304.0537   | 1285.4743 |
|             dpn107              | 32  | 13.3754 |   24.29   |      nan       |   87.4317   | 1191.6382 |
|          pnasnet5large          | 16  | 58.9642 |  81.7977  |      nan       |  186.1947   | 1076.8827 |
|           rexnet_100            | 128 | 6.2383  |  11.638   |      nan       |  106.3987   | 888.1642  |
|        res2net50_14w_8s         |  2  | 19.6937 |  32.6428  |      nan       |   87.1574   | 876.3242  |
|       eca_botnext26ts_256       | 64  | 2.4388  |  6.0508   |      nan       |   50.3269   | 774.3202  |
|           mobilevit_s           | 32  | 5.8551  |  10.7127  |      nan       |   45.7644   |  725.689  |
|            mixnet_l             | 64  | 13.541  |  19.9113  |      nan       |   71.4098   | 702.8912  |
|          ghostnet_100           | 128 | 8.7886  |  15.5764  |      nan       |   65.8406   | 629.9192  |
|            tinynet_a            | 128 | 7.4538  |  12.9979  |      nan       |   67.1415   | 600.5069  |
|        twins_pcpvt_base         | 32  | 25.4752 |  36.4329  |      nan       |   68.4858   | 573.9034  |
|           fbnetc_100            | 128 | 5.4489  |  10.0964  |      nan       |   49.0758   | 573.6376  |
|            fbnetv3_b            | 128 | 13.1373 |  19.9338  |      nan       |   85.3399   | 548.3581  |
|           resnest101e           | 32  | 26.1724 |  39.6525  |      nan       |   99.9511   |  546.489  |
|         coat_lite_mini          | 128 | 2.9903  |  6.7801   |      nan       |   16.3836   | 537.8709  |
|  swin_base_patch4_window7_224   | 64  | 12.1773 |  22.2394  |      nan       |   67.7967   | 477.8056  |
|             dla102              | 64  | 10.5032 |  18.7199  |      nan       |   71.5497   | 474.3166  |
|           res2next50            |  2  | 7.2348  |  14.1651  |      nan       |   47.9668   | 459.3931  |
|           tf_mixnet_l           | 64  | 13.5475 |  20.765   |      nan       |   70.0162   | 428.7479  |
|        sebotnet33ts_256         | 64  | 3.7255  |  8.1592   |      nan       |   53.8094   | 428.4917  |
|          cspdarknet53           | 64  | 6.0545  |  10.9249  |      nan       |   52.1622   | 427.7586  |
|          botnet26t_256          | 128 | 2.2674  |  5.3433   |      nan       |   42.1672   | 410.3641  |
|        eca_halonext26ts         | 64  | 2.5674  |  6.2787   |      nan       |   52.114    | 395.3512  |
|           mnasnet_100           | 128 | 3.9657  |  7.7173   |      nan       |   40.1768   | 388.1006  |
|        res2net101_26w_4s        | 64  | 24.9464 |  39.8152  |      nan       |  104.8715   |  378.42   |
|         mobilenetv2_100         | 128 | 4.0834  |   7.496   |      nan       |   40.6293   | 374.3737  |
|       tf_efficientnet_b0        | 128 | 5.6664  |  10.385   |      nan       |   65.9432   | 358.0571  |
|        adv_inception_v3         | 128 | 8.1852  |  15.9016  |      nan       |   74.9539   | 356.0919  |
|        ese_vovnet19b_dw         | 128 | 1.8982  |  3.9436   |      nan       |   32.5785   | 328.4316  |
|          convnext_base          | 32  | 11.4923 |  16.096   |      nan       |   31.6831   | 316.8313  |
|      xcit_large_24_p8_224       |  5  | 37.1187 |  51.4882  |      nan       |     nan     | 312.4874  |
|           regnety_002           | 128 | 4.8749  |  8.8489   |      nan       |   49.9733   | 305.9002  |
|      mobilenetv3_large_100      | 128 |  4.355  |  7.9916   |      nan       |   67.5523   | 277.8999  |
|        gluon_xception65         | 32  | 15.6406 |  24.5945  |      nan       |   55.6431   | 276.5347  |
|          cait_m36_384           |  2  | 46.5219 |  63.937   |      nan       |   91.4589   | 274.0069  |
|         visformer_small         | 128 | 2.2571  |  5.5074   |      nan       |   25.7864   | 272.1672  |
|          jx_nest_base           | 32  | 9.6565  |  16.9306  |      nan       |   66.5693   | 265.8097  |
|         crossvit_9_240          | 64  | 7.3287  |  13.4188  |      nan       |   32.8212   | 244.1634  |
|            gernet_l             | 128 | 4.6462  |  9.0919   |      nan       |   39.2841   | 219.5804  |
|           selecsls42b           | 128 | 2.3756  |  5.4282   |      nan       |   41.0006   | 215.9729  |
|         poolformer_m36          | 64  | 13.0446 |  19.5598  |      nan       |   34.3082   | 213.2786  |
|            lcnet_050            | 128 | 1.9507  |   3.999   |      nan       |   31.8533   |  204.466  |
|          spnasnet_100           | 128 | 5.4029  |  10.0674  |      nan       |   47.1409   |  203.189  |
|     swsl_resnext101_32x16d      | 32  | 10.026  |  18.4376  |      nan       |   48.6326   | 183.7595  |
|       gluon_inception_v3        | 128 | 8.2025  |  15.4166  |      nan       |   75.2205   | 175.9461  |
|           convit_base           | 32  | 3.9184  |  8.4475   |      nan       |   20.8037   | 167.9331  |
|          inception_v3           | 128 |  8.39   |  15.5865  |      nan       |   74.8391   | 166.4901  |
|           volo_d1_224           | 64  | 6.7498  |  12.7145  |      nan       |   32.6013   |  159.846  |
|          gmlp_s16_224           | 64  |  9.095  |  13.8868  |      nan       |   20.8052   | 139.0742  |
|            pit_b_224            | 64  | 3.6357  |  7.2231   |      nan       |   15.0088   | 135.8108  |
|        tnt_s_patch16_224        | 64  | 11.9392 |  20.1264  |      nan       |   34.0659   | 128.6679  |
|          gmixer_24_224          | 64  | 8.3774  |  13.6336  |      nan       |   23.8974   | 121.5108  |
|            repvgg_a2            | 128 | 4.5668  |  8.8221   |      nan       |   47.0965   | 109.2737  |
|          resmlp_12_224          | 128 | 2.7829  |  4.8097   |     7.4167     |     nan     |  95.8777  |
|            nfnet_l0             | 64  | 5.7948  |  11.1862  |      nan       |   31.0786   |  88.0471  |
|           dm_nfnet_f0           | 128 | 6.4796  |  11.5027  |      nan       |   34.4834   |  85.9462  |
|          mixer_b16_224          | 64  | 2.6586  |  5.0283   |      nan       |   12.7966   |  84.8839  |
|        convmixer_768_32         | 32  | 6.9275  |  11.7534  |      nan       |   19.4368   |  78.6739  |
|      beit_base_patch16_224      | 64  | 4.4367  |  9.0898   |      nan       |   17.357    |  67.4849  |
| deit_base_distilled_patch16_224 | 64  | 3.0194  |  6.2321   |      nan       |   12.536    |  67.4842  |
|      vit_base_patch16_224       | 64  | 2.8662  |  6.0204   |      nan       |   11.4566   |  57.186   |
+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|          gmixer_24_224          | 64  | 0.9992 |  0.9684   |      nan       |   0.9825    |  1.3808  |
|            nfnet_l0             | 64  | 1.0008 |  0.8298   |      nan       |    0.813    |  1.2555  |
|          pnasnet5large          | 16  | 1.069  |   1.011   |      nan       |   1.2062    |  1.1783  |
|            tinynet_a            | 128 |  1.0   |  0.7831   |      nan       |   0.7845    |  1.1735  |
|           rexnet_100            | 128 | 0.9992 |  0.7879   |      nan       |    0.871    |  1.1072  |
|           convit_base           | 32  | 1.0001 |  0.8879   |      nan       |   0.9506    |  1.068   |
|           dm_nfnet_f0           | 128 | 0.9393 |   0.897   |      nan       |   0.9515    |  1.022   |
|         mobilenetv2_100         | 128 | 0.9998 |  0.7664   |      nan       |   0.7679    |  1.0051  |
|             dla102              | 64  | 0.9881 |  0.9181   |      nan       |   0.9541    |  1.0011  |
|           mobilevit_s           | 32  | 0.9999 |  0.7692   |      nan       |   0.7431    |  1.0011  |
|         poolformer_m36          | 64  | 1.0003 |  0.9533   |      nan       |   0.9368    |  0.9734  |
|        eca_halonext26ts         | 64  | 0.9938 |  0.7717   |      nan       |   0.7731    |  0.9711  |
|       eca_botnext26ts_256       | 64  |  1.0   |  0.7705   |      nan       |   0.7679    |  0.9703  |
|           tf_mixnet_l           | 64  | 1.0001 |   0.861   |      nan       |   0.8605    |  0.9698  |
|        convmixer_768_32         | 32  |  1.0   |  0.9868   |      nan       |   0.9807    |  0.9656  |
|          cait_m36_384           |  2  | 1.0001 |  0.9024   |      nan       |   0.9202    |  0.9451  |
|       tf_efficientnet_b0        | 128 | 0.9998 |  0.7727   |      nan       |   0.8426    |  0.9413  |
|          mixer_b16_224          | 64  | 0.9956 |  0.9615   |      nan       |   0.8644    |  0.9357  |
|      beit_base_patch16_224      | 64  |  1.0   |  0.9575   |      nan       |   0.8606    |  0.9272  |
|          gmlp_s16_224           | 64  |  1.0   |  0.9766   |      nan       |    0.966    |  0.9267  |
|      vit_base_patch16_224       | 64  | 0.9963 |  0.9469   |      nan       |   0.8229    |  0.915   |
|        tnt_s_patch16_224        | 64  | 1.0001 |  0.9752   |      nan       |   0.8518    |  0.9131  |
|           volo_d1_224           | 64  | 0.9999 |  0.9247   |      nan       |   0.7472    |  0.9124  |
| deit_base_distilled_patch16_224 | 64  | 0.9964 |  0.9476   |      nan       |   0.8242    |  0.9095  |
|          spnasnet_100           | 128 | 1.0005 |  0.9207   |      nan       |   0.8496    |  0.9024  |
|           selecsls42b           | 128 | 0.9883 |  0.8982   |      nan       |   0.9039    |  0.8999  |
|            mixnet_l             | 64  | 0.9995 |  0.8486   |      nan       |   0.7938    |  0.8993  |
|      mobilenetv3_large_100      | 128 | 1.0002 |  0.8686   |      nan       |   0.8819    |  0.8982  |
|        adv_inception_v3         | 128 | 1.0002 |  0.8694   |      nan       |    0.88     |  0.8977  |
|       gluon_inception_v3        | 128 | 1.0002 |  0.8694   |      nan       |    0.88     |  0.8977  |
|          inception_v3           | 128 | 1.0002 |  0.8694   |      nan       |    0.88     |  0.8977  |
|      xcit_large_24_p8_224       |  5  | 0.9999 |  0.9206   |      nan       |     nan     |  0.8952  |
|           resnest101e           | 32  |  1.0   |  0.9458   |      nan       |   0.9449    |  0.8922  |
|          ghostnet_100           | 128 | 0.9998 |  0.8872   |      nan       |    0.947    |  0.8889  |
|         visformer_small         | 128 | 0.9943 |  0.9442   |      nan       |   0.9475    |  0.8883  |
|          convnext_base          | 32  | 1.0001 |  0.9077   |      nan       |   0.7678    |  0.8853  |
|            fbnetv3_b            | 128 | 0.9995 |  0.7866   |      nan       |   0.7861    |  0.8837  |
|        gluon_xception65         | 32  | 0.9999 |  0.9384   |      nan       |   0.9001    |  0.8834  |
|             dpn107              | 32  | 0.9997 |  0.9285   |      nan       |   0.8949    |  0.8762  |
|        twins_pcpvt_base         | 32  | 1.0002 |  0.9127   |      nan       |   0.8351    |  0.8723  |
|          cspdarknet53           | 64  |  1.0   |  0.8562   |      nan       |   0.8797    |  0.8624  |
|  swin_base_patch4_window7_224   | 64  | 0.9999 |  0.9309   |      nan       |    0.83     |  0.8586  |
|          jx_nest_base           | 32  | 1.0017 |   0.898   |      nan       |   0.7112    |  0.8574  |
|        ese_vovnet19b_dw         | 128 | 0.9999 |  0.8938   |      nan       |   0.9369    |  0.8467  |
|        sebotnet33ts_256         | 64  |  1.0   |  0.7109   |      nan       |   0.6852    |  0.841   |
|     swsl_resnext101_32x16d      | 32  | 1.0003 |  0.8983   |      nan       |   0.8684    |  0.8402  |
|          resmlp_12_224          | 128 | 0.9893 |  0.9525   |     0.2479     |     nan     |  0.8169  |
|        res2net101_26w_4s        | 64  | 1.0001 |  0.9307   |      nan       |   0.8959    |  0.8167  |
|         crossvit_9_240          | 64  | 1.0001 |  0.8721   |      nan       |    0.729    |  0.8108  |
|           mnasnet_100           | 128 | 1.0003 |  0.9126   |      nan       |   0.8368    |  0.7984  |
|            pit_b_224            | 64  | 0.9992 |  0.7962   |      nan       |   0.6417    |  0.7921  |
|         coat_lite_mini          | 128 | 1.0049 |  0.8826   |      nan       |   0.7873    |   0.79   |
|            lcnet_050            | 128 | 1.0005 |  0.7721   |      nan       |   0.7722    |  0.7579  |
|           regnety_002           | 128 | 0.9981 |   0.829   |      nan       |   0.7759    |  0.7465  |
|            gernet_l             | 128 |  1.0   |  0.7965   |      nan       |   0.8012    |  0.727   |
|          botnet26t_256          | 128 |  1.0   |  0.8494   |      nan       |   0.7497    |  0.7254  |
|           fbnetc_100            | 128 | 0.9998 |  0.8597   |      nan       |   0.7507    |  0.7246  |
|            hrnet_w18            |  2  | 0.9986 |  0.8792   |      nan       |   0.8869    |  0.6089  |
|           res2next50            |  2  |  1.0   |  0.8353   |      nan       |   0.8404    |  0.6063  |
|        res2net50_14w_8s         |  2  |  1.0   |  0.8387   |      nan       |   0.8474    |  0.5877  |
|            repvgg_a2            | 128 | 1.0003 |  0.8145   |      nan       |   0.6633    |  0.536   |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+

Performance graphs

bench_logs/timm_models_float32.png :

bench_logs/huggingface_float32.png :

bench_logs/torchbench_float32.png :

@anijain2305
Contributor Author

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. The experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
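
As a rough illustration of how a geometric mean speedup can be computed once failing models are dropped, here is a minimal sketch. It is not the dashboard's actual script: the function name `geomean_speedup` is hypothetical, and it assumes a failed model is recorded as `0.0` or `nan`, as in the per-model tables.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of per-model speedups, skipping failed models.

    Assumption: failed runs appear as 0.0 or NaN. The `s > 0` test
    drops both, because any comparison with NaN is False.
    """
    valid = [s for s in speedups if s > 0]
    if not valid:
        # No model passed for this backend; report 0.0.
        return 0.0
    # exp(mean(log(x))) avoids overflow from multiplying many values.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))
```

For example, `geomean_speedup([2.0, 8.0])` gives 4.0, and adding a `0.0` entry for a failed model leaves that result unchanged.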

Passrate

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      | 92%, 49/53 | 98%, 42/43  | 100%, 61/61 |
|   aot_eager    | 94%, 50/53 | 98%, 42/43  | 90%, 55/61  |
| aot_cudagraphs | 26%, 14/53 |  0%, 0/43   |  11%, 7/61  |
|  aot_nvfuser   | 60%, 32/53 |  0%, 0/43   | 75%, 46/61  |
|    inductor    | 81%, 43/53 | 93%, 40/43  | 93%, 57/61  |
+----------------+------------+-------------+-------------+
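
The `percent, passed/total` cells above can be produced from a list of per-model accuracy statuses along these lines. This is a sketch only: `pass_rate` and `PASSING` are hypothetical names, and treating `pass_due_to_skip` as a pass is an assumption, not something the dashboard confirms.

```python
# Assumption: these two statuses count as passing.
PASSING = {"pass", "pass_due_to_skip"}


def pass_rate(statuses):
    """Format a pass rate as 'NN%, passed/total'."""
    total = len(statuses)
    passed = sum(1 for s in statuses if s in PASSING)
    # Guard against an empty suite to avoid dividing by zero.
    percent = 100 * passed / total if total else 0
    return f"{percent:.0f}%, {passed}/{total}"
```

With 49 passing statuses out of 53, this returns `92%, 49/53`.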

Geometric mean speedup

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   1.00x    |    1.01x    |    1.00x    |
|   aot_eager    |   1.01x    |    1.00x    |    1.00x    |
| aot_cudagraphs |   1.09x    |    0.0x     |    1.00x    |
|  aot_nvfuser   |   1.16x    |    0.0x     |    1.19x    |
|    inductor    |   1.71x    |    2.29x    |    1.31x    |
+----------------+------------+-------------+-------------+

Mean compilation time (seconds)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |    5.68    |    14.88    |    11.61    |
|   aot_eager    |   11.54    |    25.28    |    19.45    |
| aot_cudagraphs |    7.10    |     0.0     |    52.59    |
|  aot_nvfuser   |   29.15    |     0.0     |    78.59    |
|    inductor    |   215.79   |   112.71    |   397.63    |
+----------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   0.96x    |    0.98x    |    1.00x    |
|   aot_eager    |   0.86x    |    0.87x    |    0.88x    |
| aot_cudagraphs |   0.44x    |    0.0x     |    0.20x    |
|  aot_nvfuser   |   0.83x    |    0.0x     |    0.85x    |
|    inductor    |   0.77x    |    0.82x    |    0.89x    |
+----------------+------------+-------------+-------------+
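
To read the ratios above, it may help to see the calculation. The sketch below assumes the ratio is the native PyTorch peak memory divided by the backend's peak memory, so a value below 1.00x means the backend used more memory than native PyTorch. The post does not state this definition; `compression_ratio` is a hypothetical helper.

```python
def compression_ratio(baseline_peak_bytes, backend_peak_bytes):
    """Peak memory compression ratio; higher is better.

    Assumption: ratio = baseline peak / backend peak.
    """
    return baseline_peak_bytes / backend_peak_bytes
```

Under this assumption, a backend that peaks at 10 GB where native PyTorch peaks at 8 GB gets a ratio of 0.8.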

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|            densenet121            |  4   | 0.9967 |  0.9096   |      0.0       |   1.3892    |  4.9549  |
|       functorch_dp_cifar10        |  64  | 1.0015 |   0.917   |      0.0       |   1.1984    |  4.7933  |
|         timm_efficientdet         |  1   | 0.9846 |  0.8101   |      0.0       |     0.0     |  4.1584  |
|      timm_vision_transformer      |  8   | 1.0027 |  0.8733   |      0.0       |   1.3673    |  3.1169  |
|           BERT_pytorch            |  16  | 1.0109 |  0.8357   |      0.0       |     0.0     |  3.0625  |
|                drq                |  1   | 0.9999 |  0.7995   |      0.0       |   1.1068    |  3.0176  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9968 |   0.91    |     1.3034     |   1.2183    |  2.6191  |
|             resnet18              |  16  | 1.0025 |  0.9953   |      0.0       |     1.3     |  2.6171  |
|               dcgan               |  32  | 0.9936 |  0.9007   |     1.1355     |   0.7427    |  2.5545  |
|             hf_Albert             |  8   | 1.0004 |  0.9555   |      0.0       |     0.0     |  2.3929  |
|          pytorch_struct           | 200  | 0.9898 |  0.7374   |     1.0146     |   1.0055    |  2.2997  |
|            hf_T5_large            |  2   | 1.0167 |  0.8597   |      0.0       |     0.0     |  2.2877  |
|           squeezenet1_1           |  32  | 0.9992 |  0.9618   |     1.3556     |   1.1931    |  2.2867  |
|          resnext50_32x4d          |  8   | 0.9987 |  0.9506   |      0.0       |   1.3205    |  2.1799  |
|               hf_T5               |  8   | 0.998  |  0.9443   |      0.0       |     0.0     |  2.1468  |
|              hf_Bart              |  4   | 1.0159 |  0.8354   |      0.0       |     0.0     |  2.0957  |
|           lennard_jones           | 1000 | 0.9747 |  0.7475   |     1.2902     |   1.0387    |  2.0271  |
|              hf_GPT2              |  4   | 1.0204 |  0.8863   |      0.0       |     0.0     |   2.02   |
|        mobilenet_v3_large         |  32  | 1.0041 |  1.0119   |      0.0       |   1.4094    |  2.0149  |
|            mnasnet1_0             |  32  | 0.9976 |  1.0169   |     0.8904     |    1.406    |  1.9422  |
|              hf_Bert              |  4   |  1.03  |  0.8543   |      0.0       |     0.0     |  1.8929  |
|          LearningToPaint          |  96  | 1.0076 |  0.9997   |      0.0       |   1.3645    |  1.8479  |
|         timm_efficientnet         |  32  | 0.9601 |  0.8092   |      0.0       |   1.1852    |  1.7838  |
| attention_is_all_you_need_pytorch | 256  | 1.0074 |  0.9148   |      0.0       |     0.0     |  1.4975  |
|           hf_DistilBert           |  8   | 1.0018 |  0.9687   |      0.0       |     0.0     |  1.4794  |
|           fastNLP_Bert            |  6   | 0.9995 |  0.8925   |      0.0       |     0.0     |  1.4599  |
|         soft_actor_critic         | 256  | 1.0107 |  0.7358   |     1.2675     |   1.0585    |  1.4593  |
|           pytorch_unet            |  1   | 0.9995 |  0.9929   |      0.0       |    1.156    |  1.3537  |
|            timm_nfnet             | 128  | 0.9999 |  0.9989   |      0.0       |   1.1729    |  1.3393  |
|          pytorch_stargan          |  16  | 0.9983 |  1.0276   |     0.8223     |   1.0905    |  1.3171  |
|            Super_SloMo            |  6   | 0.9997 |   0.996   |      0.0       |     0.0     |  1.2922  |
|        shufflenet_v2_x1_0         | 128  | 1.0003 |  1.0161   |      0.0       |   1.3367    |  1.2891  |
|               vgg16               |  64  | 0.9998 |  0.9977   |     0.7983     |    0.995    |  1.2721  |
|        Background_Matting         |  4   | 0.9992 |  1.0179   |      0.0       |   1.1151    |  1.2162  |
|              alexnet              | 128  | 0.9996 |  0.9969   |     0.7886     |   1.0039    |  1.2097  |
|   timm_vision_transformer_large   |  8   | 0.9991 |  0.9898   |      0.0       |   0.9925    |  1.1592  |
|            hf_Reformer            |  4   | 0.996  |  0.9993   |     0.9194     |     0.0     |  1.1587  |
|            hf_BigBird             |  2   | 0.9915 |  0.9188   |      0.0       |     0.0     |  1.1515  |
|           timm_resnest            |  32  | 1.0016 |   1.021   |      0.0       |   1.3137    |  1.1411  |
|            timm_vovnet            |  32  | 0.9188 |  0.8863   |      0.0       |   1.1259    |  1.0895  |
|            tts_angular            |  64  | 0.9927 |  0.9561   |     1.013      |   1.0041    |  1.013   |
|              demucs               |  4   | 0.9976 |  1.0028   |     1.0002     |   0.9991    |  0.9992  |
|      nvidia_deeprecommender       | 256  | 0.9993 |  0.9958   |     0.6964     |   0.9789    |  0.9894  |
|           mobilenet_v2            |  96  | 0.9989 |  0.9868   |      0.0       |   0.9239    |  0.9572  |
|             resnet50              |  32  | 0.9998 |  1.0112   |      0.0       |   1.3797    |  0.8611  |
|            timm_regnet            |  32  | 0.9796 |  0.9386   |      0.0       |    1.183    |  0.7142  |
|              yolov3               |  16  | 0.9991 |  0.9882   |      0.0       |   0.9225    |   0.0    |
|               dlrm                | 2048 |  0.0   |  1.2114   |      0.0       |     0.0     |   0.0    |
|           hf_GPT2_large           |  4   | 0.9994 |  0.9897   |      0.0       |     0.0     |   0.0    |
|        speech_transformer         |  32  | 1.0051 |   0.841   |      0.0       |     0.0     |   0.0    |
|           hf_Longformer           |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
|               moco                |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
|             tacotron2             |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|               name                | bs  |      eager       |    aot_eager     |  aot_cudagraphs  |   aot_nvfuser    |     inductor     |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|           hf_GPT2_large           |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|            hf_T5_large            |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|              alexnet              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               dcgan               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|              demucs               |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |
|           lennard_jones           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|            mnasnet1_0             |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|      nvidia_deeprecommender       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |       pass       |       pass       |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |       pass       |       pass       |
|           squeezenet1_1           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               vgg16               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|        Background_Matting         |  4  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|          LearningToPaint          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            densenet121            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|                drq                |  1  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|       functorch_dp_cifar10        |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           mobilenet_v2            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           pytorch_unet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet18              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet50              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|          resnext50_32x4d          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|        shufflenet_v2_x1_0         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|         timm_efficientnet         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_nfnet             |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_regnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           timm_resnest            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|      timm_vision_transformer      |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_vovnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            hf_Reformer            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |
|           BERT_pytorch            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            Super_SloMo            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
| attention_is_all_you_need_pytorch |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               dlrm                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           fastNLP_Bert            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|             hf_Albert             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bart              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bert              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            hf_BigBird             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           hf_DistilBert           |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_GPT2              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               hf_T5               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            hf_T5_base             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|        speech_transformer         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|         timm_efficientdet         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|          vision_maskrcnn          |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|           hf_Longformer           |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|               moco                |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|             tacotron2             |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|        mobilenet_v3_large         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |  fail_accuracy   |
|            tts_angular            |  2  |       pass       |       pass       |       pass       |       pass       |      0.0000      |
|              yolov3               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |      0.0000      |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+
|               name                |  bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor  |
+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+
|         timm_efficientdet         |  1   | 51.6778 |  76.315   |      nan       |     nan     | 1561.4016 |
|            densenet121            |  4   | 13.5643 |  28.592   |      nan       |  137.0015   | 1284.8246 |
|            hf_T5_large            |  2   | 35.6597 |  74.143   |      nan       |     nan     | 1105.8679 |
|            mnasnet1_0             |  32  | 3.2898  |  8.4607   |     42.168     |   45.5088   | 759.1611  |
|        mobilenet_v3_large         |  32  | 3.7676  |  9.0181   |      nan       |   74.5093   | 688.5086  |
|           mobilenet_v2            |  96  | 3.2506  |  8.2172   |      nan       |   42.8435   | 560.8749  |
|          resnext50_32x4d          |  8   | 3.4647  |   8.893   |      nan       |   39.1118   | 547.4423  |
|         timm_efficientnet         |  32  | 5.8589  |  12.2274  |      nan       |   72.3203   | 415.9198  |
|        shufflenet_v2_x1_0         | 128  | 3.7576  |  9.3836   |      nan       |   40.0699   | 357.9817  |
|           squeezenet1_1           |  32  | 0.6616  |   1.757   |     6.8976     |   6.7878    | 340.2634  |
|           timm_resnest            |  32  | 1.3961  |  4.1828   |      nan       |   42.9349   | 303.1367  |
|            timm_nfnet             | 128  | 6.7717  |  13.0572  |      nan       |   41.5275   | 293.2979  |
|             resnet50              |  32  | 3.4893  |  8.8065   |      nan       |   43.3898   | 266.9493  |
|            timm_regnet            |  32  | 8.3816  |  16.5065  |      nan       |   65.8429   | 250.7005  |
| attention_is_all_you_need_pytorch | 256  | 4.3319  |  12.2357  |      nan       |     nan     | 227.2056  |
|            timm_vovnet            |  32  | 2.9802  |   7.155   |      nan       |   31.7436   | 193.0371  |
|   timm_vision_transformer_large   |  8   | 22.5208 |  39.1907  |      nan       |   57.9043   | 179.7974  |
|       functorch_dp_cifar10        |  64  | 0.8221  |  2.5554   |      nan       |   6.2734    | 173.3004  |
|      timm_vision_transformer      |  8   | 3.1139  |   7.985   |      nan       |   15.9315   | 168.9022  |
|             resnet18              |  16  | 1.0142  |  3.0134   |      nan       |   23.5973   | 160.1865  |
|           BERT_pytorch            |  16  | 5.0367  |  13.1083  |      nan       |     nan     | 155.0612  |
|          LearningToPaint          |  96  | 1.0619  |  3.0807   |      nan       |   30.7452   | 147.1359  |
|               hf_T5               |  8   | 3.9756  |  12.0963  |      nan       |     nan     | 136.5616  |
|          pytorch_stargan          |  16  | 0.8302  |  3.1387   |     11.232     |   7.4876    | 129.5533  |
|        Background_Matting         |  4   | 4.0333  |   9.316   |      nan       |   46.2135   | 126.0886  |
|           fastNLP_Bert            |  6   | 5.3273  |  12.3503  |      nan       |     nan     | 123.1903  |
|              hf_Bart              |  4   | 7.4307  |  16.7695  |      nan       |     nan     | 118.9238  |
|              hf_GPT2              |  4   | 3.6314  |  9.7759   |      nan       |     nan     | 116.4341  |
|          pytorch_struct           | 200  | 0.4314  |  1.2309   |     1.8299     |   5.3945    | 104.9248  |
|            Super_SloMo            |  6   | 2.2312  |  6.7703   |      nan       |     nan     |  71.7895  |
|             hf_Albert             |  8   | 1.4443  |  8.1906   |      nan       |     nan     |  67.9052  |
|              hf_Bert              |  4   | 5.1953  |  12.1324  |      nan       |     nan     |  63.7948  |
|            hf_Reformer            |  4   | 3.0625  |  5.7244   |    13.5967     |     nan     |  59.1986  |
|            hf_BigBird             |  2   | 11.5114 |  19.9501  |      nan       |     nan     |  58.435   |
|           pytorch_unet            |  1   | 1.1142  |  3.3664   |      nan       |   26.4582   |  47.7954  |
|           hf_DistilBert           |  8   | 1.7462  |  5.1916   |      nan       |     nan     |  42.315   |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.7671  |  3.1052   |    11.8215     |   5.0147    |  26.9302  |
|               vgg16               |  64  | 0.3619  |  1.0556   |     4.1097     |   3.6622    |  23.0057  |
|                drq                |  1   | 0.2833  |  0.7285   |      nan       |   4.4646    |  17.2654  |
|              alexnet              | 128  | 0.2731  |  0.6766   |     1.9416     |   3.2376    |  17.0553  |
|               dcgan               |  32  | 0.2645  |  0.6215   |     1.8704     |   4.2904    |  12.3628  |
|      nvidia_deeprecommender       | 256  |  0.284  |  0.6579   |     0.9898     |   2.9939    |  10.0325  |
|         soft_actor_critic         | 256  | 0.2644  |  0.4764   |     0.7905     |   2.0864    |  8.4274   |
|           lennard_jones           | 1000 | 0.2383  |  0.5024   |     0.6872     |   1.5169    |  4.9677   |
|            tts_angular            |  64  | 0.3313  |  0.3901   |     0.5131     |   1.1311    |  2.5717   |
|              demucs               |  4   | 0.8855  |  0.8786   |     0.8858     |   0.8788    |  0.7821   |
|              yolov3               |  16  | 7.5276  |  15.1126  |      nan       |   44.979    |    nan    |
|           hf_GPT2_large           |  4   | 21.466  |  41.0854  |      nan       |     nan     |    nan    |
|        speech_transformer         |  32  | 7.4505  |  16.6613  |      nan       |     nan     |    nan    |
|               dlrm                | 2048 |   nan   |  1.1404   |      nan       |     nan     |    nan    |
|           hf_Longformer           |  0   |   nan   |    nan    |      nan       |     nan     |    nan    |
|               moco                |  0   |   nan   |    nan    |      nan       |     nan     |    nan    |
|             tacotron2             |  0   |   nan   |    nan    |      nan       |     nan     |    nan    |
+-----------------------------------+------+---------+-----------+----------------+-------------+-----------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|             hf_Albert             |  8   | 0.9814 |   0.936   |      nan       |     nan     |  1.1576  |
|            Super_SloMo            |  6   | 1.0024 |  0.9697   |      nan       |     nan     |  1.1385  |
|            timm_nfnet             | 128  | 0.9761 |  0.9043   |      nan       |   0.9504    |  1.0242  |
|            tts_angular            |  64  | 1.0015 |  1.0015   |     0.9866     |   1.0015    |  0.9908  |
| attention_is_all_you_need_pytorch | 256  | 0.9976 |  0.9403   |      nan       |     nan     |  0.9875  |
|              demucs               |  4   | 0.987  |   0.987   |     0.987      |    0.987    |  0.987   |
|         timm_efficientdet         |  1   | 1.0316 |  0.8425   |      nan       |     nan     |  0.9858  |
|           BERT_pytorch            |  16  | 0.9991 |  0.8819   |      nan       |     nan     |  0.9728  |
|         timm_efficientnet         |  32  | 0.9982 |  0.7762   |      nan       |   0.7936    |  0.9689  |
|              hf_GPT2              |  4   | 0.971  |  0.8627   |      nan       |     nan     |  0.9645  |
|        Background_Matting         |  4   | 1.0196 |  0.9679   |      nan       |    0.987    |  0.9244  |
|           mobilenet_v2            |  96  | 1.0001 |  0.7725   |      nan       |   0.9235    |  0.8856  |
|           pytorch_unet            |  1   | 0.9968 |  0.8677   |      nan       |   0.8518    |  0.8681  |
|           fastNLP_Bert            |  6   | 1.0013 |  0.8966   |      nan       |     nan     |  0.8661  |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.0   |  0.8751   |     0.2634     |   0.8432    |  0.8602  |
|            hf_T5_large            |  2   | 0.8541 |  0.8541   |      nan       |     nan     |  0.8535  |
|           hf_DistilBert           |  8   | 0.9505 |  0.8806   |      nan       |     nan     |  0.8387  |
|              hf_Bert              |  4   | 0.9844 |  0.8677   |      nan       |     nan     |  0.8383  |
|            timm_regnet            |  32  | 0.9999 |  0.8483   |      nan       |    0.85     |  0.8362  |
|              hf_Bart              |  4   | 0.9099 |  0.8321   |      nan       |     nan     |  0.8151  |
|            hf_BigBird             |  2   | 0.9852 |  0.9787   |      nan       |     nan     |   0.81   |
|            timm_vovnet            |  32  | 0.9903 |  0.7754   |      nan       |   0.7817    |  0.7861  |
|        shufflenet_v2_x1_0         | 128  | 1.0002 |   0.874   |      nan       |   0.8652    |  0.7812  |
|          pytorch_stargan          |  16  | 0.9929 |  0.9799   |     0.2149     |   0.8882    |  0.7783  |
|               dcgan               |  32  |  1.0   |  0.7949   |     0.343      |   0.7073    |  0.7527  |
|               vgg16               |  64  | 0.9998 |  0.7378   |     0.2978     |   0.7172    |  0.7491  |
|   timm_vision_transformer_large   |  8   | 0.9987 |  0.8366   |      nan       |   0.8491    |  0.7487  |
|              alexnet              | 128  | 1.0003 |  0.8082   |     0.4354     |    0.805    |  0.7352  |
|               hf_T5               |  8   | 0.9678 |  0.9371   |      nan       |     nan     |  0.7266  |
|           timm_resnest            |  32  | 0.9868 |  0.8809   |      nan       |   0.8726    |  0.722   |
|      timm_vision_transformer      |  8   | 1.0001 |  0.8868   |      nan       |   0.8871    |  0.7151  |
|             resnet50              |  32  | 1.0004 |  0.8678   |      nan       |   0.8041    |  0.6751  |
|            mnasnet1_0             |  32  | 0.9994 |  0.8793   |     0.173      |   0.8217    |  0.6596  |
|           squeezenet1_1           |  32  | 0.9604 |  0.7958   |     0.295      |   0.7589    |  0.6595  |
|        mobilenet_v3_large         |  32  | 0.999  |  0.8661   |      nan       |    0.874    |  0.6573  |
|          resnext50_32x4d          |  8   |  1.0   |  0.8591   |      nan       |    0.823    |  0.6514  |
|                drq                |  1   | 0.9125 |  0.8399   |      nan       |   0.8395    |  0.6406  |
|         soft_actor_critic         | 256  | 0.964  |  0.9151   |     0.4737     |   0.9151    |  0.6279  |
|          LearningToPaint          |  96  | 0.9252 |  0.7196   |      nan       |    0.71     |  0.605   |
|            densenet121            |  4   |  1.0   |  0.8696   |      nan       |   0.8376    |  0.574   |
|             resnet18              |  16  | 0.9782 |  0.7852   |      nan       |   0.7268    |  0.5644  |
|           lennard_jones           | 1000 |  1.0   |  1.0002   |     0.3735     |   1.0967    |  0.564   |
|      nvidia_deeprecommender       | 256  | 0.5596 |  0.5596   |     0.5262     |   0.5596    |  0.5596  |
|       functorch_dp_cifar10        |  64  | 0.9964 |  0.8131   |      nan       |    0.846    |  0.4465  |
|          pytorch_struct           | 200  |  1.0   |  0.5081   |     0.4858     |   0.5082    |  0.4235  |
|            hf_Reformer            |  4   | 0.3764 |  0.9993   |     0.2539     |     nan     |  0.3629  |
|              yolov3               |  16  | 1.0054 |  0.8488   |      nan       |   0.8244    |   nan    |
|        speech_transformer         |  32  | 1.0015 |  0.9177   |      nan       |     nan     |   nan    |
|           hf_GPT2_large           |  4   | 0.9586 |  0.8649   |      nan       |     nan     |   nan    |
|               dlrm                | 2048 |  nan   |  0.7282   |      nan       |     nan     |   nan    |
|           hf_Longformer           |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
|               moco                |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
|             tacotron2             |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|           ElectraForCausalLM            | 1  | 1.0305 |   0.841   |      0.0       |     0.0     |  6.4479  |
|       MT5ForConditionalGeneration       | 2  | 1.0219 |  0.8629   |      0.0       |     0.0     |  5.9505  |
|          MobileBertForMaskedLM          | 16 | 1.0133 |   0.825   |      0.0       |     0.0     |  5.7095  |
|     MobileBertForQuestionAnswering      | 32 | 1.0128 |  0.8271   |      0.0       |     0.0     |  5.2263  |
|         MegatronBertForCausalLM         | 2  | 1.0371 |  0.8533   |      0.0       |     0.0     |  4.7686  |
|            YituTechConvBert             | 1  | 1.0204 |   0.843   |      0.0       |     0.0     |  4.4867  |
|             OPTForCausalLM              | 4  | 1.014  |  0.8312   |      0.0       |     0.0     |  4.4473  |
|                CamemBert                | 1  | 1.0396 |  0.8479   |      0.0       |     0.0     |  4.0705  |
|           RobertaForCausalLM            | 4  | 1.0385 |  0.8416   |      0.0       |     0.0     |  3.957   |
|     M2M100ForConditionalGeneration      | 2  | 1.0398 |  0.8205   |      0.0       |     0.0     |  3.8162  |
|     PegasusForConditionalGeneration     | 4  | 1.0104 |   0.828   |      0.0       |     0.0     |  3.2005  |
|    MegatronBertForQuestionAnswering     | 8  | 1.0348 |  0.8577   |      0.0       |     0.0     |  3.0658  |
|             XGLMForCausalLM             | 1  | 1.013  |  0.8145   |      0.0       |     0.0     |  3.0613  |
|               DistillGPT2               | 1  | 1.0299 |  0.8924   |      0.0       |     0.0     |  2.7432  |
|      MBartForConditionalGeneration      | 8  | 1.0143 |  0.8354   |      0.0       |     0.0     |  2.734   |
|     PLBartForConditionalGeneration      | 8  | 1.0174 |  0.8363   |      0.0       |     0.0     |  2.7186  |
|          DistilBertForMaskedLM          | 16 | 1.0288 |  0.8563   |      0.0       |     0.0     |  2.2064  |
|      GPT2ForSequenceClassification      | 4  | 0.9981 |  0.9775   |      0.0       |     0.0     |  2.159   |
|         Speech2Text2ForCausalLM         | 64 | 1.0037 |  0.8487   |      0.0       |     0.0     |  2.1102  |
|     DistilBertForQuestionAnswering      | 32 | 1.0327 |  0.8488   |      0.0       |     0.0     |  2.0944  |
|      BartForConditionalGeneration       | 1  | 1.0227 |  0.8343   |      0.0       |     0.0     |  2.0432  |
|       ElectraForQuestionAnswering       | 64 | 0.9986 |  0.9773   |      0.0       |     0.0     |  1.9714  |
| BlenderbotSmallForConditionalGeneration | 32 | 1.0134 |  0.8917   |      0.0       |     0.0     |  1.9272  |
|            TrOCRForCausalLM             | 8  | 1.0107 |  0.8345   |      0.0       |     0.0     |  1.855   |
|           PegasusForCausalLM            | 8  | 1.0087 |  0.8191   |      0.0       |     0.0     |  1.8131  |
|    LayoutLMForSequenceClassification    | 16 | 0.9978 |  0.9792   |      0.0       |     0.0     |  1.7478  |
|       T5ForConditionalGeneration        | 4  | 0.9988 |  0.9365   |      0.0       |     0.0     |  1.695   |
|       AlbertForQuestionAnswering        | 2  | 1.0007 |  0.8085   |      0.0       |     0.0     |  1.6688  |
|            AlbertForMaskedLM            | 2  | 1.0007 |  0.8083   |      0.0       |     0.0     |  1.6612  |
|            XLNetLMHeadModel             | 4  |  1.0   |   0.963   |      0.0       |     0.0     |  1.5949  |
|           LayoutLMForMaskedLM           | 16 | 0.9986 |   0.97    |      0.0       |     0.0     |  1.5936  |
|            PLBartForCausalLM            | 16 | 1.0129 |  0.9321   |      0.0       |     0.0     |  1.5853  |
|                 T5Small                 | 1  | 1.0296 |  0.8925   |      0.0       |     0.0     |  1.5687  |
|            MBartForCausalLM             | 16 | 1.0117 |  0.9215   |      0.0       |     0.0     |  1.5056  |
|       DebertaForQuestionAnswering       | 4  | 0.9363 |  0.7268   |     0.9281     |     0.0     |  1.494   |
|             BartForCausalLM             | 2  | 1.0021 |  0.9642   |      0.0       |     0.0     |  1.4697  |
|        BertForQuestionAnswering         | 64 | 0.9972 |  0.9687   |      0.0       |     0.0     |  1.4503  |
|       RobertaForQuestionAnswering       | 64 | 0.9979 |  0.9532   |      0.0       |     0.0     |  1.4467  |
|             BertForMaskedLM             | 64 | 0.9975 |  0.9547   |      0.0       |     0.0     |  1.3312  |
|       BlenderbotSmallForCausalLM        | 64 | 1.0008 |  0.9228   |      0.0       |     0.0     |  1.3039  |
|           DebertaForMaskedLM            | 4  | 0.9343 |  0.7295   |     0.7896     |     0.0     |  1.2209  |
|                 BigBird                 | 1  | 0.9916 |  0.9128   |      0.0       |     0.0     |  1.1516  |
|          AllenaiLongformerBase          | 0  |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
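For anyone who wants a single number per backend from a table like the one above, here is a minimal sketch that parses the ASCII table and takes a geometric mean of the speedups. It assumes that a `0.0` (or `nan`) entry marks a run that did not produce a result, which matches the `fail_to_run` entries in the accuracy table but is not stated explicitly in this thread. The helper names `parse_table` and `geomean_speedup` are hypothetical and are not part of the benchmark scripts.

```python
import math


def parse_table(text):
    """Parse a '|'-delimited ASCII table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")  # skip the '+-----+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean_speedup(rows, backend):
    """Return (geometric mean, count) over rows with a positive, finite speedup.

    Assumption: 0.0 and nan mean the model did not run on this backend,
    so those rows are excluded rather than counted as a slowdown.
    """
    vals = []
    for row in rows:
        v = float(row[backend])
        if math.isfinite(v) and v > 0.0:
            vals.append(v)
    if not vals:
        return float("nan"), 0
    return math.exp(sum(math.log(v) for v in vals) / len(vals)), len(vals)


# Three rows copied from the table above, for illustration only.
SAMPLE = """
+-----------------------+----+--------+-----------+----------------+-------------+----------+
|         name          | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------+----+--------+-----------+----------------+-------------+----------+
|  ElectraForCausalLM   | 1  | 1.0305 |   0.841   |      0.0       |     0.0     |  6.4479  |
|  DebertaForMaskedLM   | 4  | 0.9343 |  0.7295   |     0.7896     |     0.0     |  1.2209  |
| AllenaiLongformerBase | 0  |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
+-----------------------+----+--------+-----------+----------------+-------------+----------+
"""

if __name__ == "__main__":
    rows = parse_table(SAMPLE)
    for backend in ("eager", "aot_eager", "aot_cudagraphs", "aot_nvfuser", "inductor"):
        mean, n = geomean_speedup(rows, backend)
        print(f"{backend:>15}: geomean={mean:.4f} over {n} of {len(rows)} models")
```

Excluding failed runs makes a backend with many failures look better than it is, so the count of contributing models is returned alongside the mean and should be reported with it.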

Accuracy

+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+
|                  name                   | bs |    eager    |  aot_eager  | aot_cudagraphs | aot_nvfuser |  inductor   |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+
|            AlbertForMaskedLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       AlbertForQuestionAnswering        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             BartForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|      BartForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             BertForMaskedLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|        BertForQuestionAnswering         | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|                 BigBird                 | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       BlenderbotSmallForCausalLM        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
| BlenderbotSmallForConditionalGeneration | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|                CamemBert                | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           DebertaForMaskedLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|          DistilBertForMaskedLM          | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     DistilBertForQuestionAnswering      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|               DistillGPT2               | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           ElectraForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       ElectraForQuestionAnswering       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|      GPT2ForSequenceClassification      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           LayoutLMForMaskedLM           | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|    LayoutLMForSequenceClassification    | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     M2M100ForConditionalGeneration      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            MBartForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       MT5ForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|         MegatronBertForCausalLM         | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|    MegatronBertForQuestionAnswering     | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|          MobileBertForMaskedLM          | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     MobileBertForQuestionAnswering      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             OPTForCausalLM              | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            PLBartForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           PegasusForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     PegasusForConditionalGeneration     | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           RobertaForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       RobertaForQuestionAnswering       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|         Speech2Text2ForCausalLM         | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       T5ForConditionalGeneration        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|                 T5Small                 | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            TrOCRForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             XGLMForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            XLNetLMHeadModel             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            YituTechConvBert             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       DebertaForQuestionAnswering       | 1  |    pass     |    pass     | fail_accuracy  | fail_to_run |    pass     |
|      MBartForConditionalGeneration      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run | fail_to_run |
|     PLBartForConditionalGeneration      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run | fail_to_run |
|          AllenaiLongformerBase          | 1  | fail_to_run | fail_to_run |  fail_to_run   | fail_to_run | fail_to_run |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+

Compilation latency (sec)

+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+
|                  name                   | bs |  eager   | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+
|            XLNetLMHeadModel             | 4  | 17.6179  |  39.6668  |      nan       |     nan     | 307.6319 |
|          MobileBertForMaskedLM          | 16 | 133.5692 | 170.8431  |      nan       |     nan     | 286.9824 |
|     MobileBertForQuestionAnswering      | 32 | 132.5387 | 170.3262  |      nan       |     nan     | 278.2839 |
|       T5ForConditionalGeneration        | 4  |  3.8127  |  12.0723  |      nan       |     nan     | 206.6256 |
|     M2M100ForConditionalGeneration      | 2  | 26.0986  |  44.545   |      nan       |     nan     | 195.2289 |
|       MT5ForConditionalGeneration       | 2  |  6.481   |  19.2831  |      nan       |     nan     | 190.4766 |
|            YituTechConvBert             | 1  |  9.2564  |  20.0505  |      nan       |     nan     | 180.8604 |
|      MBartForConditionalGeneration      | 8  | 26.6452  |  45.9743  |      nan       |     nan     | 161.8561 |
|     PegasusForConditionalGeneration     | 4  | 26.2002  |  43.7253  |      nan       |     nan     | 155.866  |
|           DebertaForMaskedLM            | 4  |  7.5448  |  14.5382  |    53.5019     |     nan     | 149.3218 |
|      BartForConditionalGeneration       | 1  | 26.2887  |  45.4355  |      nan       |     nan     | 145.9362 |
|             XGLMForCausalLM             | 1  | 15.2816  |  29.6837  |      nan       |     nan     | 145.1717 |
|                 T5Small                 | 1  |  3.8167  |  12.0291  |      nan       |     nan     | 136.6475 |
|    MegatronBertForQuestionAnswering     | 8  |  16.77   |  31.1625  |      nan       |     nan     | 135.7728 |
|         MegatronBertForCausalLM         | 2  | 17.4451  |  31.1547  |      nan       |     nan     | 133.7836 |
| BlenderbotSmallForConditionalGeneration | 32 | 12.2822  |  24.5287  |      nan       |     nan     | 119.2236 |
|     PLBartForConditionalGeneration      | 8  |  7.6989  |  17.1246  |      nan       |     nan     | 118.2111 |
|       DebertaForQuestionAnswering       | 4  |  7.3815  |  14.6553  |    53.7852     |     nan     | 116.1217 |
|           RobertaForCausalLM            | 4  |  5.3546  |  12.5031  |      nan       |     nan     | 94.3553  |
|    LayoutLMForSequenceClassification    | 16 |  5.5432  |  12.7514  |      nan       |     nan     | 87.7957  |
|           PegasusForCausalLM            | 8  | 10.1602  |  16.7714  |      nan       |     nan     | 85.5372  |
|       ElectraForQuestionAnswering       | 64 |  5.2363  |  12.2405  |      nan       |     nan     | 82.7418  |
|           LayoutLMForMaskedLM           | 16 |  5.6199  |  12.9982  |      nan       |     nan     | 78.3614  |
|             OPTForCausalLM              | 4  |  4.888   |  11.5661  |      nan       |     nan     | 78.2897  |
|            MBartForCausalLM             | 16 | 10.3936  |  16.9171  |      nan       |     nan     | 78.0137  |
|             BartForCausalLM             | 2  |  9.9759  |  16.5991  |      nan       |     nan     |  75.759  |
|      GPT2ForSequenceClassification      | 4  |  3.8248  |  10.0255  |      nan       |     nan     | 74.4963  |
|             BertForMaskedLM             | 64 |  5.1393  |  12.0977  |      nan       |     nan     | 72.9445  |
|           ElectraForCausalLM            | 1  |  5.3613  |  12.2367  |      nan       |     nan     | 66.8685  |
|            TrOCRForCausalLM             | 8  | 10.2076  |  16.6156  |      nan       |     nan     | 66.3959  |
|     DistilBertForQuestionAnswering      | 32 |  1.8752  |  5.4203   |      nan       |     nan     | 64.5461  |
|            AlbertForMaskedLM            | 2  |  1.5134  |  8.6449   |      nan       |     nan     | 63.0632  |
|                 BigBird                 | 1  | 11.4363  |  19.7217  |      nan       |     nan     | 61.6302  |
|                CamemBert                | 1  |  5.1796  |  12.282   |      nan       |     nan     | 61.2855  |
|       BlenderbotSmallForCausalLM        | 64 |  4.9225  |  9.5268   |      nan       |     nan     | 59.7408  |
|        BertForQuestionAnswering         | 64 |  5.0836  |  12.169   |      nan       |     nan     |  58.515  |
|            PLBartForCausalLM            | 16 |  3.2631  |   6.537   |      nan       |     nan     | 57.4481  |
|       RobertaForQuestionAnswering       | 64 |  5.133   |  12.1708  |      nan       |     nan     | 56.8695  |
|               DistillGPT2               | 1  |   1.58   |  4.6392   |      nan       |     nan     | 55.2511  |
|         Speech2Text2ForCausalLM         | 64 |  3.1898  |  6.7214   |      nan       |     nan     | 54.2535  |
|          DistilBertForMaskedLM          | 16 |  1.9262  |  5.4418   |      nan       |     nan     | 48.5942  |
|       AlbertForQuestionAnswering        | 2  |  1.6013  |  8.4355   |      nan       |     nan     | 41.7357  |
|          AllenaiLongformerBase          | 0  |   nan    |    nan    |      nan       |     nan     |   nan    |
+-----------------------------------------+----+----------+-----------+----------------+-------------+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|      GPT2ForSequenceClassification      | 4  | 0.9675 |  0.9163   |      nan       |     nan     |   1.07   |
|            XLNetLMHeadModel             | 4  | 0.9912 |  0.8791   |      nan       |     nan     |  1.0109  |
|       ElectraForQuestionAnswering       | 64 | 1.0016 |  0.9539   |      nan       |     nan     |  1.0002  |
|                 T5Small                 | 1  |  1.0   |  0.9124   |      nan       |     nan     |  0.9876  |
|           LayoutLMForMaskedLM           | 16 | 0.9999 |  0.9238   |      nan       |     nan     |  0.9871  |
|             BertForMaskedLM             | 64 | 0.9996 |   0.899   |      nan       |     nan     |  0.9811  |
|    LayoutLMForSequenceClassification    | 16 | 1.004  |  0.9325   |      nan       |     nan     |  0.9712  |
| BlenderbotSmallForConditionalGeneration | 32 | 0.9998 |  0.8996   |      nan       |     nan     |  0.9557  |
|             BartForCausalLM             | 2  |  1.0   |  0.8769   |      nan       |     nan     |  0.9545  |
|       T5ForConditionalGeneration        | 4  | 0.9996 |  0.9594   |      nan       |     nan     |  0.9525  |
|         Speech2Text2ForCausalLM         | 64 | 0.9954 |  0.8456   |      nan       |     nan     |  0.9452  |
|            PLBartForCausalLM            | 16 | 1.0006 |  0.8667   |      nan       |     nan     |  0.9395  |
|       BlenderbotSmallForCausalLM        | 64 | 0.9996 |  0.8172   |      nan       |     nan     |  0.9269  |
|        BertForQuestionAnswering         | 64 | 0.9995 |  0.9315   |      nan       |     nan     |  0.9256  |
|       RobertaForQuestionAnswering       | 64 | 0.9995 |  0.9315   |      nan       |     nan     |  0.9254  |
|          DistilBertForMaskedLM          | 16 | 0.9991 |  0.8698   |      nan       |     nan     |  0.9167  |
|      BartForConditionalGeneration       | 1  |  1.0   |  0.8619   |      nan       |     nan     |  0.881   |
|       AlbertForQuestionAnswering        | 2  |  1.0   |  0.6451   |      nan       |     nan     |  0.8636  |
|            MBartForCausalLM             | 16 |  1.0   |  0.8398   |      nan       |     nan     |  0.8565  |
|            AlbertForMaskedLM            | 2  |  1.0   |  0.6364   |      nan       |     nan     |  0.8515  |
|                 BigBird                 | 1  | 1.0024 |  0.9555   |      nan       |     nan     |  0.8349  |
|     DistilBertForQuestionAnswering      | 32 | 0.9987 |  0.8967   |      nan       |     nan     |  0.8334  |
|     PLBartForConditionalGeneration      | 8  | 0.9999 |  0.8304   |      nan       |     nan     |  0.8252  |
|               DistillGPT2               | 1  | 1.0006 |  0.7548   |      nan       |     nan     |  0.812   |
|      MBartForConditionalGeneration      | 8  | 0.9999 |  0.8187   |      nan       |     nan     |  0.7699  |
|            TrOCRForCausalLM             | 8  |  1.0   |  0.7955   |      nan       |     nan     |  0.7566  |
|                CamemBert                | 1  | 0.9989 |  0.7872   |      nan       |     nan     |  0.7482  |
|             OPTForCausalLM              | 4  | 0.9975 |  0.7501   |      nan       |     nan     |  0.7473  |
|            YituTechConvBert             | 1  | 0.9718 |  0.7819   |      nan       |     nan     |  0.7407  |
|           PegasusForCausalLM            | 8  | 0.999  |  0.9444   |      nan       |     nan     |  0.7324  |
|           RobertaForCausalLM            | 4  | 0.9237 |  0.7741   |      nan       |     nan     |  0.7309  |
|             XGLMForCausalLM             | 1  | 0.9999 |  0.9992   |      nan       |     nan     |  0.7214  |
|    MegatronBertForQuestionAnswering     | 8  | 0.9051 |  0.8218   |      nan       |     nan     |  0.7107  |
|          MobileBertForMaskedLM          | 16 | 0.9985 |  0.8983   |      nan       |     nan     |  0.6948  |
|     PegasusForConditionalGeneration     | 4  | 0.9996 |  0.9196   |      nan       |     nan     |  0.6769  |
|           ElectraForCausalLM            | 1  | 0.9993 |  0.8955   |      nan       |     nan     |  0.6701  |
|         MegatronBertForCausalLM         | 2  | 0.7726 |  0.7726   |      nan       |     nan     |  0.6697  |
|     M2M100ForConditionalGeneration      | 2  | 1.0046 |  0.9497   |      nan       |     nan     |  0.6614  |
|     MobileBertForQuestionAnswering      | 32 | 1.0142 |  0.9796   |      nan       |     nan     |  0.6265  |
|       MT5ForConditionalGeneration       | 2  | 0.6019 |  0.6019   |      nan       |     nan     |  0.6019  |
|           DebertaForMaskedLM            | 4  | 0.9982 |  0.9826   |     0.3599     |     nan     |  0.4498  |
|       DebertaForQuestionAnswering       | 4  | 0.979  |  1.0568   |     0.3578     |     nan     |  0.3761  |
|          AllenaiLongformerBase          | 0  |  nan   |    nan    |      nan       |     nan     |   nan    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|        res2net50_14w_8s         |  2  | 0.9998 |  0.9139   |      0.0       |   1.3802    |  5.4677  |
|            hrnet_w18            |  2  | 1.0017 |  0.9585   |      0.0       |    1.369    |  4.8554  |
|           res2next50            |  2  | 0.9994 |  0.9278   |      0.0       |    1.364    |  4.5511  |
|        twins_pcpvt_base         | 32  | 1.0033 |  0.8985   |      0.0       |   1.3637    |  2.5585  |
|          cait_m36_384           |  2  | 1.0002 |  0.8416   |      0.0       |   1.3573    |  2.3142  |
|      xcit_large_24_p8_224       |  5  | 1.0003 |    0.0    |      0.0       |     0.0     |  2.2223  |
|        tnt_s_patch16_224        | 64  | 0.9993 |  0.9911   |      0.0       |   1.8322    |  2.0033  |
|          ghostnet_100           | 128 | 1.0033 |  0.9993   |      0.0       |   1.5251    |  1.8714  |
|          gmixer_24_224          | 64  | 1.0003 |  0.8866   |     0.6404     |   1.0146    |  1.685   |
|           volo_d1_224           | 64  | 0.9994 |  0.9935   |      0.0       |   1.1513    |  1.6636  |
|         crossvit_9_240          | 64  | 1.0044 |  0.9607   |      0.0       |   1.1615    |  1.6279  |
|            nfnet_l0             | 64  | 1.0053 |   0.835   |      0.0       |   1.1326    |  1.6068  |
|            lcnet_050            | 128 | 0.9699 |   0.952   |      0.0       |   1.5557    |  1.5851  |
|  swin_base_patch4_window7_224   | 64  | 0.9993 |  0.9614   |      0.0       |   1.0645    |  1.5371  |
|         coat_lite_mini          | 128 | 0.9995 |  0.9957   |      0.0       |   1.2656    |   1.53   |
|           regnety_002           | 128 | 0.9783 |  0.9332   |      0.0       |   1.3828    |  1.5147  |
|          jx_nest_base           | 32  | 0.9989 |  0.9914   |      0.0       |   1.2388    |  1.4568  |
|          resmlp_12_224          | 128 | 1.0004 |   0.998   |     0.7817     |     0.0     |  1.4415  |
|          gmlp_s16_224           | 64  | 0.9991 |  0.9835   |      0.0       |   1.0521    |  1.4274  |
|           resnest101e           | 32  | 1.0057 |  0.9834   |      0.0       |   1.4266    |  1.408   |
|           convit_base           | 32  | 0.9992 |  0.9916   |      0.0       |     0.0     |  1.404   |
|            pit_b_224            | 64  | 0.9995 |  0.9942   |      0.0       |   1.0682    |  1.3627  |
|          mixer_b16_224          | 64  | 0.9994 |  0.9914   |     0.7158     |   0.9678    |  1.3053  |
| deit_base_distilled_patch16_224 | 64  | 0.9995 |  0.9912   |      0.0       |   1.0704    |  1.2869  |
|      beit_base_patch16_224      | 64  | 0.9997 |  0.9779   |      0.0       |   1.0498    |  1.2855  |
|           dm_nfnet_f0           | 128 | 0.998  |   0.998   |      0.0       |   1.1769    |  1.2806  |
|        adv_inception_v3         | 128 | 0.9999 |  0.9953   |      0.0       |   1.1943    |  1.2302  |
|       gluon_inception_v3        | 128 | 0.9998 |  0.9949   |      0.0       |   1.1948    |  1.2192  |
|         poolformer_m36          | 64  | 0.9994 |  0.9979   |      0.0       |     0.0     |  1.2154  |
|      vit_base_patch16_224       | 64  | 0.9996 |  0.9932   |      0.0       |   0.9997    |  1.1986  |
|          inception_v3           | 128 |  1.0   |  0.9909   |      0.0       |   1.1951    |  1.161   |
|            mixnet_l             | 64  | 0.9791 |  0.8891   |      0.0       |   1.0717    |  1.0913  |
|         visformer_small         | 128 | 1.0003 |  1.0008   |      0.0       |   1.0865    |  1.0813  |
|           mobilevit_s           | 32  | 0.9725 |  0.7991   |      0.0       |    1.214    |  1.0725  |
|           tf_mixnet_l           | 64  | 0.9815 |  0.8994   |      0.0       |   1.0653    |  1.0629  |
|          pnasnet5large          | 16  | 1.0049 |  1.0324   |      0.0       |   1.1288    |  1.0268  |
|             dla102              | 64  | 0.9989 |  0.9909   |      0.0       |   1.3787    |  1.0037  |
|           mnasnet_100           | 128 | 0.9532 |  0.9444   |     0.6661     |   1.3675    |  0.9873  |
|            fbnetv3_b            | 128 | 0.9531 |  0.9403   |      0.0       |   1.2576    |  0.984   |
|      mobilenetv3_large_100      | 128 | 0.9544 |  0.9645   |      0.0       |   1.3458    |  0.9124  |
|          cspdarknet53           | 64  | 0.9431 |  0.9337   |      0.0       |   0.9007    |  0.9021  |
|        convmixer_768_32         | 32  | 0.9997 |  0.9978   |      0.0       |   1.0526    |  0.8983  |
|             dpn107              | 32  | 0.9318 |  0.9269   |      0.0       |   0.9758    |  0.8969  |
|          spnasnet_100           | 128 | 0.9461 |  0.9371   |     0.6562     |   1.3164    |  0.8965  |
|           selecsls42b           | 128 | 0.9998 |  0.9942   |      0.0       |   1.3571    |  0.8944  |
|        res2net101_26w_4s        | 64  | 1.0012 |  0.9975   |      0.0       |   1.3978    |  0.8776  |
|         mobilenetv2_100         | 128 | 0.9506 |  0.9419   |      0.0       |   0.8663    |  0.8611  |
|            tinynet_a            | 128 | 0.9574 |  0.7997   |      0.0       |    1.076    |  0.8526  |
|            repvgg_a2            | 128 | 0.9414 |  0.9341   |     0.6584     |   1.1312    |  0.8335  |
|       tf_efficientnet_b0        | 128 | 0.9644 |  0.8033   |      0.0       |    1.095    |  0.825   |
|            gernet_l             | 128 | 0.9448 |  0.9369   |      0.0       |   1.1404    |  0.7845  |
|           fbnetc_100            | 128 | 0.952  |  0.9417   |     0.674      |   1.3769    |  0.7503  |
|          convnext_base          | 32  | 1.0049 |  0.9333   |      0.0       |   1.2117    |  0.7498  |
|       eca_botnext26ts_256       | 64  | 0.9626 |  0.8006   |      0.0       |   1.1068    |  0.742   |
|        ese_vovnet19b_dw         | 128 | 0.9688 |  0.9647   |      0.0       |   1.2439    |  0.7183  |
|        sebotnet33ts_256         | 64  | 0.9661 |  0.8366   |      0.0       |   1.1168    |  0.7063  |
|        eca_halonext26ts         | 64  | 0.9633 |  0.8053   |      0.0       |   1.1018    |  0.7011  |
|           rexnet_100            | 128 | 0.9637 |  0.8494   |      0.0       |   1.0369    |  0.6808  |
|          botnet26t_256          | 128 | 0.9788 |  0.9746   |      0.0       |   1.3462    |  0.6385  |
|     swsl_resnext101_32x16d      | 32  | 0.9986 |   0.98    |      0.0       |   1.0749    |  0.6356  |
|        gluon_xception65         | 32  | 0.9988 |  0.9869   |      0.0       |   1.0631    |  0.6086  |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+

Accuracy

+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|              name               | bs | eager |   aot_eager   | aot_cudagraphs |  aot_nvfuser  |   inductor    |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|           fbnetc_100            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|           mnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|            repvgg_a2            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|        adv_inception_v3         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      beit_base_patch16_224      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          botnet26t_256          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        convmixer_768_32         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          convnext_base          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         crossvit_9_240          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          cspdarknet53           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
| deit_base_distilled_patch16_224 | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dla102              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           dm_nfnet_f0           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dpn107              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       eca_botnext26ts_256       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        eca_halonext26ts         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        ese_vovnet19b_dw         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            gernet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          ghostnet_100           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       gluon_inception_v3        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          inception_v3           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            lcnet_050            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            mixnet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         mobilenetv2_100         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      mobilenetv3_large_100      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           mobilevit_s           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            nfnet_l0             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          pnasnet5large          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           regnety_002           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net101_26w_4s        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net50_14w_8s         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           res2next50            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           rexnet_100            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        sebotnet33ts_256         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           selecsls42b           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|  swin_base_patch4_window7_224   | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|     swsl_resnext101_32x16d      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       tf_efficientnet_b0        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           tf_mixnet_l           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            tinynet_a            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        tnt_s_patch16_224        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         visformer_small         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      vit_base_patch16_224       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           volo_d1_224           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          resmlp_12_224          | 2  | pass  |     pass      |      pass      |  fail_to_run  |     pass      |
|           convit_base           | 2  | pass  |     pass      |  fail_to_run   |  fail_to_run  |     pass      |
|      xcit_large_24_p8_224       | 2  | pass  |  fail_to_run  |  fail_to_run   |  fail_to_run  |     pass      |
|          gmixer_24_224          | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |
|          gmlp_s16_224           | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |
|          mixer_b16_224          | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |
|         poolformer_m36          | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|           resnest101e           | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|          cait_m36_384           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|         coat_lite_mini          | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|          jx_nest_base           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|            pit_b_224            | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|        twins_pcpvt_base         | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|        gluon_xception65         | 2  | pass  |     pass      |  fail_to_run   |     pass      | fail_accuracy |
|            hrnet_w18            | 2  | pass  |     pass      |  fail_to_run   |     pass      | fail_accuracy |
|          spnasnet_100           | 2  | pass  |     pass      |      pass      | fail_accuracy | fail_accuracy |
|            fbnetv3_b            | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy | fail_accuracy |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+
|              name               | bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor  |
+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+
|            hrnet_w18            |  2  | 99.0979 | 139.3754  |      nan       |  463.9864   | 1335.9117 |
|             dpn107              | 32  | 13.787  |  27.9523  |      nan       |  110.6804   | 1217.6839 |
|          pnasnet5large          | 16  | 60.3829 |  87.8325  |      nan       |  248.5446   | 1163.1213 |
|           rexnet_100            | 128 | 6.6118  |  13.7642  |      nan       |  121.3675   | 963.7513  |
|        res2net50_14w_8s         |  2  | 20.2395 |  37.6106  |      nan       |  122.0419   | 928.2062  |
|           mobilevit_s           | 32  | 5.9861  |  13.2909  |      nan       |   61.2292   | 866.3209  |
|       eca_botnext26ts_256       | 64  | 2.5743  |  6.9839   |      nan       |   63.7844   | 795.1044  |
|        twins_pcpvt_base         | 32  | 26.1399 |  43.1348  |      nan       |   95.5654   | 757.3713  |
|            mixnet_l             | 64  | 13.3341 |  22.5842  |      nan       |   87.9736   | 755.6059  |
|          ghostnet_100           | 128 | 9.3384  |  18.4048  |      nan       |   96.6532   | 682.6303  |
|            tinynet_a            | 128 | 7.7081  |  15.1645  |      nan       |   84.1684   | 670.1545  |
|           resnest101e           | 32  | 26.1437 |  45.7787  |      nan       |  123.6997   | 629.5917  |
|            fbnetv3_b            | 128 | 12.9847 |  23.2924  |      nan       |  109.3657   | 628.1691  |
|         coat_lite_mini          | 128 |  3.237  |  8.7565   |      nan       |   34.2442   | 591.0094  |
|           fbnetc_100            | 128 |  5.694  |  12.217   |    84.0962     |   63.2725   | 561.7668  |
|        sebotnet33ts_256         | 64  | 3.9547  |  9.6054   |      nan       |   69.5973   | 553.5012  |
|          botnet26t_256          | 128 | 2.4441  |  6.5802   |      nan       |   50.3226   |  525.648  |
|        eca_halonext26ts         | 64  | 2.7144  |  7.2299   |      nan       |   66.9848   | 482.7002  |
|           res2next50            |  2  | 7.4752  |  16.7251  |      nan       |   64.4957   |  479.389  |
|             dla102              | 64  | 10.7026 |  22.4382  |      nan       |   96.1266   | 471.0122  |
|           tf_mixnet_l           | 64  | 13.961  |  22.9985  |      nan       |    88.72    | 464.9684  |
|          cspdarknet53           | 64  | 6.2238  |  13.3097  |      nan       |   44.6251   | 462.2021  |
|        res2net101_26w_4s        | 64  | 25.7344 |  45.7234  |      nan       |   142.637   | 395.0089  |
|           mnasnet_100           | 128 | 4.1954  |  9.3198   |     61.348     |   52.9114   | 388.1856  |
|        adv_inception_v3         | 128 |  8.598  |  18.5391  |      nan       |  104.0052   | 385.8574  |
|       tf_efficientnet_b0        | 128 | 5.9727  |  12.614   |      nan       |   81.4109   | 382.4551  |
|            nfnet_l0             | 64  |  6.072  |  12.7626  |      nan       |   38.3849   | 361.8579  |
|  swin_base_patch4_window7_224   | 64  | 12.1335 |  25.372   |      nan       |   80.6123   | 361.6558  |
|          convnext_base          | 32  | 11.6385 |  18.8718  |      nan       |   45.8694   | 359.4631  |
|           regnety_002           | 128 | 5.0311  |  10.6266  |      nan       |    59.42    |  353.906  |
|         mobilenetv2_100         | 128 | 4.2209  |  9.0704   |      nan       |   42.6452   | 347.8093  |
|         visformer_small         | 128 | 2.3983  |  6.3456   |      nan       |   31.6313   | 331.8432  |
|      xcit_large_24_p8_224       |  5  | 37.759  |    nan    |      nan       |     nan     | 328.7567  |
|        ese_vovnet19b_dw         | 128 | 2.1377  |  4.9109   |      nan       |   39.6491   | 324.5284  |
|      mobilenetv3_large_100      | 128 |  4.528  |  10.5561  |      nan       |   84.4514   | 305.6073  |
|        gluon_xception65         | 32  | 15.8294 |  28.8359  |      nan       |   77.5442   | 299.1182  |
|          jx_nest_base           | 32  | 9.7064  |  19.4019  |      nan       |   58.4165   | 292.2664  |
|          cait_m36_384           |  2  | 47.2806 |  71.1456  |      nan       |  106.2065   | 290.8927  |
|         crossvit_9_240          | 64  | 7.6978  |  16.8783  |      nan       |   42.5032   | 282.5118  |
|         poolformer_m36          | 64  | 13.2782 |  20.9495  |      nan       |     nan     | 266.8697  |
|            gernet_l             | 128 | 4.9148  |  10.906   |      nan       |   47.155    | 236.2008  |
|           selecsls42b           | 128 | 2.4352  |  7.0271   |      nan       |   51.9936   |  234.33   |
|            lcnet_050            | 128 | 2.0668  |  5.0306   |      nan       |   39.1151   | 218.1046  |
|          spnasnet_100           | 128 | 5.5499  |  11.8294  |    82.3631     |   61.1525   | 198.4862  |
|     swsl_resnext101_32x16d      | 32  | 10.4299 |  21.7146  |      nan       |   62.1379   | 191.4235  |
|           volo_d1_224           | 64  | 6.6938  |  14.7397  |      nan       |   43.8981   | 188.4197  |
|          inception_v3           | 128 | 8.5993  |  18.9598  |      nan       |  105.0869   | 188.3471  |
|       gluon_inception_v3        | 128 | 8.6691  |  18.4348  |      nan       |  104.8996   | 181.4554  |
|        tnt_s_patch16_224        | 64  | 12.2527 |  24.6088  |      nan       |   48.277    | 168.2284  |
|           convit_base           | 32  | 4.0121  |  10.3663  |      nan       |     nan     |  161.784  |
|            pit_b_224            | 64  | 3.8475  |  9.3151   |      nan       |   27.1769   | 161.3686  |
|          gmlp_s16_224           | 64  | 9.5917  |  17.1302  |      nan       |   30.0903   | 137.9361  |
|          gmixer_24_224          | 64  | 8.7309  |  17.4092  |    61.7159     |   34.1668   | 127.9428  |
|            repvgg_a2            | 128 | 4.8452  |  10.4251  |    52.4543     |   64.5302   |  117.378  |
|           dm_nfnet_f0           | 128 | 6.5598  |  13.2818  |      nan       |   41.8458   | 115.8439  |
|          resmlp_12_224          | 128 | 2.8258  |  5.9335   |     9.778      |     nan     |  86.2363  |
|          mixer_b16_224          | 64  | 3.0313  |  6.8812   |    16.4068     |   18.0477   |  85.4161  |
| deit_base_distilled_patch16_224 | 64  | 3.1395  |  7.9059   |      nan       |   16.7699   |  75.4263  |
|      beit_base_patch16_224      | 64  | 4.7743  |  10.3559  |      nan       |   21.286    |  74.7182  |
|        convmixer_768_32         | 32  | 7.1496  |  14.2997  |      nan       |   23.4112   |  73.5333  |
|      vit_base_patch16_224       | 64  | 3.0296  |  7.7881   |      nan       |   16.2793   |  59.3368  |
+---------------------------------+-----+---------+-----------+----------------+-------------+-----------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|          gmixer_24_224          | 64  | 1.0008 |  0.9563   |     0.2215     |   0.8991    |  1.2587  |
|          gmlp_s16_224           | 64  |  1.0   |  0.9679   |      nan       |    0.92     |  1.2405  |
|            tinynet_a            | 128 | 1.0001 |  0.7955   |      nan       |   0.7958    |  1.1632  |
|          pnasnet5large          | 16  | 1.0583 |  0.9923   |      nan       |   1.1741    |  1.1265  |
|        eca_halonext26ts         | 64  | 0.9885 |  0.7814   |      nan       |    0.786    |  1.0888  |
|           dm_nfnet_f0           | 128 | 0.9758 |  0.9039   |      nan       |    0.95     |  1.0614  |
|        tnt_s_patch16_224        | 64  |  1.0   |  0.9718   |      nan       |   0.9431    |  1.0587  |
|           volo_d1_224           | 64  | 1.0015 |  0.9518   |      nan       |   0.8587    |  1.0378  |
|           convit_base           | 32  | 0.9991 |   0.86    |      nan       |     nan     |  1.0309  |
|      beit_base_patch16_224      | 64  | 0.9999 |  0.9367   |      nan       |   0.9298    |  1.0097  |
|           mobilevit_s           | 32  |  1.0   |  0.7722   |      nan       |    0.787    |  1.0078  |
|           rexnet_100            | 128 | 0.9988 |  0.7919   |      nan       |   0.8648    |  1.001   |
|             dla102              | 64  | 0.9998 |  0.9549   |      nan       |   0.9751    |  0.9969  |
|            pit_b_224            | 64  | 1.0021 |  0.8074   |      nan       |   0.8179    |  0.9856  |
|         poolformer_m36          | 64  | 1.0015 |  0.9462   |      nan       |     nan     |  0.9797  |
|          convnext_base          | 32  | 1.0065 |   0.908   |      nan       |   0.7521    |  0.9564  |
|        twins_pcpvt_base         | 32  | 0.9963 |  0.9079   |      nan       |   0.8007    |  0.9553  |
|        convmixer_768_32         | 32  | 0.9992 |  0.9807   |      nan       |   0.9715    |  0.9513  |
|         visformer_small         | 128 | 0.9899 |  0.9353   |      nan       |   0.8884    |  0.9342  |
|           resnest101e           | 32  | 1.0002 |  0.9762   |      nan       |   0.9535    |  0.9292  |
|           tf_mixnet_l           | 64  | 0.9995 |  0.8624   |      nan       |   0.8426    |  0.9291  |
|          mixer_b16_224          | 64  | 0.9929 |  0.9425   |     0.2532     |   0.7726    |  0.9225  |
|       tf_efficientnet_b0        | 128 | 1.0006 |  0.7769   |      nan       |    0.846    |  0.919   |
|            nfnet_l0             | 64  | 0.9993 |   0.824   |      nan       |   0.8257    |  0.913   |
|         mobilenetv2_100         | 128 | 0.9992 |  0.7716   |      nan       |   0.9249    |  0.8963  |
|      vit_base_patch16_224       | 64  | 0.9955 |  0.9384   |      nan       |   0.8801    |  0.8916  |
| deit_base_distilled_patch16_224 | 64  | 0.9944 |  0.9376   |      nan       |   0.8794    |  0.8911  |
|      mobilenetv3_large_100      | 128 | 0.9987 |  0.8562   |      nan       |   0.8673    |  0.8886  |
|        adv_inception_v3         | 128 | 1.0003 |  0.8759   |      nan       |   0.8538    |  0.8829  |
|       gluon_inception_v3        | 128 | 1.0003 |  0.8759   |      nan       |   0.8538    |  0.8829  |
|          inception_v3           | 128 | 1.0003 |  0.8759   |      nan       |   0.8538    |  0.8829  |
|        gluon_xception65         | 32  |  1.0   |  0.8895   |      nan       |   0.8854    |  0.8713  |
|             dpn107              | 32  | 0.9981 |  0.9115   |      nan       |   0.8834    |   0.87   |
|           selecsls42b           | 128 | 0.9789 |  0.8913   |      nan       |   0.8811    |  0.8659  |
|            fbnetv3_b            | 128 | 1.0003 |  0.7918   |      nan       |   0.7903    |  0.8647  |
|            mixnet_l             | 64  | 0.9989 |  0.8507   |      nan       |   0.7796    |  0.8601  |
|          spnasnet_100           | 128 | 0.9988 |  0.8961   |     0.1651     |   0.8371    |  0.8599  |
|       eca_botnext26ts_256       | 64  | 0.9998 |  0.7776   |      nan       |   0.7811    |  0.8534  |
|     swsl_resnext101_32x16d      | 32  | 1.0009 |  0.8805   |      nan       |   0.8487    |  0.8523  |
|      xcit_large_24_p8_224       |  5  | 0.9987 |    nan    |      nan       |     nan     |  0.8489  |
|          resmlp_12_224          | 128 | 0.9827 |  0.9667   |     0.2637     |     nan     |  0.845   |
|          ghostnet_100           | 128 | 1.0013 |  0.8903   |      nan       |   0.9244    |  0.8329  |
|         coat_lite_mini          | 128 | 1.0338 |   0.929   |      nan       |   0.6593    |  0.8328  |
|        ese_vovnet19b_dw         | 128 |  1.0   |   0.867   |      nan       |   0.9146    |  0.8269  |
|          cspdarknet53           | 64  |  1.0   |  0.8467   |      nan       |   0.7906    |  0.813   |
|          cait_m36_384           |  2  | 0.9998 |  0.8806   |      nan       |   0.9023    |  0.8081  |
|          jx_nest_base           | 32  |  1.0   |  0.8945   |      nan       |    0.86     |   0.8    |
|         crossvit_9_240          | 64  | 1.0008 |  0.8801   |      nan       |   0.8854    |  0.7934  |
|        res2net101_26w_4s        | 64  | 0.9999 |  0.9202   |      nan       |   0.8569    |  0.7834  |
|           mnasnet_100           | 128 | 0.9993 |  0.8882   |     0.1669     |   0.8253    |  0.773   |
|  swin_base_patch4_window7_224   | 64  | 0.9998 |  0.9234   |      nan       |   0.8451    |  0.7676  |
|        sebotnet33ts_256         | 64  | 0.9999 |  0.7108   |      nan       |   0.7354    |  0.7449  |
|            gernet_l             | 128 | 0.9998 |  0.8655   |      nan       |   0.8299    |  0.7238  |
|           fbnetc_100            | 128 | 0.9984 |  0.8631   |     0.1626     |   0.7352    |  0.7104  |
|            lcnet_050            | 128 | 0.9992 |  0.7927   |      nan       |   0.7885    |  0.705   |
|           regnety_002           | 128 | 0.9994 |  0.8284   |      nan       |   0.7819    |  0.6975  |
|          botnet26t_256          | 128 |  1.0   |  0.8755   |      nan       |    0.78     |  0.6615  |
|           res2next50            |  2  |  1.0   |  0.8301   |      nan       |   0.8198    |  0.6012  |
|        res2net50_14w_8s         |  2  |  1.0   |  0.8275   |      nan       |   0.8169    |  0.5927  |
|            hrnet_w18            |  2  |  1.0   |  0.8383   |      nan       |   0.8363    |  0.5746  |
|            repvgg_a2            | 128 | 1.0003 |  0.7971   |     0.1444     |   0.6902    |  0.5572  |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+

Performance graphs

[image: bench_logs/torchbench_amp.png]

[image: bench_logs/huggingface_amp.png]

[image: bench_logs/timm_models_amp.png]

@anijain2305
Contributor Author

Performance Dashboard for float32 precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and the gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. The experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
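As a rough illustration of the exclusion step described above, the sketch below drops every model whose accuracy status is not a pass before any metric is aggregated. The helper names (`passing_models`, `filter_metrics`) are hypothetical and not part of the benchmark scripts. Counting `pass_due_to_skip` as a pass is an assumption. The sample values are the inductor entries for three timm rows in the tables above.

```python
def passing_models(accuracy: dict[str, str]) -> set[str]:
    """Return the names of models whose accuracy status counts as a pass.

    Assumption: any status starting with "pass" (for example "pass" or
    "pass_due_to_skip") counts; "fail_to_run" and "fail_accuracy" do not.
    """
    return {name for name, status in accuracy.items() if status.startswith("pass")}


def filter_metrics(metrics: dict[str, float], accuracy: dict[str, str]) -> dict[str, float]:
    """Keep only the metric entries of models that passed the accuracy check."""
    keep = passing_models(accuracy)
    return {name: value for name, value in metrics.items() if name in keep}


# Inductor accuracy status and speedup for three timm models (copied from the tables above).
inductor_accuracy = {
    "dla102": "pass",
    "pit_b_224": "pass",
    "gluon_xception65": "fail_accuracy",
}
inductor_speedup = {
    "dla102": 1.0037,
    "pit_b_224": 1.3627,
    "gluon_xception65": 0.6086,
}

filtered = filter_metrics(inductor_speedup, inductor_accuracy)
print(sorted(filtered))  # gluon_xception65 is excluded
```

The same filter would apply to compilation latency and memory ratios, since all three metrics are reported only for models that pass.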

Passrate

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      | 89%, 49/55 | 98%, 43/44  | 100%, 61/61 |
|   aot_eager    | 87%, 48/55 | 98%, 43/44  | 90%, 55/61  |
| aot_cudagraphs | 25%, 14/55 |  0%, 0/44   |  2%, 1/61   |
|  aot_nvfuser   | 58%, 32/55 |  2%, 1/44   | 82%, 50/61  |
|    inductor    | 84%, 46/55 | 93%, 41/44  | 97%, 59/61  |
+----------------+------------+-------------+-------------+
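The pass-rate cells above pair a rounded percentage with the raw counts. A minimal sketch of how such a cell could be produced from a list of per-model statuses follows. The function names are hypothetical, and treating every status that starts with "pass" as a pass is an assumption, not something the dashboard states.

```python
def format_passrate(passed: int, total: int) -> str:
    """Format a pass rate in the "89%, 49/55" style used in the table."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{passed / total:.0%}, {passed}/{total}"


def passrate_from_statuses(statuses: list[str]) -> str:
    """Count passing statuses and format the result.

    Assumption: statuses beginning with "pass" count as passing.
    """
    passed = sum(1 for status in statuses if status.startswith("pass"))
    return format_passrate(passed, len(statuses))


print(format_passrate(49, 55))  # 89%, 49/55
print(passrate_from_statuses(["pass", "pass", "fail_to_run", "fail_accuracy"]))  # 50%, 2/4
```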

Geometric mean speedup

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   1.00x    |    1.01x    |    1.00x    |
|   aot_eager    |   1.01x    |    1.00x    |    1.00x    |
| aot_cudagraphs |   1.03x    |    0.0x     |    1.00x    |
|  aot_nvfuser   |   1.13x    |    1.11x    |    1.12x    |
|    inductor    |   1.49x    |    1.64x    |    1.35x    |
+----------------+------------+-------------+-------------+
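For readers unfamiliar with the aggregation, a geometric mean of per-model speedups can be computed as the exponential of the mean of the logs. The sketch below is illustrative only: `geometric_mean` and `geomean_speedup` are hypothetical names, the input values are made up, and treating a per-model entry of 0.0 as "did not run" (and dropping it) is an assumption about how the tables encode failures.

```python
import math


def geometric_mean(values: list[float]) -> float:
    """Geometric mean of strictly positive values, computed in log space."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geomean_speedup(speedups: list[float]) -> float:
    """Aggregate per-model speedups, skipping 0.0 placeholders.

    Assumption: 0.0 marks a model that failed and is excluded from the mean.
    """
    valid = [s for s in speedups if s > 0]
    return geometric_mean(valid)


# Illustrative values, not taken from the dashboard.
print(f"{geomean_speedup([2.0, 8.0, 0.0]):.2f}x")  # 4.00x
```

A geometric mean is the usual choice for ratios such as speedups because a 2x gain and a 0.5x loss cancel out to 1.0x, which an arithmetic mean would not give.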

Mean compilation time (seconds)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |    1.76    |    2.14     |    2.02     |
|   aot_eager    |    6.40    |    9.10     |    8.87     |
| aot_cudagraphs |    4.48    |     0.0     |    5.79     |
|  aot_nvfuser   |   20.37    |    9.44     |    49.20    |
|    inductor    |   131.38   |   102.01    |   213.14    |
+----------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   0.96x    |    0.98x    |    0.99x    |
|   aot_eager    |   0.86x    |    0.89x    |    0.87x    |
| aot_cudagraphs |   0.41x    |    0.0x     |    0.25x    |
|  aot_nvfuser   |   0.83x    |    1.08x    |    0.84x    |
|    inductor    |   0.84x    |    0.77x    |    0.95x    |
+----------------+------------+-------------+-------------+
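The dashboard does not spell out the formula behind the compression ratio. One reading consistent with "higher is better" is native PyTorch peak memory divided by the compiled backend's peak memory, so a value above 1.0 means the backend uses less memory. The sketch below encodes that assumption; the function name and the sample numbers are hypothetical.

```python
def compression_ratio(eager_peak_gb: float, compiled_peak_gb: float) -> float:
    """Peak memory compression ratio.

    Assumption: ratio = eager peak memory / compiled peak memory, so values
    above 1.0 mean the compiled backend has the smaller footprint.
    """
    if eager_peak_gb <= 0 or compiled_peak_gb <= 0:
        raise ValueError("peak memory values must be positive")
    return eager_peak_gb / compiled_peak_gb


# Illustrative numbers, not measured values.
print(compression_ratio(10.0, 8.0))   # 1.25: compiled run uses less memory
print(compression_ratio(8.0, 10.0))   # 0.8: compiled run uses more memory
```

Under this reading, the sub-1.0 values in the table would indicate that most backends currently use more peak memory than native PyTorch.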

torchbench suite with float32 precision


Performance speedup

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|            densenet121            |  4   | 1.0022 |  1.0056   |      0.0       |   1.4544    |  5.2009  |
|         timm_efficientdet         |  1   | 0.9838 |  0.8876   |      0.0       |     0.0     |  4.2126  |
|       functorch_dp_cifar10        |  64  | 0.999  |  0.9933   |      0.0       |   1.1961    |  3.6586  |
|      timm_vision_transformer      |  8   | 1.0053 |  0.9219   |      0.0       |   1.3393    |  2.7292  |
|                drq                |  1   | 1.0073 |  0.8137   |      0.0       |   1.0725    |  2.4432  |
|          resnext50_32x4d          |  8   | 1.0033 |   1.053   |      0.0       |   1.3317    |  2.1015  |
|        mobilenet_v3_large         |  32  | 1.0055 |  1.1184   |      0.0       |   1.3849    |  2.079   |
|           BERT_pytorch            |  16  | 1.0097 |  0.8826   |      0.0       |     0.0     |  1.8631  |
|          pytorch_struct           | 200  | 1.008  |   0.736   |     0.8823     |   0.9846    |  1.8608  |
|             resnet18              |  16  | 1.0058 |  1.1215   |      0.0       |   1.4023    |  1.8249  |
|           lennard_jones           | 1000 | 0.9697 |  0.8417   |     1.0649     |   1.0102    |  1.7782  |
|           squeezenet1_1           |  32  | 1.0032 |    1.0    |     0.9697     |   1.1615    |  1.753   |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.999  |  0.9576   |     1.3201     |   1.2037    |  1.7521  |
|             hf_Albert             |  8   | 1.0012 |  0.9956   |      0.0       |     0.0     |  1.6573  |
|               dcgan               |  32  | 0.9445 |  1.0147   |     1.0422     |   1.1791    |  1.6502  |
|        shufflenet_v2_x1_0         | 128  | 1.0003 |  1.0852   |      0.0       |   1.1981    |  1.6387  |
|            hf_T5_large            |  2   | 1.0258 |  0.8969   |      0.0       |     0.0     |  1.5504  |
|           timm_resnest            |  32  | 0.9998 |  1.0033   |      0.0       |   1.1847    |  1.5201  |
|            timm_nfnet             | 128  | 0.9994 |  0.9996   |      0.0       |   1.2125    |  1.4753  |
|            mnasnet1_0             |  32  | 1.0002 |  1.1048   |     0.7027     |   1.2986    |  1.4659  |
|              hf_GPT2              |  4   |  1.01  |  0.9778   |      0.0       |     0.0     |  1.4239  |
|           mobilenet_v2            |  96  | 0.9997 |  0.9998   |      0.0       |   1.0391    |  1.4232  |
|           fastNLP_Bert            |  6   | 0.999  |  0.9712   |      0.0       |     0.0     |  1.3718  |
|         soft_actor_critic         | 256  | 0.973  |  0.8119   |     1.0368     |   0.9557    |  1.3261  |
|         timm_efficientnet         |  32  | 0.9563 |  0.8181   |      0.0       |   1.0657    |  1.3067  |
|          LearningToPaint          |  96  | 1.0052 |  1.0615   |      0.0       |   1.2309    |  1.3053  |
|             resnet50              |  32  | 0.9992 |   0.994   |      0.0       |   1.1627    |  1.205   |
|           pytorch_unet            |  1   | 0.9998 |  0.9988   |      0.0       |   1.0768    |   1.2    |
|            Super_SloMo            |  6   |  1.0   |  0.9969   |      0.0       |     0.0     |  1.1814  |
|              hf_Bart              |  4   | 1.013  |   0.974   |      0.0       |     0.0     |  1.1754  |
|               vgg16               |  64  | 0.9999 |   0.999   |     0.792      |    0.997    |  1.1709  |
|              alexnet              | 128  | 0.9993 |  0.9979   |     0.7777     |   1.0008    |  1.163   |
|              hf_Bert              |  4   | 1.0321 |  0.9366   |      0.0       |     0.0     |  1.1622  |
|           hf_DistilBert           |  8   | 1.0005 |  0.9564   |      0.0       |     0.0     |  1.1519  |
|            timm_regnet            |  32  | 0.9662 |  0.9642   |      0.0       |   1.0959    |  1.1341  |
|          pytorch_stargan          |  16  | 0.9984 |  0.9809   |     0.7288     |    0.987    |  1.1213  |
|        Background_Matting         |  4   | 1.0005 |  1.0217   |      0.0       |    1.082    |  1.1141  |
|            hf_Reformer            |  4   | 0.9961 |    0.0    |     0.8941     |     0.0     |  1.1097  |
|            hf_BigBird             |  2   | 0.9912 |  0.9483   |      0.0       |     0.0     |  1.0763  |
|              yolov3               |  16  |  1.0   |   0.995   |      0.0       |   1.1849    |  1.0746  |
|   timm_vision_transformer_large   |  8   | 1.0013 |  0.9943   |      0.0       |   0.9823    |  1.0532  |
| attention_is_all_you_need_pytorch | 256  | 0.9999 |  0.9719   |      0.0       |     0.0     |  1.0425  |
|            tts_angular            |  64  | 0.9871 |  0.9542   |     0.9837     |    1.007    |  1.0047  |
|              demucs               |  4   |  1.0   |  1.0002   |     0.9991     |   1.0001    |  1.0003  |
|            timm_vovnet            |  32  | 0.9126 |  0.9036   |      0.0       |   0.9786    |  0.9892  |
|      nvidia_deeprecommender       | 256  | 0.9995 |  0.9633   |     0.5844     |   0.9422    |  0.9046  |
|               hf_T5               |  8   | 1.0009 |  0.9918   |      0.0       |     0.0     |   0.0    |
|           hf_GPT2_large           |  4   | 1.0004 |  0.9804   |      0.0       |     0.0     |   0.0    |
|        speech_transformer         |  32  | 1.0032 |  0.9015   |      0.0       |     0.0     |   0.0    |
|               dlrm                |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
|           hf_Longformer           |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
|    mobilenet_v2_quantized_qat     |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
|               moco                |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
|      resnet50_quantized_qat       |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
|             tacotron2             |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
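
To summarize a column of the speedup table, one option is a geometric mean over the models that actually ran. The sketch below assumes that a `0.0` entry means the run failed for that backend (an assumption based on how the zeros line up with `fail_to_run` in the accuracy table, not something stated in the table itself). The helper name `geomean_speedup` is hypothetical and not part of the benchmark scripts.

```python
import math


def geomean_speedup(speedups):
    """Return (geometric mean, number of successful runs).

    Entries equal to 0.0 are assumed to mark failed runs and are skipped.
    """
    ok = [s for s in speedups if s > 0.0]
    if not ok:
        return float("nan"), 0
    log_mean = sum(math.log(s) for s in ok) / len(ok)
    return math.exp(log_mean), len(ok)


# inductor column for tts_angular, demucs, timm_vovnet,
# nvidia_deeprecommender, hf_T5, hf_GPT2_large, speech_transformer
inductor = [1.0047, 1.0003, 0.9892, 0.9046, 0.0, 0.0, 0.0]
gm, n_ok = geomean_speedup(inductor)
print(f"geomean over {n_ok} successful runs: {gm:.4f}")
```

Skipping the zeros rather than averaging them in keeps failures from dragging the mean to zero, so the failure count should be reported next to the mean.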

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|               name                | bs  |      eager       |    aot_eager     |  aot_cudagraphs  |   aot_nvfuser    |     inductor     |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|           hf_GPT2_large           |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|            hf_T5_large            |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|              alexnet              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               dcgan               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|              demucs               |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |
|           lennard_jones           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|            mnasnet1_0             |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|      nvidia_deeprecommender       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |       pass       |       pass       |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |       pass       |       pass       |
|           squeezenet1_1           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|            tts_angular            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               vgg16               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|        Background_Matting         |  4  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|          LearningToPaint          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            densenet121            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|                drq                |  1  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|       functorch_dp_cifar10        |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           mobilenet_v2            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|        mobilenet_v3_large         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           pytorch_unet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet18              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet50              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|          resnext50_32x4d          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|        shufflenet_v2_x1_0         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|         timm_efficientnet         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_nfnet             |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_regnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           timm_resnest            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|      timm_vision_transformer      |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_vovnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            hf_Reformer            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |
|           BERT_pytorch            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            Super_SloMo            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
| attention_is_all_you_need_pytorch |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               dlrm                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           fastNLP_Bert            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|             hf_Albert             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bart              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bert              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            hf_BigBird             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           hf_DistilBert           |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_GPT2              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               hf_T5               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            hf_T5_base             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|         timm_efficientdet         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              yolov3               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|        speech_transformer         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|          vision_maskrcnn          |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|           hf_Longformer           |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|    mobilenet_v2_quantized_qat     |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|               moco                |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|      resnet50_quantized_qat       |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|             tacotron2             |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
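
For anyone who wants to tally these results per backend, here is a minimal sketch that parses a table in this `+---+` / `| ... |` layout and counts statuses per column. The function names `parse_ascii_table` and `status_counts` are hypothetical, and the sample holds only four rows copied from the table above.

```python
from collections import Counter


def parse_ascii_table(text):
    """Parse a '+---+'-bordered table into a list of dicts keyed by header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def status_counts(rows, backend):
    """Count how often each status string appears for one backend column."""
    return Counter(row[backend] for row in rows)


sample = """
+-----------+----+-------------+-------------+----------------+-------------+-------------+
|   name    | bs |    eager    |  aot_eager  | aot_cudagraphs | aot_nvfuser |  inductor   |
+-----------+----+-------------+-------------+----------------+-------------+-------------+
|  alexnet  | 2  |    pass     |    pass     |      pass      |    pass     |    pass     |
| resnet50  | 2  |    pass     |    pass     |  fail_to_run   |    pass     |    pass     |
|  hf_Bert  | 2  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
| tacotron2 | 2  | fail_to_run | fail_to_run |  fail_to_run   | fail_to_run | fail_to_run |
+-----------+----+-------------+-------------+----------------+-------------+-------------+
"""

rows = parse_ascii_table(sample)
for backend in ["eager", "aot_eager", "aot_cudagraphs", "aot_nvfuser", "inductor"]:
    print(backend, dict(status_counts(rows, backend)))
```

Counting raw status strings leaves the decision about whether `pass_due_to_skip` counts as a pass to the reader.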

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------------+-------------+----------+
|               name                |  bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+
|              yolov3               |  16  |  2.852  |  8.8025   |      nan       |   43.2264   | 834.5667 |
|         timm_efficientdet         |  1   | 19.6005 |  38.0517  |      nan       |     nan     | 780.0168 |
|            hf_T5_large            |  2   | 12.4163 |  41.2037  |      nan       |     nan     | 519.4725 |
|            densenet121            |  4   | 2.1945  |  13.6586  |      nan       |   89.2202   | 307.282  |
| attention_is_all_you_need_pytorch | 256  | 1.0811  |  7.3219   |      nan       |     nan     | 253.749  |
|           timm_resnest            |  32  | 0.5654  |  2.7325   |      nan       |   35.0693   | 226.5594 |
|      timm_vision_transformer      |  8   | 0.7318  |  4.1023   |      nan       |   9.0512    | 184.8576 |
|        mobilenet_v3_large         |  32  | 0.8745  |  4.9777   |      nan       |   53.5229   | 164.2159 |
|           BERT_pytorch            |  16  | 1.4586  |  7.4362   |      nan       |     nan     | 161.1107 |
|   timm_vision_transformer_large   |  8   | 2.1743  |  13.6492  |      nan       |   24.1662   | 151.6599 |
|         timm_efficientnet         |  32  | 1.7009  |  6.7188   |      nan       |   52.4848   | 136.5229 |
|           fastNLP_Bert            |  6   | 1.3882  |  6.6513   |      nan       |     nan     | 136.4023 |
|          pytorch_stargan          |  16  | 0.4024  |  2.4373   |     9.2403     |   4.0267    | 135.5749 |
|           mobilenet_v2            |  96  | 0.7828  |  4.6697   |      nan       |   37.2951   | 133.8033 |
|              hf_Bart              |  4   | 1.3508  |  7.9191   |      nan       |     nan     | 131.1878 |
|              hf_GPT2              |  4   | 1.2194  |  5.9845   |      nan       |     nan     | 128.688  |
|            mnasnet1_0             |  32  | 0.8055  |  4.7542   |    21.0664     |   31.2578   | 125.661  |
|          pytorch_struct           | 200  | 0.2357  |  0.7743   |     1.5365     |   4.1118    | 121.4775 |
|        shufflenet_v2_x1_0         | 128  | 0.9448  |   5.95    |      nan       |   27.0319   | 104.2267 |
|            timm_nfnet             | 128  | 1.8191  |   7.415   |      nan       |   29.537    | 103.7278 |
|          resnext50_32x4d          |  8   | 0.8908  |  5.2354   |      nan       |   29.0877   | 98.4197  |
|            timm_vovnet            |  32  |  1.512  |  4.7245   |      nan       |   23.8082   | 90.9277  |
|            timm_regnet            |  32  | 2.2296  |  8.5346   |      nan       |   47.2423   | 82.1711  |
|            Super_SloMo            |  6   | 0.9402  |  4.9556   |      nan       |     nan     | 81.8682  |
|             hf_Albert             |  8   | 0.9341  |  5.5692   |      nan       |     nan     | 73.1288  |
|             resnet50              |  32  | 0.8693  |  5.1374   |      nan       |   32.1725   | 71.2104  |
|              hf_Bert              |  4   | 1.3528  |  6.0969   |      nan       |     nan     | 67.6511  |
|            hf_Reformer            |  4   | 2.3118  |    nan    |    12.8556     |     nan     | 65.6298  |
|       functorch_dp_cifar10        |  64  | 0.3519  |  1.9863   |      nan       |   5.5953    |  65.244  |
|        Background_Matting         |  4   | 0.7269  |  4.6671   |      nan       |   29.3082   | 64.9394  |
|             resnet18              |  16  | 0.4316  |  1.9408   |      nan       |   17.7099   | 63.7446  |
|           pytorch_unet            |  1   | 0.4395  |  2.1424   |      nan       |   19.6082   |  63.716  |
|          LearningToPaint          |  96  |  0.432  |  2.0613   |      nan       |   23.8935   | 56.4957  |
|            hf_BigBird             |  2   | 7.1325  |  13.2837  |      nan       |     nan     | 55.0936  |
|           hf_DistilBert           |  8   | 0.4251  |  2.8964   |      nan       |     nan     | 47.9177  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.3797  |  2.3411   |     8.012      |   3.9035    | 31.3399  |
|           squeezenet1_1           |  32  | 0.2311  |   0.939   |     2.6246     |   4.5054    | 30.4021  |
|               vgg16               |  64  | 0.1849  |  0.6506   |     2.1837     |   2.4324    | 18.7241  |
|              alexnet              | 128  | 0.1545  |  0.4068   |     1.3655     |   2.4079    | 14.8785  |
|                drq                |  1   | 0.1343  |  0.4544   |      nan       |   3.3517    | 14.8192  |
|               dcgan               |  32  | 0.1845  |  0.4461   |     1.3915     |   3.7557    | 14.3419  |
|      nvidia_deeprecommender       | 256  | 0.1829  |   0.402   |     0.6972     |   2.4366    | 10.8437  |
|         soft_actor_critic         | 256  | 0.1897  |  0.3277   |     0.632      |   1.5219    |  9.788   |
|           lennard_jones           | 1000 |  0.135  |  0.2818   |     0.4335     |   1.0451    |  4.9188  |
|            tts_angular            |  64  | 0.2049  |  0.2723   |     0.3929     |   0.9841    |  4.1684  |
|              demucs               |  4   |   0.3   |  0.3159   |     0.2921     |    0.307    |  0.2244  |
|           hf_GPT2_large           |  4   |  4.696  |  18.8761  |      nan       |     nan     |   nan    |
|               hf_T5               |  8   | 2.0157  |  9.0035   |      nan       |     nan     |   nan    |
|        speech_transformer         |  32  | 1.5841  |   8.279   |      nan       |     nan     |   nan    |
|               dlrm                |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |
|           hf_Longformer           |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |
|    mobilenet_v2_quantized_qat     |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |
|               moco                |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |
|      resnet50_quantized_qat       |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |
|             tacotron2             |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|         timm_efficientnet         |  32  | 0.9937 |  0.7666   |      nan       |   0.7837    |  1.3106  |
|            Super_SloMo            |  6   | 1.0024 |  0.9527   |      nan       |     nan     |  1.1857  |
|         timm_efficientdet         |  1   | 1.0111 |   0.823   |      nan       |     nan     |  1.1165  |
|           mobilenet_v2            |  96  | 0.9928 |  0.7624   |      nan       |   0.7638    |  1.1005  |
|           squeezenet1_1           |  32  | 0.9749 |  0.8159   |     0.2763     |   0.9742    |  1.0823  |
|            timm_nfnet             | 128  | 0.9358 |  0.8936   |      nan       |   0.9478    |  1.0219  |
|              demucs               |  4   | 0.9886 |  0.9886   |     0.9886     |   0.9886    |  0.9886  |
|            tts_angular            |  64  | 0.9884 |  0.9884   |     0.9829     |   0.9884    |  0.983   |
|        shufflenet_v2_x1_0         | 128  | 0.9739 |  0.8944   |      nan       |   0.8662    |  0.9791  |
|              hf_GPT2              |  4   | 0.9548 |   0.887   |      nan       |     nan     |  0.9505  |
|            timm_regnet            |  32  | 0.9985 |  0.8614   |      nan       |   0.8784    |  0.9284  |
|              yolov3               |  16  | 0.9957 |   0.844   |      nan       |   0.8814    |  0.9231  |
|        Background_Matting         |  4   | 0.9998 |  0.9492   |      nan       |   0.9749    |  0.9139  |
|          pytorch_stargan          |  16  | 0.9975 |  1.0179   |     0.2026     |   1.0085    |  0.9023  |
|           timm_resnest            |  32  | 0.9927 |   0.88    |      nan       |   0.8024    |  0.8974  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9986 |  0.9149   |     0.2326     |   0.9141    |  0.8848  |
|        mobilenet_v3_large         |  32  | 0.9878 |  0.8563   |      nan       |   0.8681    |  0.8829  |
|             hf_Albert             |  8   | 0.9333 |  0.9333   |      nan       |     nan     |  0.8804  |
|            hf_T5_large            |  2   | 0.922  |  0.8722   |      nan       |     nan     |  0.8737  |
|           pytorch_unet            |  1   | 0.9985 |  0.8521   |      nan       |   0.8496    |  0.859   |
|             resnet50              |  32  | 0.9942 |  0.8719   |      nan       |    0.797    |  0.8565  |
|              hf_Bert              |  4   | 0.9683 |  0.8952   |      nan       |     nan     |  0.8564  |
|            densenet121            |  4   | 0.9904 |  0.8812   |      nan       |   0.8551    |  0.8562  |
|            mnasnet1_0             |  32  | 0.9869 |  0.8985   |     0.1623     |   0.8263    |  0.8531  |
|              hf_Bart              |  4   | 0.9617 |   0.878   |      nan       |     nan     |  0.8531  |
|           fastNLP_Bert            |  6   | 1.0011 |  0.9152   |      nan       |     nan     |  0.8343  |
|          resnext50_32x4d          |  8   | 0.9954 |  0.8671   |      nan       |   0.8203    |  0.8303  |
|   timm_vision_transformer_large   |  8   | 0.9997 |  0.8415   |      nan       |    0.801    |  0.8284  |
|            timm_vovnet            |  32  | 0.9933 |  0.7603   |      nan       |   0.7741    |  0.8251  |
|           BERT_pytorch            |  16  |  1.0   |  0.8995   |      nan       |     nan     |  0.825   |
|            hf_BigBird             |  2   | 0.9604 |  0.9604   |      nan       |     nan     |  0.8205  |
| attention_is_all_you_need_pytorch | 256  | 0.9476 |  0.9243   |      nan       |     nan     |  0.816   |
|           hf_DistilBert           |  8   | 0.9211 |  0.9047   |      nan       |     nan     |  0.7841  |
|               dcgan               |  32  | 0.9754 |  0.7634   |     0.3293     |   0.7634    |  0.767   |
|                drq                |  1   | 0.987  |  0.8777   |      nan       |   0.8772    |  0.7632  |
|         soft_actor_critic         | 256  | 0.9997 |  0.9637   |     0.4355     |   0.9555    |   0.75   |
|              alexnet              | 128  | 0.9542 |   0.745   |     0.3738     |   0.7455    |  0.743   |
|      timm_vision_transformer      |  8   | 0.9943 |  0.8835   |      nan       |   0.8104    |  0.712   |
|             resnet18              |  16  | 0.9831 |  0.7792   |      nan       |   0.6971    |  0.6902  |
|          LearningToPaint          |  96  | 0.9442 |  0.6902   |      nan       |   0.6274    |  0.6899  |
|               vgg16               |  64  | 0.9944 |  0.6638   |     0.2528     |   0.6639    |  0.6471  |
|           lennard_jones           | 1000 | 0.9995 |  0.9995   |     0.3711     |   1.0947    |  0.5646  |
|      nvidia_deeprecommender       | 256  | 0.5598 |  0.5598   |     0.4621     |   0.5598    |  0.5598  |
|          pytorch_struct           | 200  |  1.0   |  0.5079   |     0.4824     |   0.5079    |  0.4222  |
|       functorch_dp_cifar10        |  64  | 0.9961 |  0.8224   |      nan       |   0.8227    |  0.4056  |
|            hf_Reformer            |  4   | 0.3011 |    nan    |     0.1798     |     nan     |  0.299   |
|               hf_T5               |  8   | 0.9527 |  0.9445   |      nan       |     nan     |   nan    |
|        speech_transformer         |  32  | 0.9982 |  0.9159   |      nan       |     nan     |   nan    |
|           hf_GPT2_large           |  4   | 0.936  |  0.8768   |      nan       |     nan     |   nan    |
|               dlrm                |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
|           hf_Longformer           |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
|    mobilenet_v2_quantized_qat     |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
|               moco                |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
|      resnet50_quantized_qat       |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
|             tacotron2             |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
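
A note on reading this table, as a sketch under one stated assumption: the ratio is taken to be eager peak memory divided by the backend's peak memory, so values above 1 mean the backend used less memory than eager. That definition is not given in this comment, so treat it as an assumption. The helper names below are hypothetical.

```python
import math


def compression_ratio(eager_peak, backend_peak):
    """Assumed definition: eager peak memory / backend peak memory."""
    if backend_peak <= 0:
        return float("nan")
    return eager_peak / backend_peak


def describe(ratio):
    """Turn a ratio from the table into a short reading; nan means no data."""
    if math.isnan(ratio):
        return "no data"
    if ratio > 1.0:
        return "less memory than eager"
    if ratio < 1.0:
        return "more memory than eager"
    return "same as eager"


# inductor values copied from the table above
for name, ratio in [
    ("timm_efficientnet", 1.3106),
    ("hf_Reformer", 0.299),
    ("hf_T5", float("nan")),
]:
    print(f"{name}: {ratio} -> {describe(ratio)}")
```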

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|       MT5ForConditionalGeneration       | 2  | 1.0267 |  0.9006   |      0.0       |     0.0     |  5.0142  |
|           ElectraForCausalLM            | 1  | 1.0418 |   0.914   |      0.0       |     0.0     |  4.5852  |
|            YituTechConvBert             | 1  | 1.028  |  0.9323   |      0.0       |     0.0     |  3.1147  |
|          MobileBertForMaskedLM          | 16 | 1.0238 |  0.9124   |      0.0       |     0.0     |  2.8809  |
|         MegatronBertForCausalLM         | 2  | 1.0386 |  0.9368   |      0.0       |     0.0     |  2.8791  |
|           RobertaForCausalLM            | 4  | 1.0465 |  0.9138   |      0.0       |     0.0     |  2.826   |
|     MobileBertForQuestionAnswering      | 32 | 0.972  |  0.8903   |      0.0       |     0.0     |  2.8063  |
|             OPTForCausalLM              | 4  | 1.0178 |  0.9019   |      0.0       |     0.0     |  2.8057  |
|             XGLMForCausalLM             | 1  | 1.015  |  0.8672   |      0.0       |     0.0     |  2.6862  |
|                CamemBert                | 1  | 1.0505 |  0.9388   |      0.0       |     0.0     |  2.6346  |
|     M2M100ForConditionalGeneration      | 2  | 1.0643 |  0.8763   |      0.0       |     0.0     |  2.6081  |
|     PegasusForConditionalGeneration     | 4  | 1.0109 |  0.9006   |      0.0       |     0.0     |  2.3996  |
|               DistillGPT2               | 1  | 1.0331 |  0.9401   |      0.0       |     0.0     |  2.2899  |
|               GoogleFnet                | 1  | 1.0042 |  0.8143   |      0.0       |    1.111    |  1.8532  |
|     PLBartForConditionalGeneration      | 8  | 1.0167 |  0.9092   |      0.0       |     0.0     |  1.6989  |
|      GPT2ForSequenceClassification      | 4  | 1.0005 |  0.9773   |      0.0       |     0.0     |  1.6717  |
|    MegatronBertForQuestionAnswering     | 8  | 1.041  |  0.9364   |      0.0       |     0.0     |  1.5637  |
|      MBartForConditionalGeneration      | 8  | 1.0156 |  0.9129   |      0.0       |     0.0     |  1.464   |
|            XLNetLMHeadModel             | 4  | 1.0008 |  0.9635   |      0.0       |     0.0     |  1.4343  |
|       T5ForConditionalGeneration        | 4  | 1.0015 |  0.9594   |      0.0       |     0.0     |  1.4234  |
|       ElectraForQuestionAnswering       | 64 | 1.0002 |  0.9751   |      0.0       |     0.0     |  1.3597  |
|       AlbertForQuestionAnswering        | 2  | 1.001  |  1.0031   |      0.0       |     0.0     |  1.303   |
|            AlbertForMaskedLM            | 2  | 1.0008 |  1.0008   |      0.0       |     0.0     |  1.2966  |
|                 T5Small                 | 1  | 1.0182 |  0.9491   |      0.0       |     0.0     |  1.2827  |
|       DebertaForQuestionAnswering       | 4  | 0.9315 |  0.7403   |     0.7877     |     0.0     |  1.2626  |
|    LayoutLMForSequenceClassification    | 16 |  1.0   |  0.9883   |      0.0       |     0.0     |  1.2564  |
|            TrOCRForCausalLM             | 8  | 1.0141 |  0.9401   |      0.0       |     0.0     |  1.2438  |
|           PegasusForCausalLM            | 8  | 1.0113 |  0.9188   |      0.0       |     0.0     |  1.226   |
|      BartForConditionalGeneration       | 1  | 1.0145 |  0.8909   |      0.0       |     0.0     |  1.2146  |
|         Speech2Text2ForCausalLM         | 64 | 1.0079 |  0.9289   |      0.0       |     0.0     |  1.2058  |
|            PLBartForCausalLM            | 16 | 1.0096 |  0.9481   |      0.0       |     0.0     |  1.189   |
|     DistilBertForQuestionAnswering      | 32 | 1.0294 |  0.9855   |      0.0       |     0.0     |  1.1847  |
|          DistilBertForMaskedLM          | 16 | 1.0311 |  0.9766   |      0.0       |     0.0     |  1.1776  |
|           LayoutLMForMaskedLM           | 16 | 1.0002 |  0.9701   |      0.0       |     0.0     |  1.1692  |
| BlenderbotSmallForConditionalGeneration | 32 | 1.0124 |   0.936   |      0.0       |     0.0     |  1.1621  |
|           DebertaForMaskedLM            | 4  | 0.9294 |  0.8226   |     0.7229     |     0.0     |  1.1154  |
|             BartForCausalLM             | 2  | 1.0003 |  0.9663   |      0.0       |     0.0     |  1.1048  |
|                 BigBird                 | 1  | 0.9947 |  0.9356   |      0.0       |     0.0     |  1.0958  |
|            MBartForCausalLM             | 16 | 1.0065 |  0.9641   |      0.0       |     0.0     |  1.0932  |
|        BertForQuestionAnswering         | 64 | 1.0006 |  0.9837   |      0.0       |     0.0     |  1.0923  |
|       RobertaForQuestionAnswering       | 64 | 1.0005 |  0.9842   |      0.0       |     0.0     |  1.0918  |
|             BertForMaskedLM             | 64 | 1.0003 |  0.9616   |      0.0       |     0.0     |  1.0382  |
|       BlenderbotSmallForCausalLM        | 64 | 1.0012 |  0.9092   |      0.0       |     0.0     |  1.0055  |
|          AllenaiLongformerBase          | 0  |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+

Accuracy

+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+
|                  name                   | bs |    eager    |  aot_eager  | aot_cudagraphs | aot_nvfuser |  inductor   |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+
|               GoogleFnet                | 1  |    pass     |    pass     |  fail_to_run   |    pass     |    pass     |
|            AlbertForMaskedLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       AlbertForQuestionAnswering        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             BartForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|      BartForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             BertForMaskedLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|        BertForQuestionAnswering         | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|                 BigBird                 | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       BlenderbotSmallForCausalLM        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
| BlenderbotSmallForConditionalGeneration | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|                CamemBert                | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|          DistilBertForMaskedLM          | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     DistilBertForQuestionAnswering      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|               DistillGPT2               | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           ElectraForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       ElectraForQuestionAnswering       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|      GPT2ForSequenceClassification      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           LayoutLMForMaskedLM           | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|    LayoutLMForSequenceClassification    | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     M2M100ForConditionalGeneration      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            MBartForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       MT5ForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|         MegatronBertForCausalLM         | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|    MegatronBertForQuestionAnswering     | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|          MobileBertForMaskedLM          | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     MobileBertForQuestionAnswering      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             OPTForCausalLM              | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            PLBartForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           PegasusForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     PegasusForConditionalGeneration     | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           RobertaForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       RobertaForQuestionAnswering       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|         Speech2Text2ForCausalLM         | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       T5ForConditionalGeneration        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|                 T5Small                 | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            TrOCRForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             XGLMForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            XLNetLMHeadModel             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            YituTechConvBert             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           DebertaForMaskedLM            | 1  |    pass     |    pass     | fail_accuracy  | fail_to_run |    pass     |
|       DebertaForQuestionAnswering       | 1  |    pass     |    pass     | fail_accuracy  | fail_to_run |    pass     |
|      MBartForConditionalGeneration      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run | fail_to_run |
|     PLBartForConditionalGeneration      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run | fail_to_run |
|          AllenaiLongformerBase          | 1  | fail_to_run | fail_to_run |  fail_to_run   | fail_to_run | fail_to_run |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+

Compilation latency (sec)

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|            XLNetLMHeadModel             | 4  | 3.5596 |  21.0783  |      nan       |     nan     | 312.0249 |
|     M2M100ForConditionalGeneration      | 2  | 2.5809 |  15.2486  |      nan       |     nan     | 210.4089 |
|       MT5ForConditionalGeneration       | 2  | 3.1548 |  13.1984  |      nan       |     nan     | 176.5319 |
|            YituTechConvBert             | 1  | 2.0595 |  9.5292   |      nan       |     nan     | 173.5309 |
|                 T5Small                 | 1  | 1.9751 |  8.9992   |      nan       |     nan     | 155.2823 |
|             XGLMForCausalLM             | 1  | 2.2003 |  11.9917  |      nan       |     nan     | 154.3692 |
|           DebertaForMaskedLM            | 4  | 4.5837 |  10.864   |    50.3046     |     nan     | 151.585  |
|       T5ForConditionalGeneration        | 4  | 1.9764 |  9.0097   |      nan       |     nan     | 145.7872 |
|      MBartForConditionalGeneration      | 8  | 2.8053 |  15.4307  |      nan       |     nan     | 144.1118 |
|     PegasusForConditionalGeneration     | 4  | 2.6133 |  14.3869  |      nan       |     nan     | 139.7853 |
|          MobileBertForMaskedLM          | 16 | 7.8085 |  27.5867  |      nan       |     nan     | 139.6162 |
|      BartForConditionalGeneration       | 1  | 2.7435 |  14.8617  |      nan       |     nan     | 131.0181 |
|     PLBartForConditionalGeneration      | 8  | 1.3766 |  7.7415   |      nan       |     nan     | 129.9398 |
|         MegatronBertForCausalLM         | 2  | 3.1214 |  12.6994  |      nan       |     nan     | 127.2605 |
|    MegatronBertForQuestionAnswering     | 8  | 3.0405 |  12.7303  |      nan       |     nan     | 125.8278 |
|     MobileBertForQuestionAnswering      | 32 | 7.7234 |  27.5121  |      nan       |     nan     | 123.5152 |
|       DebertaForQuestionAnswering       | 4  | 4.5478 |  11.3028  |    50.0742     |     nan     | 116.463  |
| BlenderbotSmallForConditionalGeneration | 32 | 1.7337 |  9.7605   |      nan       |     nan     | 114.8053 |
|           RobertaForCausalLM            | 4  | 1.4089 |  6.2231   |      nan       |     nan     | 99.0261  |
|    LayoutLMForSequenceClassification    | 16 |  1.47  |  6.3872   |      nan       |     nan     | 88.4851  |
|      GPT2ForSequenceClassification      | 4  | 1.2504 |  5.9627   |      nan       |     nan     | 86.9855  |
|           PegasusForCausalLM            | 8  | 1.0397 |  5.5399   |      nan       |     nan     | 81.3617  |
|             BertForMaskedLM             | 64 | 1.3007 |  6.2134   |      nan       |     nan     | 80.6289  |
|       ElectraForQuestionAnswering       | 64 | 1.3165 |  6.1662   |      nan       |     nan     | 80.5771  |
|            MBartForCausalLM             | 16 | 0.9793 |  5.7255   |      nan       |     nan     | 78.1954  |
|             OPTForCausalLM              | 4  | 1.075  |  5.9214   |      nan       |     nan     | 78.0491  |
|           LayoutLMForMaskedLM           | 16 | 1.4685 |  6.7345   |      nan       |     nan     | 73.6977  |
|             BartForCausalLM             | 2  | 1.0041 |  5.4568   |      nan       |     nan     | 69.3039  |
|               DistillGPT2               | 1  | 0.6369 |  2.9506   |      nan       |     nan     | 65.9677  |
|           ElectraForCausalLM            | 1  | 1.3983 |  6.1053   |      nan       |     nan     | 65.3051  |
|         Speech2Text2ForCausalLM         | 64 | 0.5271 |  3.1655   |      nan       |     nan     | 64.4654  |
|     DistilBertForQuestionAnswering      | 32 | 0.4682 |  3.0485   |      nan       |     nan     | 63.8914  |
|            TrOCRForCausalLM             | 8  | 0.9821 |  5.4934   |      nan       |     nan     | 62.5319  |
|            PLBartForCausalLM            | 16 | 0.4722 |  2.9123   |      nan       |     nan     | 60.5701  |
|            AlbertForMaskedLM            | 2  | 1.1281 |  5.8455   |      nan       |     nan     | 59.6098  |
|                CamemBert                | 1  | 1.3774 |  5.9818   |      nan       |     nan     | 59.1032  |
|       BlenderbotSmallForCausalLM        | 64 | 0.6236 |  3.6183   |      nan       |     nan     | 58.0221  |
|        BertForQuestionAnswering         | 64 | 1.352  |  6.2505   |      nan       |     nan     | 57.4011  |
|       RobertaForQuestionAnswering       | 64 | 1.3173 |  6.1126   |      nan       |     nan     | 57.1666  |
|                 BigBird                 | 1  | 7.2274 |  13.4848  |      nan       |     nan     | 56.1339  |
|          DistilBertForMaskedLM          | 16 | 0.4511 |  3.0616   |      nan       |     nan     | 50.8777  |
|       AlbertForQuestionAnswering        | 2  | 1.2602 |  5.7535   |      nan       |     nan     | 47.6169  |
|               GoogleFnet                | 1  | 0.8012 |  3.1785   |      nan       |   9.4359    | 39.6472  |
|          AllenaiLongformerBase          | 0  |  nan   |    nan    |      nan       |     nan     |   nan    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
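
To pull the slowest compilers out of a latency table like the one above, the rows can be ranked per backend while skipping `nan` entries. The following is a minimal sketch, not part of the dashboard tooling: `parse_rows` and `slowest` are hypothetical helper names, and the sample holds only a few rows copied from the table, with column widths shortened.

```python
import math

def parse_rows(text):
    """Turn a '+---+' bordered ASCII table into a list of dicts keyed by header."""
    cells = [
        [c.strip() for c in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    header, body = cells[0], cells[1:]
    return [dict(zip(header, row)) for row in body]

def slowest(rows, backend, k=3):
    """Return up to k (latency_sec, name) pairs, largest first, ignoring nan."""
    timed = []
    for row in rows:
        latency = float(row[backend])  # float("nan") parses the 'nan' cells
        if not math.isnan(latency):
            timed.append((latency, row["name"]))
    return sorted(timed, reverse=True)[:k]

# A few rows copied from the table above (widths shortened for readability).
sample = """
+-----------------------+----+--------+-----------+----------------+-------------+----------+
|         name          | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------+----+--------+-----------+----------------+-------------+----------+
|   XLNetLMHeadModel    | 4  | 3.5596 |  21.0783  |      nan       |     nan     | 312.0249 |
|  DebertaForMaskedLM   | 4  | 4.5837 |  10.864   |    50.3046     |     nan     | 151.585  |
|      GoogleFnet       | 1  | 0.8012 |  3.1785   |      nan       |   9.4359    | 39.6472  |
| AllenaiLongformerBase | 0  |  nan   |    nan    |      nan       |     nan     |   nan    |
+-----------------------+----+--------+-----------+----------------+-------------+----------+
"""
rows = parse_rows(sample)
for backend in ("aot_eager", "aot_cudagraphs", "aot_nvfuser", "inductor"):
    print(backend, slowest(rows, backend, k=2))
```

Skipping `nan` rather than treating it as zero keeps models that never compiled from looking like the fastest ones.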

Peak Memory Compression Ratio

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|      GPT2ForSequenceClassification      | 4  | 0.9343 |  0.9093   |      nan       |     nan     |  1.0318  |
|            XLNetLMHeadModel             | 4  | 1.0001 |  0.8976   |      nan       |     nan     |  0.9717  |
|       ElectraForQuestionAnswering       | 64 |  1.0   |  0.9524   |      nan       |     nan     |  0.9361  |
|        BertForQuestionAnswering         | 64 |  1.0   |  0.9467   |      nan       |     nan     |  0.9354  |
|       RobertaForQuestionAnswering       | 64 |  1.0   |  0.9467   |      nan       |     nan     |  0.9354  |
|    LayoutLMForSequenceClassification    | 16 |  1.0   |  0.9348   |      nan       |     nan     |  0.9339  |
|           LayoutLMForMaskedLM           | 16 |  1.0   |  0.9409   |      nan       |     nan     |  0.888   |
|                 T5Small                 | 1  |  1.0   |  0.9325   |      nan       |     nan     |  0.8564  |
|             XGLMForCausalLM             | 1  | 0.9974 |  0.9999   |      nan       |     nan     |  0.8528  |
|     DistilBertForQuestionAnswering      | 32 |  1.0   |  0.9046   |      nan       |     nan     |  0.8394  |
|             BartForCausalLM             | 2  |  1.0   |  0.8847   |      nan       |     nan     |  0.8389  |
|             BertForMaskedLM             | 64 |  1.0   |  0.9219   |      nan       |     nan     |  0.8321  |
|      BartForConditionalGeneration       | 1  |  1.0   |  0.8465   |      nan       |     nan     |  0.8244  |
|                 BigBird                 | 1  | 0.999  |  0.9542   |      nan       |     nan     |  0.822   |
|       T5ForConditionalGeneration        | 4  |  1.0   |  0.9597   |      nan       |     nan     |  0.8215  |
|               DistillGPT2               | 1  | 0.9984 |  0.7704   |      nan       |     nan     |  0.8182  |
|            MBartForCausalLM             | 16 |  1.0   |  0.8629   |      nan       |     nan     |  0.8181  |
|                CamemBert                | 1  | 0.998  |  0.7977   |      nan       |     nan     |  0.8088  |
|          DistilBertForMaskedLM          | 16 | 0.9998 |  0.9138   |      nan       |     nan     |  0.8055  |
|            PLBartForCausalLM            | 16 |  1.0   |  0.8805   |      nan       |     nan     |  0.8028  |
|            YituTechConvBert             | 1  | 0.9858 |  0.7923   |      nan       |     nan     |  0.8025  |
|           PegasusForCausalLM            | 8  | 0.9778 |  0.9323   |      nan       |     nan     |  0.802   |
|    MegatronBertForQuestionAnswering     | 8  | 0.923  |  0.8265   |      nan       |     nan     |  0.7975  |
|      MBartForConditionalGeneration      | 8  |  1.0   |  0.8136   |      nan       |     nan     |  0.7949  |
|           RobertaForCausalLM            | 4  | 0.9058 |  0.7778   |      nan       |     nan     |  0.7882  |
|            TrOCRForCausalLM             | 8  |  1.0   |  0.8048   |      nan       |     nan     |  0.7873  |
|         Speech2Text2ForCausalLM         | 64 | 0.9565 |  0.8462   |      nan       |     nan     |  0.7768  |
|               GoogleFnet                | 1  | 0.9983 |  0.9453   |      nan       |   1.0813    |  0.7687  |
|             OPTForCausalLM              | 4  | 0.9979 |  0.7508   |      nan       |     nan     |  0.763   |
| BlenderbotSmallForConditionalGeneration | 32 |  1.0   |  0.9036   |      nan       |     nan     |  0.7612  |
|     PLBartForConditionalGeneration      | 8  |  1.0   |  0.8221   |      nan       |     nan     |  0.7547  |
|     PegasusForConditionalGeneration     | 4  | 0.9993 |  0.9002   |      nan       |     nan     |  0.7318  |
|       BlenderbotSmallForCausalLM        | 64 |  1.0   |  0.8401   |      nan       |     nan     |  0.7277  |
|     M2M100ForConditionalGeneration      | 2  | 0.9943 |  0.9857   |      nan       |     nan     |  0.7268  |
|         MegatronBertForCausalLM         | 2  | 0.7066 |  0.7066   |      nan       |     nan     |  0.7066  |
|       AlbertForQuestionAnswering        | 2  |  1.0   |  0.9369   |      nan       |     nan     |  0.6763  |
|            AlbertForMaskedLM            | 2  | 0.9999 |  0.9172   |      nan       |     nan     |  0.6633  |
|       MT5ForConditionalGeneration       | 2  | 0.6173 |  0.6173   |      nan       |     nan     |  0.6173  |
|           ElectraForCausalLM            | 1  |  1.0   |  0.9107   |      nan       |     nan     |  0.6123  |
|          MobileBertForMaskedLM          | 16 | 0.9997 |  0.9179   |      nan       |     nan     |  0.5861  |
|     MobileBertForQuestionAnswering      | 32 |  1.0   |  0.9716   |      nan       |     nan     |  0.4668  |
|           DebertaForMaskedLM            | 4  |  1.0   |  0.9851   |     0.352      |     nan     |  0.4265  |
|       DebertaForQuestionAnswering       | 4  | 0.9845 |  1.0525   |     0.3276     |     nan     |  0.3569  |
|          AllenaiLongformerBase          | 0  |  nan   |    nan    |      nan       |     nan     |   nan    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|        res2net50_14w_8s         |  2  | 1.0023 |   1.021   |      0.0       |   1.4367    |  5.4417  |
|           res2next50            |  2  | 1.0032 |  1.0418   |      0.0       |   1.3826    |  4.7325  |
|            hrnet_w18            |  2  | 1.0074 |  1.0893   |      0.0       |   1.4747    |  4.6638  |
|          ghostnet_100           | 128 | 0.9989 |  0.9943   |      0.0       |   1.2494    |  1.7883  |
|            lcnet_050            | 128 | 0.9562 |  0.9498   |      0.0       |   1.5017    |  1.661   |
|         coat_lite_mini          | 128 |  1.0   |  0.9983   |      0.0       |   1.0562    |  1.6088  |
|        tnt_s_patch16_224        | 64  | 0.9998 |  0.9986   |      0.0       |   1.6019    |  1.541   |
|        twins_pcpvt_base         | 32  | 1.0051 |  0.9666   |      0.0       |   1.3279    |  1.4973  |
|           regnety_002           | 128 | 0.979  |  0.9918   |      0.0       |   1.3592    |  1.4812  |
|           dm_nfnet_f0           | 128 | 0.999  |  1.0009   |      0.0       |   1.2127    |  1.4759  |
|      xcit_large_24_p8_224       |  5  | 1.0027 |  0.9835   |      0.0       |     0.0     |  1.446   |
|           resnest101e           | 32  | 1.0034 |  1.0455   |      0.0       |   1.1884    |  1.4273  |
|         crossvit_9_240          | 64  | 1.0069 |  0.9985   |      0.0       |   1.0433    |  1.4159  |
|            nfnet_l0             | 64  | 0.9994 |   0.795   |      0.0       |   1.0534    |  1.395   |
|           volo_d1_224           | 64  |  1.0   |  0.9962   |      0.0       |   1.1314    |  1.3854  |
|             dla102              | 64  | 0.9999 |   0.996   |      0.0       |    1.288    |  1.351   |
|          gmixer_24_224          | 64  | 1.0001 |  0.8412   |      0.0       |   0.9863    |  1.3503  |
|         mobilenetv2_100         | 128 | 0.9665 |  0.9643   |      0.0       |   1.0157    |  1.3378  |
|       gluon_inception_v3        | 128 |  1.0   |  0.9989   |      0.0       |    1.126    |  1.3278  |
|          inception_v3           | 128 | 0.9999 |   0.999   |      0.0       |   1.1263    |  1.3268  |
|        adv_inception_v3         | 128 |  1.0   |  0.9987   |      0.0       |   1.1242    |  1.3254  |
|      mobilenetv3_large_100      | 128 | 0.9658 |   0.963   |      0.0       |    1.168    |  1.3179  |
|            fbnetv3_b            | 128 | 0.9653 |  0.9616   |      0.0       |   1.1356    |  1.2853  |
|        sebotnet33ts_256         | 64  | 0.9763 |  0.8073   |      0.0       |   1.0541    |  1.2788  |
|          jx_nest_base           | 32  |  1.0   |  0.9946   |      0.0       |   1.2115    |  1.2763  |
|          cait_m36_384           |  2  | 0.9439 |  0.9721   |      0.0       |   1.0493    |  1.2625  |
|          botnet26t_256          | 128 | 0.9852 |  0.9851   |      0.0       |   1.2268    |  1.261   |
|       tf_efficientnet_b0        | 128 | 0.9765 |  0.7824   |      0.0       |   0.9852    |  1.2608  |
|           mnasnet_100           | 128 | 0.9669 |  0.9629   |      0.0       |   1.1577    |  1.252   |
|           fbnetc_100            | 128 | 0.9652 |  0.9638   |      0.0       |   1.1891    |  1.251   |
|           selecsls42b           | 128 | 0.9999 |  0.9988   |      0.0       |   1.2104    |  1.2462  |
|           convit_base           | 32  | 0.9996 |  0.9938   |      0.0       |   1.1919    |  1.2416  |
|          spnasnet_100           | 128 | 0.9609 |  0.9581   |      0.0       |   1.1374    |  1.239   |
|        eca_halonext26ts         | 64  | 0.9747 |  0.7754   |      0.0       |   1.0178    |  1.235   |
|       eca_botnext26ts_256       | 64  | 0.973  |  0.7702   |      0.0       |   1.0149    |  1.2346  |
|          cspdarknet53           | 64  | 0.9587 |  0.9548   |      0.0       |   1.1843    |  1.2296  |
|        res2net101_26w_4s        | 64  |  1.0   |  0.9975   |      0.0       |   1.1754    |  1.2288  |
|            pit_b_224            | 64  |  1.0   |  0.9988   |      0.0       |   1.0547    |  1.2272  |
|        ese_vovnet19b_dw         | 128 | 0.9797 |  0.9775   |      0.0       |   1.1443    |  1.2259  |
|          pnasnet5large          | 16  | 0.9998 |  0.9983   |      0.0       |   1.0842    |  1.2113  |
|           rexnet_100            | 128 | 0.9726 |  0.8166   |      0.0       |   0.9835    |  1.2025  |
|            tinynet_a            | 128 | 0.9659 |  0.7756   |      0.0       |   0.9721    |  1.1871  |
|           mobilevit_s           | 32  | 0.9619 |  0.7651   |      0.0       |   0.9897    |  1.1815  |
|             dpn107              | 32  | 0.9575 |  0.9509   |      0.0       |   1.0291    |  1.1804  |
|          convnext_base          | 32  | 0.9998 |  0.9979   |      0.0       |   1.0451    |  1.175   |
|            repvgg_a2            | 128 | 0.9639 |  0.9632   |      0.0       |   1.1224    |  1.172   |
|         poolformer_m36          | 64  | 0.9999 |  0.9978   |      0.0       |     0.0     |  1.1689  |
|           tf_mixnet_l           | 64  | 0.9721 |  0.8768   |      0.0       |   1.0043    |  1.1481  |
|  swin_base_patch4_window7_224   | 64  | 0.9999 |  0.9786   |      0.0       |   0.9883    |  1.1446  |
|            mixnet_l             | 64  | 0.9711 |  0.8729   |      0.0       |   1.0043    |  1.132   |
|      beit_base_patch16_224      | 64  | 1.0001 |  0.9823   |      0.0       |   0.9491    |  1.1196  |
|          gmlp_s16_224           | 64  |  1.0   |   0.997   |      0.0       |   0.9949    |  1.1169  |
|     swsl_resnext101_32x16d      | 32  | 0.9998 |  0.9985   |      0.0       |   1.1071    |  1.1109  |
| deit_base_distilled_patch16_224 | 64  | 0.9994 |  0.9979   |      0.0       |   1.0141    |  1.101   |
|      vit_base_patch16_224       | 64  | 0.9989 |  0.9986   |      0.0       |   0.9757    |  1.0942  |
|        gluon_xception65         | 32  | 0.9999 |  0.9974   |      0.0       |   1.0405    |  1.0872  |
|        convmixer_768_32         | 32  | 0.9999 |    1.0    |      0.0       |   1.0624    |  1.0785  |
|            gernet_l             | 128 | 0.974  |  0.9725   |      0.0       |   1.0996    |  1.0758  |
|          mixer_b16_224          | 64  | 0.9998 |  0.9984   |      0.0       |   0.9793    |  1.0614  |
|         visformer_small         | 128 | 0.9995 |  1.0021   |      0.0       |   1.0216    |  1.0489  |
|          resmlp_12_224          | 128 | 0.9998 |  1.0008   |     0.6938     |     0.0     |  1.0361  |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
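
A single number per backend is often easier to compare than sixty rows. The sketch below computes a geometric mean of the speedup column for each backend. It is an illustration only: `parse_table` and `geomean_speedup` are hypothetical names, and it assumes that a `0.0` entry marks a run that failed rather than a measured speedup, which matches the `fail_to_run` entries in the accuracy table for the same backends.

```python
import math

def parse_table(text):
    """Parse a '+---+' bordered ASCII table into a list of row dicts."""
    cells = [
        [c.strip() for c in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    header, body = cells[0], cells[1:]
    return [dict(zip(header, row)) for row in body]

def geomean_speedup(rows, backend):
    """Geometric mean of positive speedups; 0.0 and nan are skipped (assumed failures)."""
    values = []
    for row in rows:
        value = float(row[backend])
        if value > 0:  # 'nan > 0' is False, so nan entries are skipped too
            values.append(value)
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Three rows copied from the table above (widths shortened for readability).
sample = """
+------------------+-----+--------+-----------+----------------+-------------+----------+
|       name       | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+------------------+-----+--------+-----------+----------------+-------------+----------+
| res2net50_14w_8s |  2  | 1.0023 |   1.021   |      0.0       |   1.4367    |  5.4417  |
|    res2next50    |  2  | 1.0032 |  1.0418   |      0.0       |   1.3826    |  4.7325  |
|  resmlp_12_224   | 128 | 0.9998 |  1.0008   |     0.6938     |     0.0     |  1.0361  |
+------------------+-----+--------+-----------+----------------+-------------+----------+
"""
rows = parse_table(sample)
for backend in ("eager", "aot_eager", "aot_cudagraphs", "aot_nvfuser", "inductor"):
    print(backend, round(geomean_speedup(rows, backend), 4))
```

Because failed runs are dropped, a geometric mean computed this way should be read together with the pass rate; a backend that only runs a handful of models can still show a high mean.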

Accuracy

+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|              name               | bs | eager |   aot_eager   | aot_cudagraphs |  aot_nvfuser  |   inductor    |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|          convnext_base          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|           fbnetc_100            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          gmixer_24_224          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          gmlp_s16_224           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          mixer_b16_224          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|           mnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|            repvgg_a2            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|          spnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|        adv_inception_v3         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      beit_base_patch16_224      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          botnet26t_256          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        convmixer_768_32         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         crossvit_9_240          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          cspdarknet53           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
| deit_base_distilled_patch16_224 | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dla102              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           dm_nfnet_f0           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dpn107              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       eca_botnext26ts_256       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        eca_halonext26ts         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        ese_vovnet19b_dw         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            gernet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          ghostnet_100           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       gluon_inception_v3        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            hrnet_w18            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          inception_v3           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            lcnet_050            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            mixnet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         mobilenetv2_100         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      mobilenetv3_large_100      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           mobilevit_s           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            nfnet_l0             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          pnasnet5large          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           regnety_002           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net101_26w_4s        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net50_14w_8s         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           res2next50            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           rexnet_100            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        sebotnet33ts_256         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           selecsls42b           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|  swin_base_patch4_window7_224   | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|     swsl_resnext101_32x16d      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       tf_efficientnet_b0        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           tf_mixnet_l           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            tinynet_a            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        tnt_s_patch16_224        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         visformer_small         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      vit_base_patch16_224       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           volo_d1_224           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          resmlp_12_224          | 2  | pass  |     pass      |      pass      |  fail_to_run  |     pass      |
|           convit_base           | 2  | pass  |     pass      |  fail_to_run   |  fail_to_run  |     pass      |
|      xcit_large_24_p8_224       | 2  | pass  | fail_accuracy |  fail_to_run   |  fail_to_run  |     pass      |
|        gluon_xception65         | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|         poolformer_m36          | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|          cait_m36_384           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|         coat_lite_mini          | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|          jx_nest_base           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|            pit_b_224            | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|        twins_pcpvt_base         | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|            fbnetv3_b            | 2  | pass  |     pass      |  fail_to_run   |     pass      | fail_accuracy |
|           resnest101e           | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy | fail_accuracy |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
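
The accuracy table can be summarized by tallying the status strings per backend. This is a rough sketch under the assumption that every cell holds one of the status strings seen above (`pass`, `fail_to_run`, `fail_accuracy`); `status_counts` is a hypothetical helper, not part of the benchmark scripts, and the sample rows are copied from the table with shortened column widths.

```python
from collections import Counter

def status_counts(table_text, skip=("name", "bs")):
    """Count status strings per backend column in a '+---+' bordered ASCII table."""
    cells = [
        [c.strip() for c in line.strip().strip("|").split("|")]
        for line in table_text.splitlines()
        if line.strip().startswith("|")
    ]
    header, body = cells[0], cells[1:]
    counts = {col: Counter() for col in header if col not in skip}
    for row in body:
        for col, value in zip(header, row):
            if col in counts:
                counts[col][value] += 1
    return counts

# Three rows copied from the table above (widths shortened for readability).
sample = """
+------------------+----+-------+-----------+----------------+---------------+---------------+
|       name       | bs | eager | aot_eager | aot_cudagraphs |  aot_nvfuser  |   inductor    |
+------------------+----+-------+-----------+----------------+---------------+---------------+
|  convnext_base   | 2  | pass  |   pass    |      pass      |     pass      |     pass      |
| adv_inception_v3 | 2  | pass  |   pass    |  fail_to_run   |     pass      |     pass      |
|   resnest101e    | 2  | pass  |   pass    |  fail_to_run   | fail_accuracy | fail_accuracy |
+------------------+----+-------+-----------+----------------+---------------+---------------+
"""
counts = status_counts(sample)
for backend, tally in counts.items():
    total = sum(tally.values())
    print(f"{backend}: {tally['pass']}/{total} pass, {dict(tally)}")
```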

Compilation latency (sec)

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|        twins_pcpvt_base         | 32  | 2.1545 |  13.4712  |      nan       |   44.7514   | 818.6787 |
|         coat_lite_mini          | 128 | 0.9362 |  5.3097   |      nan       |   14.297    | 763.735  |
|           mobilevit_s           | 32  | 1.7381 |   7.53    |      nan       |   41.8103   | 618.3934 |
|  swin_base_patch4_window7_224   | 64  | 2.5412 |  13.4929  |      nan       |   59.0405   | 569.5724 |
|       eca_botnext26ts_256       | 64  | 1.4052 |  5.1523   |      nan       |   49.1347   | 504.172  |
|        eca_halonext26ts         | 64  | 1.4655 |  5.5331   |      nan       |   51.4978   | 468.5572 |
|          convnext_base          | 32  | 1.4166 |  6.3582   |      nan       |   21.0058   | 400.2996 |
|          jx_nest_base           | 32  | 1.7338 |  9.3534   |      nan       |   58.7419   | 362.9152 |
|        sebotnet33ts_256         | 64  | 1.7467 |  6.8392   |      nan       |   52.1838   | 323.8735 |
|      xcit_large_24_p8_224       |  5  | 2.6636 |  17.3057  |      nan       |     nan     | 294.3484 |
|            hrnet_w18            |  2  | 6.4389 |  33.866   |      nan       |  198.4745   | 273.3504 |
|         crossvit_9_240          | 64  | 1.3803 |  7.9884   |      nan       |   26.1988   | 271.7885 |
|          pnasnet5large          | 16  | 4.4888 |  24.1218  |      nan       |  124.0966   | 264.7909 |
|          botnet26t_256          | 128 | 1.4471 |  4.7322   |      nan       |   41.4332   | 263.6685 |
|           rexnet_100            | 128 | 2.0851 |  7.7755   |      nan       |  103.2984   | 258.9271 |
|          cait_m36_384           |  2  | 2.729  |  18.1574  |      nan       |   45.5959   | 251.3507 |
|           resnest101e           | 32  | 3.2183 |  17.3906  |      nan       |   75.5499   | 242.1176 |
|             dpn107              | 32  | 4.0859 |  15.2068  |      nan       |   76.4032   | 235.3065 |
|          ghostnet_100           | 128 | 2.9307 |  10.4156  |      nan       |   60.8427   | 231.0067 |
|           volo_d1_224           | 64  | 1.2606 |  8.0126   |      nan       |   27.3627   | 227.4007 |
|           tf_mixnet_l           | 64  | 5.7282 |  13.0792  |      nan       |   62.1662   | 211.3435 |
|          inception_v3           | 128 | 1.7092 |  9.6808   |      nan       |   67.1119   | 202.2545 |
|            tinynet_a            | 128 | 2.1547 |  8.1749   |      nan       |   61.5795   | 201.9643 |
|            mixnet_l             | 64  | 5.616  |  12.6784  |      nan       |   61.8845   | 194.7373 |
|           convit_base           | 32  | 0.9878 |  6.0601   |      nan       |   18.4927   | 193.2586 |
|        res2net50_14w_8s         |  2  | 3.0033 |  16.417   |      nan       |   69.1403   | 190.2239 |
|         visformer_small         | 128 | 0.9399 |  4.1987   |      nan       |   23.9829   | 187.0458 |
|        res2net101_26w_4s        | 64  | 3.0979 |  18.0078  |      nan       |   81.3689   | 184.6844 |
|            fbnetv3_b            | 128 | 3.1269 |  11.3755  |      nan       |   76.457    | 170.2994 |
|            pit_b_224            | 64  | 0.9264 |  5.1432   |      nan       |   12.3061   | 168.1234 |
|        adv_inception_v3         | 128 | 1.7876 |  9.3248   |      nan       |   67.9682   | 165.1927 |
|       gluon_inception_v3        | 128 | 1.714  |  9.3951   |      nan       |   68.3431   | 163.2025 |
|       tf_efficientnet_b0        | 128 | 2.0104 |  7.3634   |      nan       |   62.7791   | 159.1046 |
|          gmlp_s16_224           | 64  | 1.0287 |  6.1202   |      nan       |   13.4232   | 156.9587 |
|      mobilenetv3_large_100      | 128 | 1.6485 |  5.7583   |      nan       |   63.7479   | 143.6383 |
|             dla102              | 64  | 1.8324 |  10.6917  |      nan       |   63.422    | 142.5177 |
|        tnt_s_patch16_224        | 64  | 1.587  |  10.2362  |      nan       |   23.4649   | 136.1545 |
|         poolformer_m36          | 64  | 1.9209 |  9.6393   |      nan       |     nan     | 133.6576 |
|          gmixer_24_224          | 64  | 1.1452 |   7.171   |      nan       |   16.2136   | 132.0786 |
|          spnasnet_100           | 128 | 2.1882 |  7.0447   |      nan       |   44.1093   | 125.4818 |
|          cspdarknet53           | 64  | 2.3501 |  7.9329   |      nan       |   48.8457   | 124.4366 |
|           fbnetc_100            | 128 | 2.3825 |  7.3795   |      nan       |   46.2125   | 124.3139 |
|           res2next50            |  2  | 1.8125 |  9.0934   |      nan       |   42.9215   | 124.1669 |
|           mnasnet_100           | 128 | 1.699  |  5.8737   |      nan       |   37.7042   | 111.3659 |
|         mobilenetv2_100         | 128 | 1.7142 |   5.662   |      nan       |   37.8371   | 110.7728 |
|          resmlp_12_224          | 128 | 0.5297 |  2.7314   |     5.7904     |     nan     | 108.3878 |
|          mixer_b16_224          | 64  | 0.5236 |  3.0921   |      nan       |   10.8647   | 106.6953 |
|        gluon_xception65         | 32  | 1.9241 |  11.9308  |      nan       |   42.924    | 103.912  |
|           selecsls42b           | 128 | 0.7316 |  4.2612   |      nan       |   39.9213   | 103.8341 |
|            nfnet_l0             | 64  | 1.7172 |  7.7109   |      nan       |   26.9313   | 102.6439 |
|           dm_nfnet_f0           | 128 | 1.9924 |  7.8528   |      nan       |   29.7799   | 101.0003 |
|           regnety_002           | 128 | 1.6675 |  6.1593   |      nan       |   46.7102   | 98.5359  |
|        ese_vovnet19b_dw         | 128 | 0.9743 |  3.2654   |      nan       |   31.3467   | 94.7237  |
| deit_base_distilled_patch16_224 | 64  | 0.8017 |  4.1752   |      nan       |   10.2984   | 84.1742  |
|     swsl_resnext101_32x16d      | 32  | 1.8167 |  10.3372  |      nan       |   40.3351   | 82.2403  |
|            gernet_l             | 128 | 2.1434 |  6.8841   |      nan       |   36.1802   | 78.0454  |
|      beit_base_patch16_224      | 64  | 1.1627 |  5.4449   |      nan       |   14.2737   |  75.764  |
|            lcnet_050            | 128 | 1.0973 |  3.6336   |      nan       |   32.0027   | 75.5806  |
|            repvgg_a2            | 128 | 2.0646 |  6.6912   |      nan       |   44.5267   | 70.8502  |
|      vit_base_patch16_224       | 64  | 0.7102 |  4.2494   |      nan       |    9.537    | 65.9388  |
|        convmixer_768_32         | 32  | 1.2988 |  6.8908   |      nan       |   13.9593   | 34.1175  |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|          gmixer_24_224          | 64  | 0.9952 |  0.9645   |      nan       |   0.9825    |  1.3808  |
|            tinynet_a            | 128 | 0.9942 |  0.7796   |      nan       |   0.7823    |  1.351   |
|           rexnet_100            | 128 | 0.9935 |  0.7843   |      nan       |   0.8682    |  1.2619  |
|            nfnet_l0             | 64  | 0.9948 |  0.8256   |      nan       |    0.813    |  1.2555  |
|       tf_efficientnet_b0        | 128 | 0.9935 |  0.7688   |      nan       |   0.8401    |  1.1889  |
|          pnasnet5large          | 16  | 1.069  |   1.011   |      nan       |   1.2062    |  1.1879  |
|           mobilevit_s           | 32  | 0.9959 |  0.7668   |      nan       |    0.741    |  1.141   |
|       eca_botnext26ts_256       | 64  | 0.9938 |  0.7669   |      nan       |   0.7642    |  1.1318  |
|        eca_halonext26ts         | 64  | 0.9938 |   0.768   |      nan       |   0.7694    |  1.1317  |
|         mobilenetv2_100         | 128 | 0.9925 |  0.7621   |      nan       |   0.7635    |  1.1003  |
|           convit_base           | 32  | 0.9977 |  0.8861   |      nan       |   0.9501    |  1.068   |
|         poolformer_m36          | 64  | 0.998  |  0.9512   |      nan       |     nan     |  1.0527  |
|             dla102              | 64  | 0.9841 |  0.9148   |      nan       |   0.9504    |  1.0492  |
|          ghostnet_100           | 128 | 0.9865 |  0.8768   |      nan       |   0.9345    |  1.0353  |
|           dm_nfnet_f0           | 128 | 0.9358 |  0.8936   |      nan       |   0.9479    |  1.0219  |
|          cait_m36_384           |  2  | 0.9998 |   0.902   |      nan       |   0.9203    |  1.011   |
|           resnest101e           | 32  | 0.9972 |  0.9435   |      nan       |   0.9425    |  0.9914  |
|           selecsls42b           | 128 | 0.9883 |  0.8896   |      nan       |   0.8954    |  0.9913  |
|        ese_vovnet19b_dw         | 128 | 0.9923 |  0.8877   |      nan       |   0.9302    |  0.9886  |
|        convmixer_768_32         | 32  | 0.9986 |  0.9854   |      nan       |   0.9793    |  0.9836  |
|            fbnetv3_b            | 128 | 0.9932 |  0.7828   |      nan       |    0.784    |  0.9696  |
|           tf_mixnet_l           | 64  | 0.9956 |  0.8577   |      nan       |   0.8572    |  0.9695  |
|          mixer_b16_224          | 64  | 0.9956 |  0.9574   |      nan       |   0.8644    |  0.9357  |
|        gluon_xception65         | 32  | 0.9975 |  0.9365   |      nan       |   0.8982    |  0.9351  |
|      beit_base_patch16_224      | 64  | 0.9966 |  0.9545   |      nan       |   0.8606    |  0.9272  |
|        res2net101_26w_4s        | 64  | 0.9968 |  0.9278   |      nan       |   0.8932    |  0.9269  |
|          gmlp_s16_224           | 64  | 0.9958 |  0.9727   |      nan       |    0.966    |  0.9267  |
|      vit_base_patch16_224       | 64  | 0.9963 |  0.9434   |      nan       |   0.8229    |  0.915   |
|        tnt_s_patch16_224        | 64  | 0.9963 |  0.9715   |      nan       |   0.8518    |  0.9131  |
|           volo_d1_224           | 64  | 0.996  |  0.9213   |      nan       |   0.7472    |  0.9124  |
|      xcit_large_24_p8_224       |  5  | 0.9981 |  0.9194   |      nan       |     nan     |  0.912   |
| deit_base_distilled_patch16_224 | 64  | 0.9964 |  0.9442   |      nan       |   0.8242    |  0.9095  |
|             dpn107              | 32  | 0.9985 |  0.9271   |      nan       |   0.8941    |  0.9056  |
|          spnasnet_100           | 128 | 0.989  |  0.9109   |      nan       |   0.8412    |  0.9047  |
|      mobilenetv3_large_100      | 128 | 0.9876 |  0.8589   |      nan       |   0.8745    |  0.9007  |
|         visformer_small         | 128 | 0.9943 |  0.9381   |      nan       |   0.9475    |  0.9006  |
|          convnext_base          | 32  | 0.998  |  0.9059   |      nan       |   0.7678    |  0.9006  |
|            mixnet_l             | 64  | 0.995  |  0.8449   |      nan       |   0.7907    |  0.8995  |
|        adv_inception_v3         | 128 | 0.9901 |  0.8617   |      nan       |   0.8724    |  0.8983  |
|       gluon_inception_v3        | 128 | 0.9901 |  0.8617   |      nan       |   0.8724    |  0.8983  |
|          inception_v3           | 128 | 0.9901 |  0.8617   |      nan       |   0.8724    |  0.8983  |
|           mnasnet_100           | 128 | 0.9877 |  0.9019   |      nan       |   0.8279    |  0.8961  |
|     swsl_resnext101_32x16d      | 32  | 0.9991 |  0.8972   |      nan       |   0.8675    |  0.8931  |
|            lcnet_050            | 128 | 0.9672 |  0.7521   |      nan       |   0.7524    |  0.8921  |
|          cspdarknet53           | 64  | 0.9954 |  0.8528   |      nan       |   0.8762    |  0.8835  |
|        twins_pcpvt_base         | 32  | 0.9971 |  0.9101   |      nan       |   0.8351    |  0.8722  |
|           regnety_002           | 128 | 0.9717 |  0.8104   |      nan       |   0.7599    |  0.8617  |
|          botnet26t_256          | 128 | 0.9915 |  0.8434   |      nan       |    0.745    |  0.8605  |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9288   |      nan       |    0.83     |  0.8585  |
|          jx_nest_base           | 32  | 1.0002 |  0.8966   |      nan       |   0.7112    |  0.8575  |
|           fbnetc_100            | 128 | 0.9891 |  0.8518   |      nan       |   0.7446    |  0.8416  |
|        sebotnet33ts_256         | 64  | 0.9952 |  0.7084   |      nan       |   0.6831    |  0.841   |
|        res2net50_14w_8s         |  2  | 0.9976 |   0.837   |      nan       |   0.8458    |  0.8293  |
|           res2next50            |  2  | 0.9972 |  0.8331   |      nan       |    0.841    |  0.821   |
|          resmlp_12_224          | 128 | 0.9893 |   0.943   |     0.2472     |     nan     |  0.8169  |
|         crossvit_9_240          | 64  | 0.9886 |  0.8633   |      nan       |    0.729    |  0.8063  |
|            gernet_l             | 128 | 0.9884 |  0.7892   |      nan       |   0.7938    |  0.7928  |
|            pit_b_224            | 64  | 0.9968 |  0.7947   |      nan       |   0.6417    |  0.792   |
|         coat_lite_mini          | 128 | 1.0049 |  0.8777   |      nan       |   0.7873    |  0.7899  |
|            repvgg_a2            | 128 | 0.9867 |  0.8054   |      nan       |   0.6573    |  0.7684  |
|            hrnet_w18            |  2  | 0.9947 |  0.8779   |      nan       |   0.8833    |  0.6735  |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+

Performance graphs


bench_logs/huggingface_float32.png :

bench_logs/timm_models_float32.png :

bench_logs/torchbench_float32.png :

@anijain2305
Contributor Author

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
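
As a rough illustration of the aggregation described above, the sketch below computes a geometric mean speedup over the models that pass the accuracy check. The function name, the input layout, and the choice to return 0.0 when no model passes (mirroring the 0.0x entries in the summary tables) are assumptions made for this example, not the benchmark harness's actual code.

```python
import math

def geomean_speedup(results):
    """Geometric mean of per-model speedups over native PyTorch.

    `results` is a list of (eager_latency_s, backend_latency_s, passed_accuracy)
    tuples. This layout is hypothetical and chosen only for illustration.
    """
    # Keep only models that passed the accuracy check and produced a timing.
    speedups = [
        eager / backend
        for eager, backend, passed in results
        if passed and backend > 0
    ]
    if not speedups:
        # Assumption: a backend with no passing model is reported as 0.0.
        return 0.0
    # Geometric mean computed in log space.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))

example = [
    (2.0, 1.0, True),     # 2x speedup
    (8.0, 1.0, True),     # 8x speedup
    (1.0, 100.0, False),  # fails accuracy, excluded
]
print(f"{geomean_speedup(example):.2f}x")
```

A geometric mean is used here because speedups are ratios: a 2x gain and a 0.5x slowdown average out to 1.0x rather than 1.25x.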

Passrate

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      | 94%, 50/53 | 98%, 42/43  | 100%, 61/61 |
|   aot_eager    | 94%, 50/53 | 98%, 42/43  | 90%, 55/61  |
| aot_cudagraphs | 26%, 14/53 |  0%, 0/43   |  11%, 7/61  |
|  aot_nvfuser   | 60%, 32/53 |  0%, 0/43   | 75%, 46/61  |
|    inductor    | 83%, 44/53 | 93%, 40/43  | 93%, 57/61  |
+----------------+------------+-------------+-------------+
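
The pass rate cells above appear to be the passing count divided by the suite size, rounded to the nearest percent. A minimal sketch of that formatting, assuming a hypothetical helper name, is:

```python
def format_passrate(passed, total):
    """Format a pass rate cell such as '94%, 50/53'.

    Hypothetical helper for illustration; not taken from the benchmark scripts.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    percent = 100 * passed / total
    return f"{percent:.0f}%, {passed}/{total}"

print(format_passrate(50, 53))
print(format_passrate(0, 43))
```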

Geometric mean speedup

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   1.00x    |    1.01x    |    1.00x    |
|   aot_eager    |   1.00x    |    1.00x    |    1.00x    |
| aot_cudagraphs |   1.09x    |    0.0x     |    1.00x    |
|  aot_nvfuser   |   1.16x    |    0.0x     |    1.20x    |
|    inductor    |   1.84x    |    2.29x    |    1.55x    |
+----------------+------------+-------------+-------------+

Mean compilation time (seconds)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |    1.94    |    2.55     |    2.30     |
|   aot_eager    |    8.04    |    12.73    |    11.51    |
| aot_cudagraphs |    6.98    |     0.0     |    52.51    |
|  aot_nvfuser   |   27.44    |     0.0     |    71.07    |
|    inductor    |   139.38   |   117.39    |   262.93    |
+----------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+----------------+------------+-------------+-------------+
|    Compiler    | torchbench | huggingface | timm_models |
+----------------+------------+-------------+-------------+
|     eager      |   0.96x    |    0.98x    |    0.99x    |
|   aot_eager    |   0.85x    |    0.87x    |    0.87x    |
| aot_cudagraphs |   0.43x    |    0.0x     |    0.20x    |
|  aot_nvfuser   |   0.83x    |    0.0x     |    0.85x    |
|    inductor    |   0.83x    |    0.86x    |    0.94x    |
+----------------+------------+-------------+-------------+
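
Since higher is better, one consistent reading of the compression ratio is native PyTorch peak memory divided by the backend's peak memory, so values above 1.0 mean the backend used less memory. The sketch below encodes that reading; the definition and the function name are assumptions for illustration, not confirmed by this report.

```python
def compression_ratio(eager_peak_bytes, backend_peak_bytes):
    """Peak memory compression ratio (assumed definition: eager / backend).

    A value above 1.0 means the backend's peak memory is lower than
    native PyTorch's; below 1.0 means it is higher.
    """
    if backend_peak_bytes <= 0:
        raise ValueError("backend peak memory must be positive")
    return eager_peak_bytes / backend_peak_bytes

# Hypothetical measurements: backend peaks at 8 GiB versus 10 GiB in eager mode.
print(f"{compression_ratio(10 * 2**30, 8 * 2**30):.2f}x")
```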

torchbench suite with amp precision


Performance speedup

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|            densenet121            |  4   | 1.0016 |  0.9128   |      0.0       |   1.4046    |  6.0578  |
|       functorch_dp_cifar10        |  64  | 0.9984 |  0.9128   |      0.0       |   1.1897    |  4.8757  |
|         timm_efficientdet         |  1   | 0.9866 |  0.8056   |      0.0       |     0.0     |  4.6029  |
|          resnext50_32x4d          |  8   | 1.003  |  0.9596   |      0.0       |   1.3319    |  3.465   |
|           BERT_pytorch            |  16  | 1.0131 |  0.8196   |      0.0       |     0.0     |  3.2081  |
|             resnet18              |  16  | 1.003  |  0.9918   |      0.0       |   1.3416    |  3.1562  |
|      timm_vision_transformer      |  8   | 1.0071 |  0.8362   |      0.0       |    1.359    |  3.0948  |
|        mobilenet_v3_large         |  32  | 1.0045 |  1.0038   |      0.0       |   1.4187    |  3.0025  |
|                drq                |  1   | 1.0032 |  0.7825   |      0.0       |   1.0929    |  2.9519  |
|            mnasnet1_0             |  32  | 0.9992 |  1.0109   |     0.9029     |   1.3975    |  2.8887  |
|               dcgan               |  32  | 0.9918 |  0.9024   |     1.1617     |   0.7299    |  2.683   |
|            hf_T5_large            |  2   | 1.0217 |  0.8511   |      0.0       |     0.0     |  2.4185  |
|           squeezenet1_1           |  32  | 0.9979 |  0.9596   |     1.3413     |   1.1839    |  2.4129  |
|             hf_Albert             |  8   | 1.0004 |   0.954   |      0.0       |     0.0     |  2.3793  |
|              hf_Bert              |  4   | 1.0321 |  0.8543   |      0.0       |     0.0     |  2.2348  |
|         timm_efficientnet         |  32  | 0.9653 |  0.8108   |      0.0       |   1.1785    |  2.1117  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9963 |  0.9087   |     1.3033     |   1.2143    |  2.0914  |
|           lennard_jones           | 1000 | 0.9734 |  0.7478   |     1.2849     |   1.0403    |  2.0222  |
|          pytorch_struct           | 200  | 1.0015 |  0.7365   |     1.0253     |   1.0181    |  1.9967  |
|           timm_resnest            |  32  | 1.0037 |  1.0179   |      0.0       |   1.3234    |  1.9188  |
|              hf_GPT2              |  4   | 1.0211 |   0.984   |      0.0       |     0.0     |  1.8585  |
|          LearningToPaint          |  96  | 1.0045 |  1.0015   |      0.0       |    1.365    |  1.8496  |
|               hf_T5               |  8   |  1.0   |  0.9452   |      0.0       |     0.0     |  1.8407  |
|             resnet50              |  32  | 1.0013 |  1.0069   |      0.0       |   1.3706    |  1.736   |
|              hf_Bart              |  4   | 1.0161 |  0.8234   |      0.0       |     0.0     |  1.7327  |
|        shufflenet_v2_x1_0         | 128  | 0.9989 |  1.0207   |      0.0       |   1.3415    |  1.7108  |
| attention_is_all_you_need_pytorch | 256  | 1.007  |  0.8955   |      0.0       |     0.0     |  1.654   |
|           mobilenet_v2            |  96  | 0.9999 |  1.0149   |      0.0       |    0.925    |  1.5593  |
|           hf_DistilBert           |  8   | 1.0015 |  0.9712   |      0.0       |     0.0     |  1.5318  |
|            timm_nfnet             | 128  | 0.9993 |   0.999   |      0.0       |   1.1732    |  1.4948  |
|         soft_actor_critic         | 256  | 1.0007 |  0.7214   |     1.3052     |    1.071    |  1.4778  |
|           fastNLP_Bert            |  6   | 0.9985 |  0.8814   |      0.0       |     0.0     |  1.4658  |
|            timm_regnet            |  32  | 0.981  |  0.9283   |      0.0       |   1.1835    |  1.4191  |
|           pytorch_unet            |  1   | 0.9993 |  0.9927   |      0.0       |   1.1557    |  1.3436  |
|          pytorch_stargan          |  16  | 1.0001 |  1.0162   |     0.8287     |    1.102    |  1.3391  |
|            Super_SloMo            |  6   |  1.0   |  0.9959   |      0.0       |     0.0     |  1.2901  |
|            timm_vovnet            |  32  | 0.9207 |  0.8885   |      0.0       |   1.1291    |  1.2719  |
|               vgg16               |  64  | 0.9995 |  0.9972   |     0.797      |   0.9942    |  1.2699  |
|        Background_Matting         |  4   | 0.9996 |  1.0184   |      0.0       |   1.1152    |  1.2249  |
|              alexnet              | 128  | 0.9991 |  0.9974   |     0.7883     |   1.0033    |  1.2095  |
|   timm_vision_transformer_large   |  8   |  1.0   |  0.9907   |      0.0       |   0.9931    |  1.161   |
|            hf_Reformer            |  4   | 0.9948 |  0.9994   |     0.9196     |     0.0     |  1.1582  |
|            hf_BigBird             |  2   | 0.9939 |   0.912   |      0.0       |     0.0     |  1.152   |
|              yolov3               |  16  | 0.9998 |  0.9903   |      0.0       |   0.9207    |  1.103   |
|            tts_angular            |  64  | 0.9834 |  0.9287   |     0.9751     |   0.9939    |  1.0307  |
|              demucs               |  4   | 0.999  |  1.0007   |     1.0007     |   1.0011    |  0.999   |
|      nvidia_deeprecommender       | 256  | 0.999  |  0.9962   |     0.6963     |   0.9792    |  0.9887  |
|               dlrm                | 2048 | 1.0007 |   1.114   |      0.0       |     0.0     |   0.0    |
|           hf_GPT2_large           |  4   | 1.0006 |   0.99    |      0.0       |     0.0     |   0.0    |
|        speech_transformer         |  32  | 1.0278 |  0.8292   |      0.0       |     0.0     |   0.0    |
|           hf_Longformer           |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
|               moco                |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
|             tacotron2             |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|               name                | bs  |      eager       |    aot_eager     |  aot_cudagraphs  |   aot_nvfuser    |     inductor     |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+
|           hf_GPT2_large           |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|            hf_T5_large            |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
|              alexnet              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               dcgan               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|              demucs               |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |
|           lennard_jones           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|            mnasnet1_0             |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|      nvidia_deeprecommender       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |       pass       |       pass       |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |       pass       |       pass       |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |       pass       |       pass       |
|           squeezenet1_1           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|               vgg16               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |
|        Background_Matting         |  4  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|          LearningToPaint          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            densenet121            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|                drq                |  1  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|       functorch_dp_cifar10        |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           mobilenet_v2            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           pytorch_unet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet18              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|             resnet50              |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|          resnext50_32x4d          |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|        shufflenet_v2_x1_0         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|         timm_efficientnet         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_nfnet             |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_regnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|           timm_resnest            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|      timm_vision_transformer      |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            timm_vovnet            |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |
|            hf_Reformer            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |
|           BERT_pytorch            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            Super_SloMo            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
| attention_is_all_you_need_pytorch |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           fastNLP_Bert            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|             hf_Albert             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bart              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_Bert              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            hf_BigBird             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|           hf_DistilBert           |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              hf_GPT2              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               hf_T5               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|            hf_T5_base             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|              yolov3               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |
|               dlrm                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|        speech_transformer         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|         timm_efficientdet         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|          vision_maskrcnn          |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|           hf_Longformer           |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|               moco                |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|             tacotron2             |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |
|        mobilenet_v3_large         |  2  |       pass       |       pass       |   fail_to_run    |       pass       |  fail_accuracy   |
|            tts_angular            |  2  |       pass       |       pass       |       pass       |       pass       |      0.0000      |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------------+-------------+----------+
|               name                |  bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+
|              yolov3               |  16  | 3.1052  |  11.2125  |      nan       |   41.2251   | 906.3816 |
|         timm_efficientdet         |  1   | 20.1053 |  44.9842  |      nan       |     nan     | 897.6359 |
|            hf_T5_large            |  2   | 13.096  |  48.6466  |      nan       |     nan     | 593.4358 |
|            densenet121            |  4   | 2.5514  |  17.707   |      nan       |  130.9259   | 366.322  |
| attention_is_all_you_need_pytorch | 256  | 1.2729  |  9.3039   |      nan       |     nan     | 266.2822 |
|           timm_resnest            |  32  |  0.645  |  3.6638   |      nan       |   42.8781   | 233.3464 |
|         timm_efficientnet         |  32  | 1.9662  |  8.6792   |      nan       |   70.0011   | 207.0271 |
|      timm_vision_transformer      |  8   | 0.9339  |  5.8408   |      nan       |   13.9729   | 204.3233 |
|   timm_vision_transformer_large   |  8   | 2.9844  |  19.3545  |      nan       |   38.5976   | 189.3326 |
|           BERT_pytorch            |  16  | 1.7096  |  10.1075  |      nan       |     nan     | 177.9434 |
|        mobilenet_v3_large         |  32  | 1.0609  |  6.7853   |      nan       |   72.766    | 176.0954 |
|            mnasnet1_0             |  32  |  0.983  |  6.3088   |    41.4888     |   44.2248   | 166.2427 |
|               hf_T5               |  8   | 2.1692  |  10.9186  |      nan       |     nan     | 157.1346 |
|           mobilenet_v2            |  96  | 0.9477  |  6.7219   |      nan       |   41.471    | 149.8098 |
|           fastNLP_Bert            |  6   | 1.7424  |  9.0851   |      nan       |     nan     | 149.2335 |
|          pytorch_stargan          |  16  | 0.4613  |  2.9484   |    11.3102     |   7.2417    | 148.0754 |
|              hf_Bart              |  4   | 1.7379  |  11.0254  |      nan       |     nan     | 147.5629 |
|              hf_GPT2              |  4   | 1.4723  |  7.7773   |      nan       |     nan     | 138.5882 |
|            timm_vovnet            |  32  | 1.6255  |  5.9768   |      nan       |   31.1606   | 127.3594 |
|          pytorch_struct           | 200  | 0.2725  |   1.134   |     1.789      |   5.3349    | 127.1456 |
|            timm_regnet            |  32  | 2.5147  |  11.0602  |      nan       |   61.0366   | 123.1854 |
|          resnext50_32x4d          |  8   | 1.0525  |  7.2686   |      nan       |   36.9586   |  115.97  |
|        shufflenet_v2_x1_0         | 128  | 1.1291  |  7.4268   |      nan       |   38.3192   | 111.0196 |
|            Super_SloMo            |  6   | 1.0535  |  6.0007   |      nan       |     nan     | 110.8918 |
|            timm_nfnet             | 128  | 2.0271  |  8.8473   |      nan       |   37.7776   | 110.1183 |
|        Background_Matting         |  4   | 1.0368  |  6.4563   |      nan       |   42.5834   | 98.0044  |
|             resnet50              |  32  | 1.0195  |  6.7185   |      nan       |   41.8437   | 90.5664  |
|             resnet18              |  16  | 0.4796  |   2.575   |      nan       |   23.3127   | 88.3143  |
|       functorch_dp_cifar10        |  64  | 0.3993  |  2.5417   |      nan       |   6.4062    | 86.9469  |
|             hf_Albert             |  8   | 1.3041  |   8.14    |      nan       |     nan     | 85.9413  |
|            hf_Reformer            |  4   | 2.5032  |  5.3906   |    13.8552     |     nan     | 78.5392  |
|              hf_Bert              |  4   | 1.6042  |  8.8371   |      nan       |     nan     | 78.0438  |
|           pytorch_unet            |  1   | 0.5186  |  2.9036   |      nan       |   26.2702   | 70.4316  |
|            hf_BigBird             |  2   | 8.0991  |  16.5773  |      nan       |     nan     | 69.7388  |
|          LearningToPaint          |  96  | 0.5087  |  2.6689   |      nan       |   30.5027   | 67.9579  |
|           hf_DistilBert           |  8   | 0.6096  |  4.1638   |      nan       |     nan     |  54.722  |
|           squeezenet1_1           |  32  |  0.264  |  1.4091   |     6.414      |   6.5055    | 46.4889  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.4526  |  2.8756   |     11.943     |   4.8348    | 42.8301  |
|               vgg16               |  64  | 0.1981  |  0.9476   |     4.0166     |   3.5663    | 35.4152  |
|              alexnet              | 128  | 0.1666  |  0.6006   |     1.9004     |   3.1682    | 28.0079  |
|                drq                |  1   | 0.1686  |  0.6502   |      nan       |   4.3912    | 25.5532  |
|               dcgan               |  32  | 0.1826  |  0.5737   |     1.8794     |    4.284    | 20.5771  |
|      nvidia_deeprecommender       | 256  | 0.2133  |  0.6182   |     0.9681     |   2.9238    | 16.3507  |
|         soft_actor_critic         | 256  | 0.2111  |  0.4295   |     0.739      |    2.043    | 12.9604  |
|           lennard_jones           | 1000 | 0.1575  |  0.4393   |     0.6231     |   1.5084    |  8.2622  |
|            tts_angular            |  64  | 0.2296  |  0.2948   |     0.4243     |   1.0492    |  4.2834  |
|              demucs               |  4   | 0.3536  |  0.3523   |     0.3482     |   0.3553    |  0.2593  |
|           hf_GPT2_large           |  4   | 5.4574  |  24.7197  |      nan       |     nan     |   nan    |
|        speech_transformer         |  32  | 1.9382  |  11.4142  |      nan       |     nan     |   nan    |
|               dlrm                | 2048 |  0.48   |  1.0459   |      nan       |     nan     |   nan    |
|           hf_Longformer           |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |
|               moco                |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |
|             tacotron2             |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
|         timm_efficientnet         |  32  | 0.988  |  0.7698   |      nan       |   0.7887    |  1.2758  |
|             hf_Albert             |  8   | 0.9814 |   0.936   |      nan       |     nan     |  1.1576  |
|            Super_SloMo            |  6   | 1.0024 |  0.9645   |      nan       |     nan     |  1.0536  |
|            timm_nfnet             | 128  | 0.9693 |  0.8982   |      nan       |   0.9445    |  1.0337  |
|         timm_efficientdet         |  1   | 1.028  |  0.8404   |      nan       |     nan     |  1.0226  |
|           mobilenet_v2            |  96  | 0.9857 |  0.7639   |      nan       |   0.9117    |  1.0074  |
|            tts_angular            |  64  | 1.0002 |  1.0002   |     0.9853     |   1.0002    |  0.9895  |
|              demucs               |  4   | 0.9872 |  0.9872   |     0.9872     |   0.9872    |  0.9872  |
| attention_is_all_you_need_pytorch | 256  | 0.9979 |   0.94    |      nan       |     nan     |  0.9829  |
|           BERT_pytorch            |  16  |  1.0   |  0.8825   |      nan       |     nan     |  0.9721  |
|              hf_GPT2              |  4   | 0.9706 |  0.8625   |      nan       |     nan     |  0.9648  |
|               hf_T5               |  8   | 0.9678 |  0.9371   |      nan       |     nan     |  0.9309  |
|            timm_regnet            |  32  | 0.9953 |  0.8446   |      nan       |    0.85     |  0.9249  |
|        Background_Matting         |  4   | 1.0138 |  0.9624   |      nan       |   0.9813    |  0.9245  |
|              yolov3               |  16  | 0.9908 |  0.8381   |      nan       |   0.8244    |  0.9059  |
|              hf_Bert              |  4   | 0.9844 |  0.8677   |      nan       |     nan     |  0.9017  |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.999  |  0.8735   |     0.2638     |   0.8441    |  0.8861  |
|   timm_vision_transformer_large   |  8   | 0.9973 |  0.8357   |      nan       |   0.8494    |  0.879   |
|           timm_resnest            |  32  | 0.9868 |  0.8711   |      nan       |   0.8623    |  0.8756  |
|            densenet121            |  4   | 0.9857 |  0.8678   |      nan       |   0.8376    |  0.8753  |
|           pytorch_unet            |  1   | 0.9968 |  0.8653   |      nan       |   0.8496    |  0.8678  |
|           fastNLP_Bert            |  6   | 1.0012 |  0.8966   |      nan       |     nan     |  0.8661  |
|             resnet50              |  32  | 0.9907 |  0.8629   |      nan       |   0.7995    |  0.8652  |
|           squeezenet1_1           |  32  | 0.9604 |  0.7958   |     0.2916     |   0.7589    |  0.8611  |
|        shufflenet_v2_x1_0         | 128  | 0.956  |  0.8401   |      nan       |   0.8503    |  0.856   |
|            hf_T5_large            |  2   | 0.8541 |  0.8541   |      nan       |     nan     |  0.8541  |
|           hf_DistilBert           |  8   | 0.9505 |  0.8806   |      nan       |     nan     |  0.8387  |
|            timm_vovnet            |  32  | 0.9903 |  0.7678   |      nan       |   0.7742    |  0.8352  |
|               dcgan               |  32  | 0.9698 |  0.7838   |     0.3394     |   0.7073    |  0.8283  |
|              hf_Bart              |  4   | 0.9102 |  0.8321   |      nan       |     nan     |  0.8137  |
|            hf_BigBird             |  2   | 0.9837 |  0.9784   |      nan       |     nan     |  0.8098  |
|              alexnet              | 128  | 0.951  |  0.7753   |     0.4257     |   0.7753    |  0.7974  |
|        mobilenet_v3_large         |  32  | 0.9776 |  0.8499   |      nan       |    0.866    |  0.7918  |
|          pytorch_stargan          |  16  | 0.9929 |  0.9742   |     0.2147     |   0.8882    |  0.7783  |
|          resnext50_32x4d          |  8   | 0.9932 |  0.8549   |      nan       |   0.8176    |  0.7644  |
|            mnasnet1_0             |  32  | 0.9785 |  0.8621   |     0.1723     |   0.8207    |  0.7541  |
|                drq                |  1   | 0.9877 |  0.8312   |      nan       |   0.8308    |  0.752   |
|               vgg16               |  64  | 0.9924 |  0.7339   |     0.2971     |   0.7172    |  0.7491  |
|         soft_actor_critic         | 256  | 0.9998 |  0.9149   |     0.4736     |   0.9149    |  0.7295  |
|          LearningToPaint          |  96  | 0.9252 |  0.7196   |      nan       |   0.6722    |  0.7295  |
|      timm_vision_transformer      |  8   | 0.9952 |  0.8826   |      nan       |   0.8871    |  0.7151  |
|             resnet18              |  16  | 0.9779 |  0.7727   |      nan       |   0.7276    |  0.6102  |
|           lennard_jones           | 1000 | 0.9995 |  0.9997   |     0.3734     |   1.0967    |  0.564   |
|      nvidia_deeprecommender       | 256  | 0.5596 |  0.5596   |     0.5121     |   0.5596    |  0.5596  |
|       functorch_dp_cifar10        |  64  | 0.9964 |  0.8107   |      nan       |   0.8452    |  0.4478  |
|          pytorch_struct           | 200  |  1.0   |  0.5081   |     0.4858     |   0.5082    |  0.4235  |
|            hf_Reformer            |  4   | 0.3764 |  0.9847   |     0.2529     |     nan     |  0.3629  |
|        speech_transformer         |  32  | 1.0017 |  0.9174   |      nan       |     nan     |   nan    |
|           hf_GPT2_large           |  4   | 0.9582 |  0.8645   |      nan       |     nan     |   nan    |
|               dlrm                | 2048 | 0.7301 |  0.7306   |      nan       |     nan     |   nan    |
|           hf_Longformer           |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
|               moco                |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
|             tacotron2             |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+
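
The table above does not spell out how the ratio is defined. One plausible reading, which is an assumption here and not something stated in this thread, is eager peak memory divided by the backend's peak memory, so a value above 1 would mean the backend used less memory than eager and a value below 1 would mean it used more. A minimal sketch under that assumption (`compression_ratio` is a hypothetical helper, not a function from the benchmark scripts):

```python
import math


def compression_ratio(eager_peak_bytes, backend_peak_bytes):
    """Assumed definition: eager peak memory / backend peak memory.

    Returns nan when the backend measurement is missing or zero,
    mirroring the nan cells in the table for runs with no result.
    """
    if backend_peak_bytes is None or backend_peak_bytes <= 0:
        return math.nan
    return eager_peak_bytes / backend_peak_bytes


# Example with made-up numbers: the backend peaks at 2.5 GB where eager
# peaks at 2.0 GB, so under this reading the ratio falls below 1.
ratio = compression_ratio(2.0e9, 2.5e9)
```

Under this reading, an inductor value such as 0.3629 for hf_Reformer would mean roughly 2.8x the eager peak memory.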

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|           ElectraForCausalLM            | 1  | 1.0382 |  0.8408   |      0.0       |     0.0     |  7.2622  |
|          MobileBertForMaskedLM          | 16 | 1.0136 |  0.8308   |      0.0       |     0.0     |  5.8275  |
|       MT5ForConditionalGeneration       | 2  | 1.0195 |  0.8543   |      0.0       |     0.0     |  5.6799  |
|     MobileBertForQuestionAnswering      | 32 | 1.018  |  0.8123   |      0.0       |     0.0     |  5.2174  |
|            YituTechConvBert             | 1  | 1.024  |  0.8352   |      0.0       |     0.0     |  5.0508  |
|         MegatronBertForCausalLM         | 2  | 1.0336 |  0.8383   |      0.0       |     0.0     |  4.828   |
|             OPTForCausalLM              | 4  | 1.0147 |  0.8186   |      0.0       |     0.0     |  4.3716  |
|           RobertaForCausalLM            | 4  | 1.0349 |  0.8428   |      0.0       |     0.0     |  4.0103  |
|     M2M100ForConditionalGeneration      | 2  | 1.0062 |  0.8521   |      0.0       |     0.0     |  3.8744  |
|                CamemBert                | 1  | 1.0425 |  0.8424   |      0.0       |     0.0     |   3.54   |
|     PegasusForConditionalGeneration     | 4  | 1.0079 |  0.8177   |      0.0       |     0.0     |  3.2337  |
|             XGLMForCausalLM             | 1  | 1.0148 |  0.8016   |      0.0       |     0.0     |  3.1114  |
|               DistillGPT2               | 1  | 1.0312 |  0.8664   |      0.0       |     0.0     |  2.9666  |
|     PLBartForConditionalGeneration      | 8  | 1.0149 |  0.8248   |      0.0       |     0.0     |  2.8226  |
|    MegatronBertForQuestionAnswering     | 8  | 1.0341 |  0.8481   |      0.0       |     0.0     |  2.702   |
|      MBartForConditionalGeneration      | 8  | 1.0145 |   0.826   |      0.0       |     0.0     |  2.3644  |
|         Speech2Text2ForCausalLM         | 64 | 1.0102 |  0.8122   |      0.0       |     0.0     |  2.3247  |
|          DistilBertForMaskedLM          | 16 | 1.029  |  0.8565   |      0.0       |     0.0     |  2.171   |
|      GPT2ForSequenceClassification      | 4  | 1.0014 |  0.9751   |      0.0       |     0.0     |  2.1475  |
|       ElectraForQuestionAnswering       | 64 | 1.0003 |  0.9711   |      0.0       |     0.0     |  1.953   |
|            TrOCRForCausalLM             | 8  | 1.0151 |  0.8234   |      0.0       |     0.0     |  1.925   |
|     DistilBertForQuestionAnswering      | 32 | 1.0287 |   0.841   |      0.0       |     0.0     |  1.857   |
|           PegasusForCausalLM            | 8  | 1.0082 |  0.7985   |      0.0       |     0.0     |  1.8519  |
| BlenderbotSmallForConditionalGeneration | 32 | 1.0144 |  0.8904   |      0.0       |     0.0     |  1.8108  |
|      BartForConditionalGeneration       | 1  | 1.0155 |  0.8249   |      0.0       |     0.0     |  1.7844  |
|    LayoutLMForSequenceClassification    | 16 | 1.0004 |  0.9795   |      0.0       |     0.0     |  1.7373  |
|       AlbertForQuestionAnswering        | 2  | 1.0005 |  0.8085   |      0.0       |     0.0     |  1.6598  |
|            PLBartForCausalLM            | 16 | 1.0139 |  0.9442   |      0.0       |     0.0     |  1.6508  |
|            AlbertForMaskedLM            | 2  | 1.0004 |  0.8105   |      0.0       |     0.0     |  1.6425  |
|                 T5Small                 | 1  | 1.0266 |  0.8772   |      0.0       |     0.0     |  1.6369  |
|       T5ForConditionalGeneration        | 4  | 0.9987 |  0.9338   |      0.0       |     0.0     |  1.6149  |
|            XLNetLMHeadModel             | 4  | 0.9998 |  0.9633   |      0.0       |     0.0     |  1.5997  |
|           LayoutLMForMaskedLM           | 16 | 1.0002 |  0.9711   |      0.0       |     0.0     |  1.5866  |
|            MBartForCausalLM             | 16 | 1.0116 |  0.8169   |      0.0       |     0.0     |  1.5186  |
|             BartForCausalLM             | 2  | 1.0024 |  0.9638   |      0.0       |     0.0     |  1.4734  |
|       DebertaForQuestionAnswering       | 4  | 0.9316 |  0.7278   |     0.9356     |     0.0     |  1.4577  |
|       RobertaForQuestionAnswering       | 64 | 1.0004 |  0.9687   |      0.0       |     0.0     |  1.4464  |
|        BertForQuestionAnswering         | 64 | 1.0008 |  0.9696   |      0.0       |     0.0     |  1.4345  |
|             BertForMaskedLM             | 64 |  1.0   |  0.9584   |      0.0       |     0.0     |  1.3189  |
|       BlenderbotSmallForCausalLM        | 64 | 1.0021 |  0.9266   |      0.0       |     0.0     |  1.3083  |
|           DebertaForMaskedLM            | 4  | 0.9353 |  0.7501   |     0.7988     |     0.0     |  1.222   |
|                 BigBird                 | 1  | 0.9794 |   0.91    |      0.0       |     0.0     |  1.1384  |
|          AllenaiLongformerBase          | 0  |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
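
When summarizing the speedup table above, note that the 0.0 entries appear to line up with fail_to_run rows in the Accuracy table, so they probably should not be averaged in as real speedups. A minimal sketch for computing a per-backend geometric mean from a pasted table, assuming that 0.0 and nan both mean "no result" (`parse_table` and `geomean_speedup` are hypothetical helper names, not part of the benchmark scripts):

```python
import math


def parse_table(text):
    """Parse a '+---+' / '| a | b |' style ASCII table into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean_speedup(records, backend):
    """Geometric mean of one backend column.

    Assumption: 0.0 and nan mark runs that produced no result, so they
    are skipped instead of being treated as measured speedups.
    """
    values = []
    for record in records:
        value = float(record[backend])
        if math.isnan(value) or value <= 0.0:
            continue
        values.append(value)
    if not values:
        return math.nan
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Small made-up table in the same layout as the ones in this comment.
sample = """
+-------+----+-------+----------+
| name  | bs | eager | inductor |
+-------+----+-------+----------+
| a     | 1  | 1.0   | 2.0      |
| b     | 1  | 1.0   | 8.0      |
| c     | 0  | 0.0   | 0.0      |
+-------+----+-------+----------+
"""
records = parse_table(sample)
inductor_geomean = geomean_speedup(records, "inductor")
```

Reporting the number of skipped rows next to the mean keeps the summary from hiding failures.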

Accuracy

+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+
|                  name                   | bs |    eager    |  aot_eager  | aot_cudagraphs | aot_nvfuser |  inductor   |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+
|            AlbertForMaskedLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       AlbertForQuestionAnswering        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             BartForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|      BartForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             BertForMaskedLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|        BertForQuestionAnswering         | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|                 BigBird                 | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       BlenderbotSmallForCausalLM        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
| BlenderbotSmallForConditionalGeneration | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|                CamemBert                | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           DebertaForMaskedLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|          DistilBertForMaskedLM          | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     DistilBertForQuestionAnswering      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|               DistillGPT2               | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           ElectraForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       ElectraForQuestionAnswering       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|      GPT2ForSequenceClassification      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           LayoutLMForMaskedLM           | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|    LayoutLMForSequenceClassification    | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     M2M100ForConditionalGeneration      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            MBartForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       MT5ForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|         MegatronBertForCausalLM         | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|    MegatronBertForQuestionAnswering     | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|          MobileBertForMaskedLM          | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     MobileBertForQuestionAnswering      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             OPTForCausalLM              | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            PLBartForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           PegasusForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|     PegasusForConditionalGeneration     | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|           RobertaForCausalLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       RobertaForQuestionAnswering       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|         Speech2Text2ForCausalLM         | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       T5ForConditionalGeneration        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|                 T5Small                 | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            TrOCRForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|             XGLMForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            XLNetLMHeadModel             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|            YituTechConvBert             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |
|       DebertaForQuestionAnswering       | 1  |    pass     |    pass     | fail_accuracy  | fail_to_run |    pass     |
|      MBartForConditionalGeneration      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run | fail_to_run |
|     PLBartForConditionalGeneration      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run | fail_to_run |
|          AllenaiLongformerBase          | 1  | fail_to_run | fail_to_run |  fail_to_run   | fail_to_run | fail_to_run |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+
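
To turn the Accuracy table above into a quick per-backend pass count, something like the following could be used. This is only a sketch that assumes the table is available as plain text in the layout shown; `status_counts` is a hypothetical helper, not part of the benchmark scripts.

```python
from collections import Counter


def status_counts(table_text, backend):
    """Count status strings (pass, fail_to_run, ...) in one backend column."""
    counts = Counter()
    header = None
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+----+' separator lines and blanks
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row is the header
            continue
        counts[cells[header.index(backend)]] += 1
    return counts


# Small made-up table in the same layout as the Accuracy table.
sample = """
+------+----+-------+----------------+-------------+
| name | bs | eager | aot_cudagraphs |  inductor   |
+------+----+-------+----------------+-------------+
| a    | 1  | pass  |  fail_to_run   |    pass     |
| b    | 1  | pass  | fail_accuracy  |    pass     |
| c    | 1  | pass  |  fail_to_run   | fail_to_run |
+------+----+-------+----------------+-------------+
"""
inductor_counts = status_counts(sample, "inductor")
cudagraphs_counts = status_counts(sample, "aot_cudagraphs")
```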

Compilation latency (sec)

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|            XLNetLMHeadModel             | 4  | 3.8112 |  24.3247  |      nan       |     nan     | 316.035  |
|     M2M100ForConditionalGeneration      | 2  | 3.5068 |  20.8405  |      nan       |     nan     | 206.999  |
|       MT5ForConditionalGeneration       | 2  | 3.3885 |  15.9549  |      nan       |     nan     | 197.5786 |
|            YituTechConvBert             | 1  | 2.4628 |  13.4776  |      nan       |     nan     | 197.3318 |
|       T5ForConditionalGeneration        | 4  | 2.1304 |  10.8851  |      nan       |     nan     | 191.296  |
|          MobileBertForMaskedLM          | 16 | 9.0497 |  40.5762  |      nan       |     nan     | 188.2942 |
|     MobileBertForQuestionAnswering      | 32 | 9.1376 |  40.6477  |      nan       |     nan     | 170.5566 |
|           DebertaForMaskedLM            | 4  | 4.8092 |  12.4543  |    53.8282     |     nan     | 166.2857 |
|     PegasusForConditionalGeneration     | 4  | 3.3765 |  22.5318  |      nan       |     nan     | 164.7196 |
|             XGLMForCausalLM             | 1  | 2.7194 |  17.0177  |      nan       |     nan     | 163.8919 |
|                 T5Small                 | 1  | 2.1227 |  10.9165  |      nan       |     nan     | 160.2439 |
|      MBartForConditionalGeneration      | 8  | 3.4787 |  21.8606  |      nan       |     nan     | 158.4846 |
|      BartForConditionalGeneration       | 1  | 3.3831 |  21.2971  |      nan       |     nan     | 151.772  |
|         MegatronBertForCausalLM         | 2  | 3.4752 |  17.8153  |      nan       |     nan     | 149.5003 |
|    MegatronBertForQuestionAnswering     | 8  | 3.4793 |  17.5881  |      nan       |     nan     | 144.6857 |
|     PLBartForConditionalGeneration      | 8  | 1.7497 |  11.0365  |      nan       |     nan     | 143.4586 |
|       DebertaForQuestionAnswering       | 4  | 4.9493 |  12.5325  |     53.101     |     nan     | 130.9367 |
| BlenderbotSmallForConditionalGeneration | 32 | 2.1805 |  14.2719  |      nan       |     nan     | 129.4778 |
|           RobertaForCausalLM            | 4  | 1.6853 |  9.0665   |      nan       |     nan     | 105.0382 |
|    LayoutLMForSequenceClassification    | 16 | 1.7694 |  9.1761   |      nan       |     nan     | 99.2086  |
|           PegasusForCausalLM            | 8  | 1.3071 |  7.9972   |      nan       |     nan     | 93.7895  |
|       ElectraForQuestionAnswering       | 64 | 1.6498 |  8.9657   |      nan       |     nan     | 92.0329  |
|             OPTForCausalLM              | 4  | 1.3525 |  8.2425   |      nan       |     nan     | 90.2333  |
|           LayoutLMForMaskedLM           | 16 | 1.7927 |  9.2895   |      nan       |     nan     | 90.1629  |
|            MBartForCausalLM             | 16 | 1.2233 |  8.0907   |      nan       |     nan     |  89.171  |
|             BertForMaskedLM             | 64 | 1.5432 |  8.6537   |      nan       |     nan     | 87.2679  |
|      GPT2ForSequenceClassification      | 4  | 1.4535 |   7.926   |      nan       |     nan     | 84.9463  |
|             BartForCausalLM             | 2  | 1.2831 |   7.99    |      nan       |     nan     | 82.7793  |
|            AlbertForMaskedLM            | 2  | 1.5781 |  8.5461   |      nan       |     nan     | 77.6663  |
|           ElectraForCausalLM            | 1  | 1.667  |   8.973   |      nan       |     nan     | 76.0295  |
|     DistilBertForQuestionAnswering      | 32 | 0.6709 |   4.26    |      nan       |     nan     | 73.8278  |
|            TrOCRForCausalLM             | 8  | 1.277  |  8.0625   |      nan       |     nan     | 73.0386  |
|                 BigBird                 | 1  | 7.9688 |  16.8522  |      nan       |     nan     | 72.2757  |
|            PLBartForCausalLM            | 16 | 0.6355 |  4.1232   |      nan       |     nan     | 70.9095  |
|                CamemBert                | 1  | 1.6402 |  8.8517   |      nan       |     nan     | 69.8196  |
|       BlenderbotSmallForCausalLM        | 64 | 0.7938 |  5.6202   |      nan       |     nan     | 68.5483  |
|       RobertaForQuestionAnswering       | 64 | 1.5947 |  8.7628   |      nan       |     nan     | 68.5105  |
|         Speech2Text2ForCausalLM         | 64 | 0.6924 |  4.1727   |      nan       |     nan     |  67.245  |
|               DistillGPT2               | 1  | 0.7462 |  3.9659   |      nan       |     nan     | 65.2046  |
|        BertForQuestionAnswering         | 64 | 1.6715 |  8.6885   |      nan       |     nan     | 61.8158  |
|          DistilBertForMaskedLM          | 16 | 0.6112 |  4.2202   |      nan       |     nan     | 56.1078  |
|       AlbertForQuestionAnswering        | 2  | 1.4775 |   8.302   |      nan       |     nan     | 50.5431  |
|          AllenaiLongformerBase          | 0  |  nan   |    nan    |      nan       |     nan     |   nan    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+

Peak Memory Compression Ratio

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+
|      GPT2ForSequenceClassification      | 4  | 0.9675 |  0.9164   |      nan       |     nan     |  1.0779  |
|             BartForCausalLM             | 2  |  1.0   |  0.8769   |      nan       |     nan     |  1.0442  |
|            XLNetLMHeadModel             | 4  | 0.9912 |  0.8791   |      nan       |     nan     |  1.0109  |
|    LayoutLMForSequenceClassification    | 16 | 1.004  |  0.9325   |      nan       |     nan     |  1.0056  |
|       T5ForConditionalGeneration        | 4  | 0.9996 |  0.9594   |      nan       |     nan     |  0.995   |
|       RobertaForQuestionAnswering       | 64 | 0.9996 |  0.9315   |      nan       |     nan     |  0.9946  |
|        BertForQuestionAnswering         | 64 | 0.9995 |  0.9315   |      nan       |     nan     |  0.9946  |
|       ElectraForQuestionAnswering       | 64 | 1.0016 |  0.9538   |      nan       |     nan     |  0.9938  |
|      BartForConditionalGeneration       | 1  |  1.0   |  0.8619   |      nan       |     nan     |  0.9894  |
|                 T5Small                 | 1  |  1.0   |  0.9124   |      nan       |     nan     |  0.9874  |
|           LayoutLMForMaskedLM           | 16 | 0.9999 |  0.9238   |      nan       |     nan     |  0.9871  |
|             BertForMaskedLM             | 64 | 0.9996 |   0.899   |      nan       |     nan     |  0.9811  |
|            MBartForCausalLM             | 16 |  1.0   |  0.8398   |      nan       |     nan     |  0.9567  |
| BlenderbotSmallForConditionalGeneration | 32 | 0.9998 |  0.8996   |      nan       |     nan     |  0.9557  |
|         Speech2Text2ForCausalLM         | 64 | 0.969  |  0.8488   |      nan       |     nan     |  0.9452  |
|            PLBartForCausalLM            | 16 | 1.0001 |  0.8666   |      nan       |     nan     |  0.9395  |
|       BlenderbotSmallForCausalLM        | 64 | 0.9996 |  0.8172   |      nan       |     nan     |  0.9269  |
|          DistilBertForMaskedLM          | 16 | 0.9986 |  0.8686   |      nan       |     nan     |  0.9164  |
|       AlbertForQuestionAnswering        | 2  |  1.0   |  0.6451   |      nan       |     nan     |  0.9124  |
|            AlbertForMaskedLM            | 2  |  1.0   |  0.6364   |      nan       |     nan     |  0.8977  |
|      MBartForConditionalGeneration      | 8  | 0.9999 |  0.8187   |      nan       |     nan     |  0.8861  |
|            TrOCRForCausalLM             | 8  |  1.0   |  0.7955   |      nan       |     nan     |  0.8774  |
|                CamemBert                | 1  | 0.9989 |  0.7872   |      nan       |     nan     |  0.8654  |
|     DistilBertForQuestionAnswering      | 32 | 0.9992 |  0.8965   |      nan       |     nan     |  0.8639  |
|            YituTechConvBert             | 1  | 0.9718 |  0.7819   |      nan       |     nan     |  0.8618  |
|           RobertaForCausalLM            | 4  | 0.9237 |  0.7741   |      nan       |     nan     |  0.8574  |
|             OPTForCausalLM              | 4  | 0.9974 |   0.75    |      nan       |     nan     |  0.8483  |
|           PegasusForCausalLM            | 8  | 0.999  |  0.9444   |      nan       |     nan     |  0.8445  |
|     PLBartForConditionalGeneration      | 8  | 0.9975 |  0.8294   |      nan       |     nan     |  0.8438  |
|    MegatronBertForQuestionAnswering     | 8  | 0.9051 |  0.8218   |      nan       |     nan     |  0.8434  |
|                 BigBird                 | 1  | 1.0008 |  0.9533   |      nan       |     nan     |  0.8348  |
|               DistillGPT2               | 1  | 0.9963 |  0.7527   |      nan       |     nan     |  0.8288  |
|             XGLMForCausalLM             | 1  |  1.0   |   0.999   |      nan       |     nan     |  0.7913  |
|         MegatronBertForCausalLM         | 2  | 0.7726 |  0.7726   |      nan       |     nan     |  0.7726  |
|     PegasusForConditionalGeneration     | 4  | 0.9994 |  0.9194   |      nan       |     nan     |  0.7686  |
|     M2M100ForConditionalGeneration      | 2  |  1.0   |  0.9585   |      nan       |     nan     |  0.7175  |
|          MobileBertForMaskedLM          | 16 | 0.9985 |  0.8983   |      nan       |     nan     |  0.6948  |
|           ElectraForCausalLM            | 1  | 0.9993 |  0.8955   |      nan       |     nan     |  0.6701  |
|     MobileBertForQuestionAnswering      | 32 | 1.0142 |  0.9796   |      nan       |     nan     |  0.6265  |
|       MT5ForConditionalGeneration       | 2  | 0.6019 |  0.6019   |      nan       |     nan     |  0.6019  |
|           DebertaForMaskedLM            | 4  | 0.9982 |  0.9824   |     0.3598     |     nan     |  0.4498  |
|       DebertaForQuestionAnswering       | 4  | 0.9792 |  1.0574   |     0.3577     |     nan     |  0.3761  |
|          AllenaiLongformerBase          | 0  |  nan   |    nan    |      nan       |     nan     |   nan    |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|        res2net50_14w_8s         |  2  | 1.0025 |  0.8989   |      0.0       |   1.3928    |  5.4582  |
|           res2next50            |  2  | 1.0036 |  0.9285   |      0.0       |   1.3683    |  5.2409  |
|            hrnet_w18            |  2  | 1.0062 |   0.957   |      0.0       |   1.3783    |  5.0807  |
|        twins_pcpvt_base         | 32  | 1.0041 |  0.8853   |      0.0       |   1.2699    |  2.615   |
|           resnest101e           | 32  | 1.0044 |  0.9809   |      0.0       |   1.4177    |  2.3441  |
|           regnety_002           | 128 | 0.9787 |  0.9323   |      0.0       |   1.3798    |  2.1367  |
|          cait_m36_384           |  2  | 0.9999 |  0.8454   |      0.0       |   1.3838    |  2.0929  |
|      xcit_large_24_p8_224       |  5  | 0.9988 |    0.0    |      0.0       |     0.0     |  2.0773  |
|        tnt_s_patch16_224        | 64  | 1.0001 |   0.996   |      0.0       |   1.8886    |  2.0751  |
|          ghostnet_100           | 128 | 1.0033 |   0.999   |      0.0       |   1.5476    |  2.0386  |
|            lcnet_050            | 128 | 0.9678 |  0.9549   |      0.0       |   1.5598    |  2.0212  |
|            nfnet_l0             | 64  | 1.0076 |  0.8344   |      0.0       |   1.1376    |  1.7831  |
|          gmixer_24_224          | 64  | 0.9996 |   0.884   |     0.6424     |   1.0029    |  1.6677  |
|           mobilevit_s           | 32  | 0.9755 |  0.7952   |      0.0       |   1.2119    |  1.656   |
|             dla102              | 64  |  1.0   |  0.9914   |      0.0       |   1.3805    |  1.6115  |
|           volo_d1_224           | 64  | 0.9996 |  0.9946   |      0.0       |   1.1413    |  1.6004  |
|         crossvit_9_240          | 64  | 1.006  |  0.9545   |      0.0       |   1.1212    |  1.5835  |
|        res2net101_26w_4s        | 64  | 1.0007 |  0.9965   |      0.0       |   1.4297    |  1.5781  |
|  swin_base_patch4_window7_224   | 64  | 0.9999 |  0.9571   |      0.0       |   1.0453    |  1.521   |
|       gluon_inception_v3        | 128 | 0.9999 |  0.9963   |      0.0       |   1.1948    |  1.5048  |
|        adv_inception_v3         | 128 | 0.9999 |  0.9966   |      0.0       |   1.1948    |  1.5042  |
|          inception_v3           | 128 | 0.9998 |  0.9963   |      0.0       |   1.1947    |  1.5019  |
|           dm_nfnet_f0           | 128 | 0.9989 |  0.9997   |      0.0       |   1.1778    |  1.4949  |
|      mobilenetv3_large_100      | 128 | 0.9547 |  0.9449   |      0.0       |   1.3737    |  1.4633  |
|          resmlp_12_224          | 128 |  1.0   |  0.9988   |     0.7769     |     0.0     |  1.4475  |
|           selecsls42b           | 128 | 0.9998 |  0.9961   |      0.0       |   1.3583    |   1.42   |
|            fbnetv3_b            | 128 | 0.9523 |  0.9493   |      0.0       |   1.2549    |  1.4082  |
|         coat_lite_mini          | 128 | 1.0002 |  0.9891   |      0.0       |   1.2187    |  1.4047  |
|           mnasnet_100           | 128 | 0.9535 |   0.944   |     0.6644     |   1.3691    |  1.4015  |
|          jx_nest_base           | 32  | 0.9998 |  0.9932   |      0.0       |   1.2246    |  1.3932  |
|          gmlp_s16_224           | 64  | 0.9996 |   0.984   |      0.0       |   1.0389    |  1.3825  |
|          pnasnet5large          | 16  | 1.006  |  1.0329   |      0.0       |   1.1815    |  1.3739  |
|         mobilenetv2_100         | 128 | 0.951  |  0.9414   |      0.0       |   0.8619    |  1.3718  |
|          spnasnet_100           | 128 | 0.9478 |   0.938   |     0.6462     |   1.3169    |  1.3674  |
|            pit_b_224            | 64  |  1.0   |   0.995   |      0.0       |   1.0618    |  1.3584  |
|        ese_vovnet19b_dw         | 128 | 0.9692 |  0.9649   |      0.0       |   1.2477    |  1.3565  |
|           convit_base           | 32  | 0.9999 |  0.9925   |      0.0       |     0.0     |  1.3442  |
|           fbnetc_100            | 128 | 0.9534 |  0.9432   |     0.6707     |   1.3751    |  1.3411  |
|       tf_efficientnet_b0        | 128 | 0.9654 |  0.8079   |      0.0       |   1.0911    |  1.3307  |
|          cspdarknet53           | 64  | 0.9421 |  0.9338   |      0.0       |   0.9014    |  1.3303  |
|         poolformer_m36          | 64  | 1.0001 |  0.9979   |      0.0       |     0.0     |  1.3282  |
|          botnet26t_256          | 128 | 0.9798 |  0.9749   |      0.0       |   1.3468    |  1.3078  |
|            tinynet_a            | 128 | 0.9599 |  0.7899   |      0.0       |   1.1514    |  1.3006  |
|      beit_base_patch16_224      | 64  |  1.0   |  0.9783   |      0.0       |   1.0434    |  1.2865  |
| deit_base_distilled_patch16_224 | 64  |  1.0   |  0.9908   |      0.0       |   1.0625    |  1.2819  |
|           rexnet_100            | 128 | 0.9644 |  0.8507   |      0.0       |   1.0373    |  1.276   |
|            mixnet_l             | 64  | 0.9803 |  0.8889   |      0.0       |   1.0862    |  1.2668  |
|       eca_botnext26ts_256       | 64  | 0.9612 |  0.8004   |      0.0       |   1.1083    |  1.247   |
|          mixer_b16_224          | 64  | 0.9999 |   0.991   |     0.7133     |   0.9569    |  1.2469  |
|         visformer_small         | 128 | 0.9996 |   1.002   |      0.0       |   1.0833    |  1.2395  |
|           tf_mixnet_l           | 64  | 0.9828 |  0.8978   |      0.0       |   1.0668    |  1.2336  |
|        sebotnet33ts_256         | 64  | 0.9665 |   0.836   |      0.0       |   1.1165    |  1.2145  |
|      vit_base_patch16_224       | 64  | 0.9999 |  0.9938   |      0.0       |   0.9929    |  1.1942  |
|             dpn107              | 32  | 0.9383 |  0.9315   |      0.0       |   0.9919    |  1.1809  |
|        gluon_xception65         | 32  | 0.9996 |  0.9895   |      0.0       |   1.0644    |  1.1611  |
|            repvgg_a2            | 128 | 0.9435 |  0.9342   |     0.6562     |   1.1307    |  1.1366  |
|     swsl_resnext101_32x16d      | 32  | 0.9997 |  0.9824   |      0.0       |    1.076    |  1.1315  |
|            gernet_l             | 128 | 0.9461 |  0.9388   |      0.0       |   1.1424    |  1.0671  |
|        convmixer_768_32         | 32  | 0.9999 |  0.9982   |      0.0       |   1.0532    |  1.0557  |
|          convnext_base          | 32  | 1.0099 |  0.9298   |      0.0       |   1.2137    |  0.7229  |
|        eca_halonext26ts         | 64  | 0.9638 |   0.806   |      0.0       |   1.0966    |   0.0    |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+

Accuracy

+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|              name               | bs | eager |   aot_eager   | aot_cudagraphs |  aot_nvfuser  |   inductor    |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+
|           fbnetc_100            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|           mnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|            repvgg_a2            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |
|        adv_inception_v3         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      beit_base_patch16_224      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          botnet26t_256          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        convmixer_768_32         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          convnext_base          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         crossvit_9_240          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          cspdarknet53           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
| deit_base_distilled_patch16_224 | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dla102              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           dm_nfnet_f0           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|             dpn107              | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       eca_botnext26ts_256       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        ese_vovnet19b_dw         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            gernet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          ghostnet_100           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       gluon_inception_v3        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            hrnet_w18            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          inception_v3           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            lcnet_050            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            mixnet_l             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         mobilenetv2_100         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      mobilenetv3_large_100      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           mobilevit_s           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            nfnet_l0             | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          pnasnet5large          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           regnety_002           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net101_26w_4s        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        res2net50_14w_8s         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           res2next50            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           rexnet_100            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        sebotnet33ts_256         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           selecsls42b           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|  swin_base_patch4_window7_224   | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|     swsl_resnext101_32x16d      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|       tf_efficientnet_b0        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           tf_mixnet_l           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|            tinynet_a            | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|        tnt_s_patch16_224        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|         visformer_small         | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|      vit_base_patch16_224       | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|           volo_d1_224           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |
|          resmlp_12_224          | 2  | pass  |     pass      |      pass      |  fail_to_run  |     pass      |
|           convit_base           | 2  | pass  |     pass      |  fail_to_run   |  fail_to_run  |     pass      |
|      xcit_large_24_p8_224       | 2  | pass  |  fail_to_run  |  fail_to_run   |  fail_to_run  |     pass      |
|          gmixer_24_224          | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |
|          gmlp_s16_224           | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |
|          mixer_b16_224          | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |
|         poolformer_m36          | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|           resnest101e           | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |
|          cait_m36_384           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|         coat_lite_mini          | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|          jx_nest_base           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|            pit_b_224            | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|        twins_pcpvt_base         | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |
|        eca_halonext26ts         | 2  | pass  |     pass      |  fail_to_run   |     pass      |  fail_to_run  |
|        gluon_xception65         | 2  | pass  |     pass      |  fail_to_run   |     pass      | fail_accuracy |
|          spnasnet_100           | 2  | pass  |     pass      |      pass      | fail_accuracy | fail_accuracy |
|            fbnetv3_b            | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy | fail_accuracy |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+

Compilation latency (sec)

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|        twins_pcpvt_base         | 32  | 2.8843 |  19.2931  |      nan       |   72.4687   | 871.1403 |
|         coat_lite_mini          | 128 | 1.1607 |  6.8772   |      nan       |   33.2129   | 807.967  |
|           mobilevit_s           | 32  | 2.0558 |  9.7982   |      nan       |   58.1758   | 721.9393 |
|       eca_botnext26ts_256       | 64  | 1.591  |  6.1393   |      nan       |   63.5595   | 558.2453 |
|  swin_base_patch4_window7_224   | 64  | 3.0716 |  16.3405  |      nan       |   72.1749   | 499.9333 |
|          convnext_base          | 32  | 1.6706 |  8.9108   |      nan       |   35.867    | 483.2362 |
|        sebotnet33ts_256         | 64  | 1.8986 |  8.6752   |      nan       |   68.6287   | 462.4035 |
|          botnet26t_256          | 128 | 1.4732 |  5.7738   |      nan       |   50.2327   | 409.7373 |
|           rexnet_100            | 128 | 2.2029 |  10.3698  |      nan       |  117.4054   | 387.7299 |
|          jx_nest_base           | 32  | 1.8489 |  11.7585  |      nan       |   50.5117   | 383.5946 |
|            mixnet_l             | 64  | 5.7531 |  16.3643  |      nan       |   81.6749   | 354.5396 |
|      xcit_large_24_p8_224       |  5  | 3.3798 |    nan    |      nan       |     nan     | 354.253  |
|          ghostnet_100           | 128 | 3.3561 |  13.1798  |      nan       |   91.6956   | 340.3388 |
|           resnest101e           | 32  | 3.7222 |  23.4942  |      nan       |  101.1366   | 327.116  |
|             dpn107              | 32  | 4.2898 |  18.5481  |      nan       |  102.7943   | 325.4081 |
|           tf_mixnet_l           | 64  | 6.0093 |  15.9106  |      nan       |    81.25    | 318.9435 |
|          pnasnet5large          | 16  | 5.3468 |  31.619   |      nan       |  194.3739   | 316.4984 |
|         crossvit_9_240          | 64  | 1.8205 |  10.925   |      nan       |   36.5965   | 301.2294 |
|            hrnet_w18            |  2  | 7.2856 |  44.0062  |      nan       |  370.9935   | 290.2174 |
|          cait_m36_384           |  2  | 3.4372 |  25.239   |      nan       |   61.2635   | 280.0794 |
|            fbnetv3_b            | 128 | 3.5101 |  14.9782  |      nan       |  101.1606   | 274.5077 |
|         visformer_small         | 128 | 1.0126 |  5.1658   |      nan       |   31.0174   | 249.6162 |
|           volo_d1_224           | 64  | 1.4324 |  9.8537   |      nan       |   39.6369   | 247.3936 |
|          inception_v3           | 128 | 2.0488 |  13.4329  |      nan       |   99.4931   | 234.6812 |
|       gluon_inception_v3        | 128 | 2.0003 |  12.5288  |      nan       |   99.4546   | 234.475  |
|            tinynet_a            | 128 | 2.3578 |  10.7604  |      nan       |   80.8993   | 233.3892 |
|        adv_inception_v3         | 128 | 2.1544 |  12.5395  |      nan       |   99.0938   | 233.3628 |
|       tf_efficientnet_b0        | 128 | 2.1054 |  8.9852   |      nan       |   78.7914   | 232.4157 |
|             dla102              | 64  | 2.1102 |  13.6918  |      nan       |   87.9253   | 226.2379 |
|            pit_b_224            | 64  | 1.1537 |  6.7464   |      nan       |   24.7865   | 223.8173 |
|      mobilenetv3_large_100      | 128 | 1.8808 |   7.692   |      nan       |   84.2177   | 213.0799 |
|        res2net50_14w_8s         |  2  | 3.3958 |  20.9949  |      nan       |  104.6199   | 213.0645 |
|           convit_base           | 32  | 1.2775 |   7.936   |      nan       |     nan     | 212.6967 |
|        res2net101_26w_4s        | 64  | 3.5635 |  23.5875  |      nan       |  122.3856   | 202.6938 |
|           fbnetc_100            | 128 | 2.3018 |  9.7552   |    82.8618     |   61.6258   | 188.447  |
|          spnasnet_100           | 128 | 2.3247 |  9.0716   |    94.9656     |   57.8461   | 186.491  |
|         poolformer_m36          | 64  | 1.984  |  11.108   |      nan       |     nan     | 185.8602 |
|        tnt_s_patch16_224        | 64  | 2.0138 |  13.818   |      nan       |   38.6371   | 178.7083 |
|          gmlp_s16_224           | 64  | 1.3743 |   9.499   |      nan       |   22.3145   | 169.6309 |
|           mnasnet_100           | 128 | 1.9848 |  7.3755   |    59.8699     |   52.113    | 166.9227 |
|          cspdarknet53           | 64  | 2.6554 |  10.1594  |      nan       |   41.6697   | 166.3958 |
|           res2next50            |  2  | 1.9943 |  11.9601  |      nan       |   59.6704   | 164.9944 |
|         mobilenetv2_100         | 128 | 1.9073 |  7.6148   |      nan       |   41.6747   | 158.2622 |
|           selecsls42b           | 128 | 0.8859 |  5.4934   |      nan       |   51.9674   | 152.6904 |
|           regnety_002           | 128 | 1.8571 |  7.9661   |      nan       |   56.8403   | 148.3253 |
|        gluon_xception65         | 32  | 2.406  |   15.51   |      nan       |   64.703    | 146.3288 |
|          gmixer_24_224          | 64  | 1.5741 |  10.4668  |    55.1733     |   27.6026   | 142.3327 |
|           dm_nfnet_f0           | 128 | 2.1876 |  8.9314   |      nan       |   38.4893   | 130.2856 |
|        ese_vovnet19b_dw         | 128 | 1.1165 |  4.2735   |      nan       |   39.1374   | 129.6364 |
|            nfnet_l0             | 64  | 1.8907 |  9.0197   |      nan       |   34.6809   | 123.1067 |
|          resmlp_12_224          | 128 | 0.6251 |   4.279   |     8.2348     |     nan     | 119.7689 |
|            gernet_l             | 128 | 2.2817 |  8.4993   |      nan       |   45.043    | 119.3916 |
|            lcnet_050            | 128 | 1.1419 |  4.5993   |      nan       |   39.1844   | 116.7487 |
|          mixer_b16_224          | 64  | 0.7133 |  4.7267   |     14.377     |   16.1639   | 115.9064 |
|            repvgg_a2            | 128 | 2.2565 |  8.1301   |    52.0702     |   63.7094   | 113.6453 |
| deit_base_distilled_patch16_224 | 64  | 0.9625 |  6.1138   |      nan       |   14.4419   | 110.7006 |
|     swsl_resnext101_32x16d      | 32  | 2.2174 |  13.4744  |      nan       |   53.9575   | 104.2414 |
|      beit_base_patch16_224      | 64  | 1.302  |  7.2463   |      nan       |   18.6805   | 102.7557 |
|      vit_base_patch16_224       | 64  | 0.9171 |  6.0165   |      nan       |   14.1213   | 79.5957  |
|        convmixer_768_32         | 32  | 1.4656 |   9.238   |      nan       |   18.6081   | 47.1118  |
|        eca_halonext26ts         | 64  | 1.5518 |  6.5765   |      nan       |   67.2819   |   nan    |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+
|            tinynet_a            | 128 | 0.9889 |  0.7884   |      nan       |   0.7887    |  1.3707  |
|          gmixer_24_224          | 64  | 0.9922 |  0.9494   |     0.2212     |   0.8991    |  1.2577  |
|          gmlp_s16_224           | 64  | 0.9939 |  0.9623   |      nan       |    0.92     |  1.2405  |
|       tf_efficientnet_b0        | 128 | 0.9882 |  0.7693   |      nan       |   0.8392    |  1.173   |
|          pnasnet5large          | 16  | 1.0575 |  0.9913   |      nan       |   1.1722    |  1.1609  |
|           rexnet_100            | 128 | 0.9885 |   0.785   |      nan       |   0.8648    |  1.1475  |
|           mobilevit_s           | 32  | 0.9926 |  0.7681   |      nan       |    0.787    |  1.1122  |
|       eca_botnext26ts_256       | 64  | 0.9888 |  0.7708   |      nan       |   0.7788    |  1.1081  |
|         poolformer_m36          | 64  | 0.9979 |  0.9432   |      nan       |     nan     |  1.1021  |
|             dla102              | 64  | 0.9931 |  0.9487   |      nan       |   0.9751    |  1.079   |
|        tnt_s_patch16_224        | 64  | 0.9948 |  0.9668   |      nan       |   0.9431    |  1.0469  |
|           dm_nfnet_f0           | 128 | 0.969  |   0.898   |      nan       |   0.9443    |  1.0336  |
|           resnest101e           | 32  | 0.9955 |  0.9721   |      nan       |   0.9532    |  1.0272  |
|           convit_base           | 32  | 0.9972 |  0.8582   |      nan       |     nan     |  1.0248  |
|           volo_d1_224           | 64  | 0.9965 |  0.9475   |      nan       |   0.8587    |  1.0138  |
|         mobilenetv2_100         | 128 | 0.9863 |  0.7642   |      nan       |   0.9129    |  1.0048  |
|            nfnet_l0             | 64  | 0.9884 |  0.8166   |      nan       |   0.8207    |  1.0037  |
|      beit_base_patch16_224      | 64  | 0.9952 |  0.9327   |      nan       |   0.9298    |  1.0004  |
|          ghostnet_100           | 128 | 0.9756 |   0.87    |      nan       |   0.9026    |  0.9897  |
|        convmixer_768_32         | 32  | 0.9972 |  0.9788   |      nan       |   0.9714    |  0.9746  |
|            pit_b_224            | 64  | 0.999  |  0.8053   |      nan       |   0.8179    |  0.9746  |
|           selecsls42b           | 128 | 0.9789 |   0.876   |      nan       |   0.8772    |  0.9715  |
|            fbnetv3_b            | 128 | 0.9872 |  0.7836   |      nan       |    0.79     |  0.9645  |
|        ese_vovnet19b_dw         | 128 | 0.9858 |  0.8566   |      nan       |   0.9146    |  0.9605  |
|         visformer_small         | 128 | 0.9899 |  0.9259   |      nan       |   0.8884    |  0.9382  |
|        twins_pcpvt_base         | 32  | 0.9938 |  0.9046   |      nan       |   0.8007    |  0.9335  |
|           tf_mixnet_l           | 64  | 0.9903 |  0.8556   |      nan       |   0.8366    |  0.9291  |
|      xcit_large_24_p8_224       |  5  | 0.9975 |    nan    |      nan       |     nan     |  0.9289  |
|     swsl_resnext101_32x16d      | 32  | 0.9989 |   0.879   |      nan       |   0.8487    |  0.9112  |
|             dpn107              | 32  | 0.997  |  0.9097   |      nan       |   0.8814    |  0.9078  |
|          mixer_b16_224          | 64  | 0.9929 |  0.9361   |     0.2528     |   0.7726    |  0.8978  |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9151   |      nan       |   0.8524    |  0.8964  |
|          cait_m36_384           |  2  | 0.9993 |  0.8803   |      nan       |    0.903    |  0.8949  |
|      mobilenetv3_large_100      | 128 | 0.9772 |   0.84    |      nan       |   0.8641    |  0.8948  |
|        gluon_xception65         | 32  | 0.9955 |  0.8859   |      nan       |   0.8854    |  0.8924  |
|      vit_base_patch16_224       | 64  | 0.9955 |  0.9342   |      nan       |   0.8801    |  0.8916  |
| deit_base_distilled_patch16_224 | 64  | 0.9944 |  0.9332   |      nan       |   0.8794    |  0.8911  |
|          convnext_base          | 32  | 1.0034 |  0.9053   |      nan       |   0.7521    |  0.8848  |
|        adv_inception_v3         | 128 | 0.9824 |  0.8621   |      nan       |   0.8538    |  0.8845  |
|       gluon_inception_v3        | 128 | 0.9824 |  0.8621   |      nan       |   0.8538    |  0.8845  |
|          inception_v3           | 128 | 0.9824 |  0.8621   |      nan       |   0.8538    |  0.8845  |
|            mixnet_l             | 64  |  0.99  |  0.8439   |      nan       |   0.7742    |  0.8647  |
|            gernet_l             | 128 | 0.9794 |  0.8503   |      nan       |   0.8158    |  0.8621  |
|          spnasnet_100           | 128 | 0.9788 |  0.8801   |     0.1645     |   0.8371    |  0.8602  |
|          cspdarknet53           | 64  | 0.9913 |  0.8405   |      nan       |   0.7908    |  0.8512  |
|           mnasnet_100           | 128 | 0.9765 |  0.8701   |     0.1662     |   0.8252    |  0.8503  |
|          botnet26t_256          | 128 | 0.9849 |   0.864   |      nan       |   0.7708    |  0.8503  |
|           fbnetc_100            | 128 |  0.98  |  0.8491   |     0.162      |   0.7352    |  0.8387  |
|            hrnet_w18            |  2  | 0.9971 |  0.8333   |      nan       |   0.8355    |  0.8367  |
|            lcnet_050            | 128 | 0.9433 |  0.7566   |      nan       |   0.7559    |  0.8309  |
|           regnety_002           | 128 | 0.9504 |  0.7948   |      nan       |   0.7515    |  0.8245  |
|           res2next50            |  2  | 0.9976 |  0.8277   |      nan       |   0.8198    |  0.8231  |
|        res2net50_14w_8s         |  2  | 0.9968 |   0.824   |      nan       |   0.8169    |  0.8228  |
|          resmlp_12_224          | 128 | 0.9827 |  0.9508   |     0.2624     |     nan     |  0.8092  |
|         coat_lite_mini          | 128 | 1.0338 |  0.9202   |      nan       |   0.6593    |  0.7962  |
|         crossvit_9_240          | 64  | 0.9874 |  0.8698   |      nan       |   0.8854    |  0.7934  |
|            repvgg_a2            | 128 | 0.9767 |  0.7822   |     0.1439     |   0.6789    |  0.7903  |
|  swin_base_patch4_window7_224   | 64  | 0.9966 |  0.9203   |      nan       |   0.8451    |  0.7566  |
|        sebotnet33ts_256         | 64  | 0.9928 |  0.7073   |      nan       |   0.7354    |  0.7449  |
|          jx_nest_base           | 32  | 0.9983 |  0.8927   |      nan       |    0.86     |  0.6708  |
|        eca_halonext26ts         | 64  | 0.9885 |   0.775   |      nan       |   0.7792    |   nan    |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+

Performance graphs

Figures (images not reproduced here): bench_logs/timm_models_amp.png, bench_logs/huggingface_amp.png, bench_logs/torchbench_amp.png

@anijain2305
Contributor Author

Performance Dashboard for float32 precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report mean compilation latency and the peak memory footprint reduction ratio.
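As a rough illustration of the two measurements described above, the sketch below normalizes a backend's run time against native PyTorch and compares two sets of values within a tolerance. The function names, the tolerances, and the timings are hypothetical and are not taken from the benchmark harness.

```python
import math


def speedup(eager_seconds, backend_seconds):
    """Speedup of a backend, normalized against native PyTorch (eager)."""
    if backend_seconds <= 0:
        raise ValueError("backend time must be positive")
    return eager_seconds / backend_seconds


def outputs_match(reference, candidate, rel_tol=1e-3, abs_tol=1e-3):
    """Element-wise closeness check on two flat lists of floats.

    The tolerances are assumed values for illustration only.
    """
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# Hypothetical timings in seconds, not dashboard data.
example_speedup = speedup(eager_seconds=0.120, backend_seconds=0.100)
```

A value above 1.0 means the backend is faster than native PyTorch; a value below 1.0 means it is slower.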

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

When measuring speedup, compilation latency and memory footprint reduction, we exclude the models that fail accuracy checks.
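A minimal sketch of that filtering step follows. It assumes each result is a dict carrying an accuracy status string like those in the Accuracy tables. Treating `pass_due_to_skip` as passing is an assumption, and the rows are made-up examples.

```python
# Statuses treated as passing; counting "pass_due_to_skip" is an assumption.
PASSING = {"pass", "pass_due_to_skip"}


def keep_passing(rows):
    """Drop models whose accuracy status is not a passing one."""
    return [row for row in rows if row["accuracy"] in PASSING]


# Hypothetical rows, not dashboard data.
rows = [
    {"name": "model_a", "accuracy": "pass", "speedup": 1.30},
    {"name": "model_b", "accuracy": "fail_accuracy", "speedup": 1.90},
    {"name": "model_c", "accuracy": "fail_to_run", "speedup": 0.0},
    {"name": "model_d", "accuracy": "pass_due_to_skip", "speedup": 1.10},
]
kept = keep_passing(rows)
```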

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| inductor_no_cudagraphs | 84%, 47/56 | 91%, 40/44  | 95%, 58/61  |
+------------------------+------------+-------------+-------------+
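The cells above combine a percentage with the raw counts. A small sketch (the helper name `passrate` is hypothetical) shows that rounding to the nearest percent reproduces them:

```python
def passrate(passed, total):
    """Format a pass rate as 'NN%, passed/total'."""
    return f"{round(100 * passed / total)}%, {passed}/{total}"


torchbench = passrate(47, 56)   # '84%, 47/56'
huggingface = passrate(40, 44)  # '91%, 40/44'
timm_models = passrate(58, 61)  # '95%, 58/61'
```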

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| inductor_no_cudagraphs |   1.16x    |    1.19x    |    1.23x    |
+------------------------+------------+-------------+-------------+
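For reference, a geometric mean over per-model speedups can be computed as below. This is a sketch of the general formula, not the dashboard's own script, and `geomean` is a hypothetical helper. It expects only models that passed, so zero entries from failed runs must be removed first.

```python
import math


def geomean(values):
    """Geometric mean: exp of the arithmetic mean of the logs."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Small hand-checkable case: sqrt(1.0 * 4.0) = 2.0
check = geomean([1.0, 4.0])
```

The geometric mean is the usual choice for averaging ratios: a 2x speedup and a 0.5x slowdown average to 1.0x rather than 1.25x.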

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| inductor_no_cudagraphs |   57.71    |    46.53    |    79.81    |
+------------------------+------------+-------------+-------------+
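The per-model latency tables mark failed runs as `nan`. The sketch below averages latencies while skipping those entries. Whether the dashboard skips them the same way is an assumption, and the values are made up.

```python
import math


def mean_ignoring_nan(values):
    """Arithmetic mean over the non-nan entries; nan if none remain."""
    finite = [v for v in values if not math.isnan(v)]
    return sum(finite) / len(finite) if finite else math.nan


# Hypothetical compile times in seconds; nan marks a failed run.
latencies = [10.0, float("nan"), 20.0]
mean_latency = mean_ignoring_nan(latencies)
```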

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
| inductor_no_cudagraphs |   0.93x    |    0.94x    |    1.01x    |
+------------------------+------------+-------------+-------------+
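One way to read "higher is better" is that the ratio divides native PyTorch's peak memory by the backend's peak memory. The sketch below assumes that definition, which is not confirmed here, and uses made-up numbers.

```python
def compression_ratio(eager_peak_gb, backend_peak_gb):
    """Peak memory of native PyTorch divided by the backend's peak memory.

    Assumed definition: values above 1.0 mean the backend uses less memory.
    """
    if backend_peak_gb <= 0:
        raise ValueError("backend peak memory must be positive")
    return eager_peak_gb / backend_peak_gb


# Hypothetical peaks in GB: the backend uses less memory than eager.
ratio = compression_ratio(eager_peak_gb=10.0, backend_peak_gb=8.0)
```

Under this reading, the 0.93x and 0.94x entries above would mean the backend's peak memory is somewhat higher than native PyTorch's.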

torchbench suite with float32 precision


Performance speedup

+-----------------------------------+------+------------------------+
|               name                |  bs  | inductor_no_cudagraphs |
+-----------------------------------+------+------------------------+
|             hf_Albert             |  8   |         1.6654         |
|               hf_T5               |  8   |         1.5034         |
|           timm_resnest            |  32  |         1.4518         |
|            timm_nfnet             | 128  |         1.4214         |
|           mobilenet_v2            |  96  |         1.4114         |
|           BERT_pytorch            |  16  |         1.3949         |
|              hf_GPT2              |  4   |         1.3789         |
|           hf_GPT2_large           |  4   |         1.3631         |
|        shufflenet_v2_x1_0         | 128  |         1.3536         |
|         timm_efficientdet         |  1   |         1.3438         |
|           fastNLP_Bert            |  6   |         1.3427         |
|            hf_T5_large            |  2   |         1.2981         |
|        mobilenet_v3_large         |  32  |         1.2546         |
|            mnasnet1_0             |  32  |         1.2175         |
|      timm_vision_transformer      |  8   |         1.1952         |
|           squeezenet1_1           |  32  |         1.1897         |
|           pytorch_unet            |  1   |         1.1867         |
|             resnet50              |  32  |         1.1683         |
|               vgg16               |  64  |         1.1661         |
|              alexnet              | 128  |         1.1658         |
|            Super_SloMo            |  6   |         1.1649         |
|           hf_DistilBert           |  8   |         1.155          |
|          LearningToPaint          |  96  |         1.1542         |
|             resnet18              |  16  |         1.1464         |
|            densenet121            |  4   |         1.1455         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |         1.1455         |
|            hf_Reformer            |  4   |         1.1294         |
|          resnext50_32x4d          |  8   |         1.1177         |
|              hf_Bert              |  4   |         1.1118         |
|        Background_Matting         |  4   |         1.1066         |
|         timm_efficientnet         |  32  |         1.1036         |
|              hf_Bart              |  4   |         1.1016         |
|            timm_regnet            |  32  |         1.094          |
|       functorch_dp_cifar10        |  64  |         1.0896         |
|          pytorch_stargan          |  16  |         1.0889         |
|          pytorch_struct           | 200  |         1.0721         |
|              yolov3               |  16  |         1.0642         |
|   timm_vision_transformer_large   |  8   |         1.0381         |
|               dcgan               |  32  |         1.0299         |
| attention_is_all_you_need_pytorch | 256  |         1.0285         |
|            hf_BigBird             |  2   |         1.0105         |
|            timm_vovnet            |  32  |         1.0014         |
|            tts_angular            |  64  |         1.0006         |
|              demucs               |  4   |         1.0001         |
|                drq                |  1   |         0.9774         |
|      nvidia_deeprecommender       | 256  |         0.9641         |
|           lennard_jones           | 1000 |         0.854          |
|         soft_actor_critic         | 256  |         0.821          |
|      resnet50_quantized_qat       |  0   |          0.0           |
|               dlrm                |  0   |          0.0           |
|     detectron2_fcos_r_50_fpn      |  0   |          0.0           |
|             tacotron2             |  0   |          0.0           |
|           hf_Longformer           |  0   |          0.0           |
|        speech_transformer         |  0   |          0.0           |
|               moco                |  0   |          0.0           |
|    mobilenet_v2_quantized_qat     |  0   |          0.0           |
+-----------------------------------+------+------------------------+
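To work with these tables programmatically, the rows can be pulled out of the ASCII layout. The sketch below uses a hypothetical helper, `parse_ascii_table`, and assumes every data line starts with `|` and every border line starts with `+`.

```python
def parse_ascii_table(text):
    """Parse a '+---+' bordered table into a list of dicts keyed by header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first pipe-delimited line is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


sample = """
+-----------+----+------------------------+
|   name    | bs | inductor_no_cudagraphs |
+-----------+----+------------------------+
| hf_Albert | 8  |         1.6654         |
|   hf_T5   | 8  |         1.5034         |
+-----------+----+------------------------+
"""
parsed = parse_ascii_table(sample)
```

All cell values come back as strings; convert the numeric columns with `float()` before aggregating.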

Accuracy

+-----------------------------------+-----+------------------------+
|               name                | bs  | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------------+
|            hf_T5_large            |  2  |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  2  |    pass_due_to_skip    |
|           hf_GPT2_large           |  2  |    pass_due_to_skip    |
|           BERT_pytorch            |  2  |          pass          |
|        shufflenet_v2_x1_0         |  2  |          pass          |
|        mobilenet_v3_large         |  2  |          pass          |
|      nvidia_deeprecommender       |  2  |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |          pass          |
|          pytorch_stargan          | 16  |          pass          |
|          pytorch_struct           | 200 |          pass          |
|           pytorch_unet            |  2  |          pass          |
|             resnet18              |  2  |          pass          |
|             resnet50              |  2  |          pass          |
|          resnext50_32x4d          |  2  |          pass          |
|         soft_actor_critic         | 256 |          pass          |
|           mobilenet_v2            |  2  |          pass          |
|           squeezenet1_1           |  2  |          pass          |
|         timm_efficientdet         |  2  |          pass          |
|         timm_efficientnet         |  2  |          pass          |
|            timm_nfnet             |  2  |          pass          |
|            timm_regnet            |  2  |          pass          |
|           timm_resnest            |  2  |          pass          |
|      timm_vision_transformer      |  2  |          pass          |
|            timm_vovnet            |  2  |          pass          |
|            tts_angular            |  2  |          pass          |
|               vgg16               |  2  |          pass          |
|        Background_Matting         |  4  |          pass          |
|              yolov3               |  2  |          pass          |
|            mnasnet1_0             |  2  |          pass          |
|            densenet121            |  2  |          pass          |
|                drq                |  1  |          pass          |
|              demucs               |  4  |          pass          |
|           fastNLP_Bert            |  2  |          pass          |
|           lennard_jones           |  2  |          pass          |
|       functorch_dp_cifar10        |  2  |          pass          |
|             hf_Albert             |  2  |          pass          |
|              hf_Bart              |  2  |          pass          |
|               dcgan               |  2  |          pass          |
|              hf_Bert              |  2  |          pass          |
|            hf_BigBird             |  2  |          pass          |
|           hf_DistilBert           |  2  |          pass          |
|              hf_GPT2              |  2  |          pass          |
| attention_is_all_you_need_pytorch |  2  |          pass          |
|            hf_Reformer            |  2  |          pass          |
|              alexnet              |  2  |          pass          |
|            Super_SloMo            |  2  |          pass          |
|          LearningToPaint          |  2  |          pass          |
|               dlrm                |  2  |          pass          |
|        speech_transformer         |  2  |      fail_to_run       |
|             tacotron2             |  2  |      fail_to_run       |
|      resnet50_quantized_qat       |  2  |      fail_to_run       |
|           hf_Longformer           |  2  |      fail_to_run       |
|               moco                |  2  |      fail_to_run       |
|    mobilenet_v2_quantized_qat     |  2  |      fail_to_run       |
|               hf_T5               |  2  |     fail_accuracy      |
|            hf_T5_base             |  2  |     fail_accuracy      |
|     detectron2_fcos_r_50_fpn      |  0  |         0.0000         |
|          vision_maskrcnn          |  0  |         0.0000         |
+-----------------------------------+-----+------------------------+

Compilation latency (sec)

+-----------------------------------+------+------------------------+
|               name                |  bs  | inductor_no_cudagraphs |
+-----------------------------------+------+------------------------+
|         timm_efficientdet         |  1   |        468.7017        |
|              yolov3               |  16  |        412.4742        |
|            hf_T5_large            |  2   |        201.9938        |
| attention_is_all_you_need_pytorch | 256  |        139.2113        |
|           hf_GPT2_large           |  4   |        135.7083        |
|      timm_vision_transformer      |  8   |        132.5885        |
|           timm_resnest            |  32  |        130.3164        |
|          pytorch_stargan          |  16  |        109.0045        |
|   timm_vision_transformer_large   |  8   |        101.2999        |
|          pytorch_struct           | 200  |        96.8387         |
|           BERT_pytorch            |  16  |         91.374         |
|           fastNLP_Bert            |  6   |        62.5788         |
|              hf_GPT2              |  4   |        58.9256         |
|              hf_Bart              |  4   |        48.2588         |
|               hf_T5               |  8   |        44.4672         |
|            densenet121            |  4   |        43.6522         |
|             hf_Albert             |  8   |        41.0064         |
|        mobilenet_v3_large         |  32  |        30.7343         |
|            mnasnet1_0             |  32  |         29.903         |
|              hf_Bert              |  4   |        29.4868         |
|          resnext50_32x4d          |  8   |         29.068         |
|            hf_Reformer            |  4   |         28.857         |
|            timm_nfnet             | 128  |        27.7243         |
|       functorch_dp_cifar10        |  64  |        25.0023         |
|            hf_BigBird             |  2   |         24.174         |
|             resnet18              |  16  |         22.149         |
|            timm_regnet            |  32  |        19.9962         |
|         timm_efficientnet         |  32  |        18.8884         |
|           hf_DistilBert           |  8   |        17.2316         |
|        shufflenet_v2_x1_0         | 128  |        16.8719         |
|            Super_SloMo            |  6   |        15.9453         |
|        Background_Matting         |  4   |        15.7895         |
|           mobilenet_v2            |  96  |        15.5507         |
|            timm_vovnet            |  32  |        14.3515         |
|             resnet50              |  32  |        14.0671         |
|           pytorch_unet            |  1   |         7.6953         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |         7.6134         |
|          LearningToPaint          |  96  |         6.523          |
|           squeezenet1_1           |  32  |         3.2894         |
|      nvidia_deeprecommender       | 256  |         3.2179         |
|                drq                |  1   |         2.7713         |
|               vgg16               |  64  |         2.647          |
|              alexnet              | 128  |         2.1583         |
|         soft_actor_critic         | 256  |         2.1351         |
|               dcgan               |  32  |         2.1126         |
|           lennard_jones           | 1000 |         1.3283         |
|            tts_angular            |  64  |         1.1874         |
|              demucs               |  4   |         0.1992         |
|     detectron2_fcos_r_50_fpn      |  0   |          nan           |
|               dlrm                |  0   |          nan           |
|           hf_Longformer           |  0   |          nan           |
|    mobilenet_v2_quantized_qat     |  0   |          nan           |
|               moco                |  0   |          nan           |
|      resnet50_quantized_qat       |  0   |          nan           |
|        speech_transformer         |  0   |          nan           |
|             tacotron2             |  0   |          nan           |
+-----------------------------------+------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+------------------------+
|               name                |  bs  | inductor_no_cudagraphs |
+-----------------------------------+------+------------------------+
|         timm_efficientnet         |  32  |         1.3377         |
|             hf_Albert             |  8   |         1.1942         |
|            Super_SloMo            |  6   |         1.1913         |
|               hf_T5               |  8   |         1.1507         |
|         timm_efficientdet         |  1   |         1.1428         |
|           squeezenet1_1           |  32  |         1.1267         |
|           mobilenet_v2            |  96  |         1.1105         |
|              hf_Bart              |  4   |         1.0962         |
|           hf_GPT2_large           |  4   |         1.0941         |
|              hf_GPT2              |  4   |         1.0819         |
|           fastNLP_Bert            |  6   |         1.0755         |
|           BERT_pytorch            |  16  |         1.0689         |
|            timm_nfnet             | 128  |         1.0495         |
|            hf_BigBird             |  2   |         1.0404         |
|        shufflenet_v2_x1_0         | 128  |         1.0072         |
|         soft_actor_critic         | 256  |         0.9991         |
|           lennard_jones           | 1000 |         0.9989         |
|          pytorch_stargan          |  16  |         0.9928         |
|              demucs               |  4   |         0.9886         |
|            tts_angular            |  64  |         0.9884         |
|            hf_Reformer            |  4   |         0.9882         |
|   timm_vision_transformer_large   |  8   |         0.9823         |
|           timm_resnest            |  32  |         0.9688         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |         0.9646         |
| attention_is_all_you_need_pytorch | 256  |         0.9432         |
|            timm_regnet            |  32  |         0.9323         |
|            densenet121            |  4   |         0.9307         |
|              yolov3               |  16  |         0.9271         |
|        Background_Matting         |  4   |         0.9164         |
|              hf_Bert              |  4   |         0.9017         |
|        mobilenet_v3_large         |  32  |         0.8964         |
|             resnet50              |  32  |         0.8913         |
|                drq                |  1   |         0.8778         |
|            mnasnet1_0             |  32  |         0.8659         |
|           pytorch_unet            |  1   |         0.8608         |
|           hf_DistilBert           |  8   |         0.8605         |
|          resnext50_32x4d          |  8   |         0.8352         |
|              alexnet              | 128  |         0.8332         |
|            timm_vovnet            |  32  |         0.8316         |
|            hf_T5_large            |  2   |         0.796          |
|               dcgan               |  32  |         0.7903         |
|      timm_vision_transformer      |  8   |         0.7779         |
|          LearningToPaint          |  96  |         0.7462         |
|             resnet18              |  16  |         0.7049         |
|               vgg16               |  64  |         0.6497         |
|      nvidia_deeprecommender       | 256  |         0.5598         |
|          pytorch_struct           | 200  |         0.429          |
|       functorch_dp_cifar10        |  64  |         0.4212         |
|     detectron2_fcos_r_50_fpn      |  0   |          nan           |
|               dlrm                |  0   |          nan           |
|           hf_Longformer           |  0   |          nan           |
|    mobilenet_v2_quantized_qat     |  0   |          nan           |
|               moco                |  0   |          nan           |
|      resnet50_quantized_qat       |  0   |          nan           |
|        speech_transformer         |  0   |          nan           |
|             tacotron2             |  0   |          nan           |
+-----------------------------------+------+------------------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+------------------------+
|                  name                   | bs | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------------+
|       MT5ForConditionalGeneration       | 2  |         1.7847         |
|      GPT2ForSequenceClassification      | 4  |         1.6428         |
|            XLNetLMHeadModel             | 4  |         1.4402         |
|               DistillGPT2               | 1  |         1.4396         |
|       T5ForConditionalGeneration        | 4  |         1.4351         |
|             OPTForCausalLM              | 4  |         1.3587         |
|     M2M100ForConditionalGeneration      | 2  |         1.3445         |
|       ElectraForQuestionAnswering       | 64 |         1.3409         |
|                CamemBert                | 1  |         1.3157         |
|         MegatronBertForCausalLM         | 2  |         1.2953         |
|       AlbertForQuestionAnswering        | 2  |         1.295          |
|            AlbertForMaskedLM            | 2  |         1.2919         |
|            YituTechConvBert             | 1  |         1.2885         |
|           ElectraForCausalLM            | 1  |         1.288          |
|           RobertaForCausalLM            | 4  |         1.2846         |
|     MobileBertForQuestionAnswering      | 32 |         1.2589         |
|          MobileBertForMaskedLM          | 16 |         1.2572         |
|     PLBartForConditionalGeneration      | 8  |         1.2468         |
|             XGLMForCausalLM             | 1  |         1.2433         |
|    LayoutLMForSequenceClassification    | 16 |         1.236          |
|     PegasusForConditionalGeneration     | 4  |         1.2286         |
|    MegatronBertForQuestionAnswering     | 8  |         1.2086         |
|      MBartForConditionalGeneration      | 8  |         1.1866         |
|           LayoutLMForMaskedLM           | 16 |         1.1736         |
|            TrOCRForCausalLM             | 8  |         1.1511         |
|         Speech2Text2ForCausalLM         | 64 |         1.1447         |
|           PegasusForCausalLM            | 8  |         1.1282         |
|      BartForConditionalGeneration       | 1  |         1.1087         |
|             BartForCausalLM             | 2  |         1.108          |
|     DistilBertForQuestionAnswering      | 32 |         1.1077         |
| BlenderbotSmallForConditionalGeneration | 32 |         1.0999         |
|          DistilBertForMaskedLM          | 16 |         1.0853         |
|            PLBartForCausalLM            | 16 |         1.0841         |
|       RobertaForQuestionAnswering       | 64 |         1.0745         |
|        BertForQuestionAnswering         | 64 |         1.0714         |
|           DebertaForMaskedLM            | 4  |         1.0529         |
|                 T5Small                 | 1  |         1.0504         |
|       DebertaForQuestionAnswering       | 4  |         1.0466         |
|             BertForMaskedLM             | 64 |         1.041          |
|       BlenderbotSmallForCausalLM        | 64 |         1.0391         |
|            MBartForCausalLM             | 16 |         1.0141         |
|                 BigBird                 | 1  |         0.9716         |
|               GoogleFnet                | 1  |         0.9328         |
|          AllenaiLongformerBase          | 0  |          0.0           |
+-----------------------------------------+----+------------------------+
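When summarizing a speedup table like the one above, rows reported as 0.0 (or nan) probably should not be averaged in, since they appear to mark models that did not run rather than real measurements. The sketch below is one way to do that, assuming the table is available as plain text in this ASCII layout. The helper names `parse_ascii_table` and `geomean_speedup` are hypothetical and are not part of the benchmark scripts.

```python
import math

def parse_ascii_table(text):
    """Parse a '+---+' bordered table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the '+----+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]

def geomean_speedup(records, column="inductor_no_cudagraphs"):
    """Geometric mean over rows with a positive value.

    Assumption: 0.0 and nan mark models that failed to run, so they are
    dropped (nan > 0 is False, so nan rows are filtered out as well).
    """
    values = [float(r[column]) for r in records]
    ran = [v for v in values if v > 0]
    return math.exp(sum(math.log(v) for v in ran) / len(ran))

# Three rows copied from the bottom of the table above.
sample = """
+-----------------------+----+------------------------+
|         name          | bs | inductor_no_cudagraphs |
+-----------------------+----+------------------------+
|        BigBird        | 1  |         0.9716         |
|      GoogleFnet       | 1  |         0.9328         |
| AllenaiLongformerBase | 0  |          0.0           |
+-----------------------+----+------------------------+
"""

records = parse_ascii_table(sample)
print(f"{geomean_speedup(records):.4f}")
```

A geometric mean is used here because the values are ratios. Whether the dashboard itself aggregates this way is not stated in this comment.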

Accuracy

+-----------------------------------------+----+------------------------+
|                  name                   | bs | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------------+
|            AlbertForMaskedLM            | 1  |          pass          |
|    LayoutLMForSequenceClassification    | 1  |          pass          |
|            MBartForCausalLM             | 1  |          pass          |
|         MegatronBertForCausalLM         | 1  |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |          pass          |
|          MobileBertForMaskedLM          | 1  |          pass          |
|     MobileBertForQuestionAnswering      | 1  |          pass          |
|             OPTForCausalLM              | 1  |          pass          |
|            PLBartForCausalLM            | 1  |          pass          |
|           PegasusForCausalLM            | 1  |          pass          |
|     PegasusForConditionalGeneration     | 1  |          pass          |
|           RobertaForCausalLM            | 1  |          pass          |
|       RobertaForQuestionAnswering       | 1  |          pass          |
|         Speech2Text2ForCausalLM         | 1  |          pass          |
|       T5ForConditionalGeneration        | 1  |          pass          |
|                 T5Small                 | 1  |          pass          |
|            TrOCRForCausalLM             | 1  |          pass          |
|             XGLMForCausalLM             | 1  |          pass          |
|            XLNetLMHeadModel             | 1  |          pass          |
|       AlbertForQuestionAnswering        | 1  |          pass          |
|     M2M100ForConditionalGeneration      | 1  |          pass          |
|           LayoutLMForMaskedLM           | 1  |          pass          |
|               GoogleFnet                | 1  |          pass          |
|             BartForCausalLM             | 1  |          pass          |
|      BartForConditionalGeneration       | 1  |          pass          |
|             BertForMaskedLM             | 1  |          pass          |
|        BertForQuestionAnswering         | 1  |          pass          |
|                 BigBird                 | 1  |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |          pass          |
|                CamemBert                | 1  |          pass          |
|           DebertaForMaskedLM            | 1  |          pass          |
|            YituTechConvBert             | 1  |          pass          |
|       DebertaForQuestionAnswering       | 1  |          pass          |
|          DistilBertForMaskedLM          | 1  |          pass          |
|     DistilBertForQuestionAnswering      | 1  |          pass          |
|               DistillGPT2               | 1  |          pass          |
|           ElectraForCausalLM            | 1  |          pass          |
|       ElectraForQuestionAnswering       | 1  |          pass          |
|      GPT2ForSequenceClassification      | 1  |          pass          |
|      MBartForConditionalGeneration      | 1  |      fail_to_run       |
|          AllenaiLongformerBase          | 1  |      fail_to_run       |
|     PLBartForConditionalGeneration      | 1  |      fail_to_run       |
|       MT5ForConditionalGeneration       | 1  |     fail_accuracy      |
+-----------------------------------------+----+------------------------+
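The status column above can be reduced to a pass rate. This is a minimal sketch: the function name `pass_rate` is hypothetical, and counting every status that starts with `pass` (including `pass_due_to_skip`, which appears in the torchbench table) as a pass is an assumption, not something this comment confirms.

```python
from collections import Counter

def pass_rate(statuses):
    """Return (per-status counts, fraction of statuses starting with 'pass')."""
    counts = Counter(statuses)
    # Assumption: 'pass_due_to_skip' is treated as passing.
    passed = sum(n for status, n in counts.items() if status.startswith("pass"))
    return counts, passed / len(statuses)

# Illustrative status list using the labels that appear in the tables.
statuses = ["pass", "pass", "pass_due_to_skip", "fail_to_run", "fail_accuracy"]
counts, rate = pass_rate(statuses)
print(dict(counts), f"{rate:.0%}")
```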

Compilation latency (sec)

+-----------------------------------------+----+------------------------+
|                  name                   | bs | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------------+
|            XLNetLMHeadModel             | 4  |        291.777         |
|            YituTechConvBert             | 1  |        117.2838        |
|       MT5ForConditionalGeneration       | 2  |        104.155         |
|          MobileBertForMaskedLM          | 16 |        92.6544         |
|     M2M100ForConditionalGeneration      | 2  |        78.3477         |
|     MobileBertForQuestionAnswering      | 32 |        78.1561         |
|     PegasusForConditionalGeneration     | 4  |         67.836         |
|     PLBartForConditionalGeneration      | 8  |        66.5782         |
|      MBartForConditionalGeneration      | 8  |        62.8985         |
|         MegatronBertForCausalLM         | 2  |        62.8671         |
|       T5ForConditionalGeneration        | 4  |        56.6793         |
|             XGLMForCausalLM             | 1  |        55.1411         |
|           DebertaForMaskedLM            | 4  |        54.5502         |
|    MegatronBertForQuestionAnswering     | 8  |        54.1915         |
|           RobertaForCausalLM            | 4  |        53.6744         |
|                 T5Small                 | 1  |        52.4961         |
| BlenderbotSmallForConditionalGeneration | 32 |        46.6959         |
|      BartForConditionalGeneration       | 1  |        46.0636         |
|    LayoutLMForSequenceClassification    | 16 |        45.1752         |
|           PegasusForCausalLM            | 8  |        37.3648         |
|            MBartForCausalLM             | 16 |        36.0993         |
|               DistillGPT2               | 1  |        32.4988         |
|             OPTForCausalLM              | 4  |        32.2822         |
|            TrOCRForCausalLM             | 8  |        30.9952         |
|             BertForMaskedLM             | 64 |        30.7106         |
|       ElectraForQuestionAnswering       | 64 |        30.3021         |
|           LayoutLMForMaskedLM           | 16 |        29.6613         |
|      GPT2ForSequenceClassification      | 4  |        29.4466         |
|     DistilBertForQuestionAnswering      | 32 |        28.5283         |
|       DebertaForQuestionAnswering       | 4  |        26.7377         |
|             BartForCausalLM             | 2  |        26.6824         |
|            AlbertForMaskedLM            | 2  |        25.7399         |
|                 BigBird                 | 1  |        23.9995         |
|            PLBartForCausalLM            | 16 |        22.3645         |
|         Speech2Text2ForCausalLM         | 64 |        21.9017         |
|       BlenderbotSmallForCausalLM        | 64 |        21.2607         |
|           ElectraForCausalLM            | 1  |         20.367         |
|                CamemBert                | 1  |        20.0012         |
|          DistilBertForMaskedLM          | 16 |        19.3193         |
|       RobertaForQuestionAnswering       | 64 |        16.5112         |
|        BertForQuestionAnswering         | 64 |        16.3605         |
|       AlbertForQuestionAnswering        | 2  |        15.2434         |
|               GoogleFnet                | 1  |        13.2948         |
|          AllenaiLongformerBase          | 0  |          nan           |
+-----------------------------------------+----+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+----+------------------------+
|                  name                   | bs | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------------+
|       DebertaForQuestionAnswering       | 4  |          1.13          |
|       T5ForConditionalGeneration        | 4  |         1.1049         |
|      GPT2ForSequenceClassification      | 4  |         1.0911         |
|                 T5Small                 | 1  |         1.0758         |
|           DebertaForMaskedLM            | 4  |         1.0346         |
|                 BigBird                 | 1  |         1.0115         |
|     M2M100ForConditionalGeneration      | 2  |         1.005          |
|        BertForQuestionAnswering         | 64 |         1.0032         |
|       RobertaForQuestionAnswering       | 64 |         1.0032         |
|       ElectraForQuestionAnswering       | 64 |         1.0025         |
|             XGLMForCausalLM             | 1  |         0.9999         |
|    LayoutLMForSequenceClassification    | 16 |         0.9827         |
|      BartForConditionalGeneration       | 1  |         0.9819         |
|     PegasusForConditionalGeneration     | 4  |         0.9769         |
|            XLNetLMHeadModel             | 4  |         0.9717         |
|       AlbertForQuestionAnswering        | 2  |         0.9674         |
|            TrOCRForCausalLM             | 8  |         0.9625         |
|           PegasusForCausalLM            | 8  |         0.9625         |
|            AlbertForMaskedLM            | 2  |         0.9567         |
|     DistilBertForQuestionAnswering      | 32 |         0.9481         |
|      MBartForConditionalGeneration      | 8  |         0.9416         |
|           LayoutLMForMaskedLM           | 16 |         0.9409         |
|               GoogleFnet                | 1  |         0.9366         |
|     PLBartForConditionalGeneration      | 8  |         0.9331         |
|             BartForCausalLM             | 2  |         0.9329         |
|               DistillGPT2               | 1  |          0.93          |
|    MegatronBertForQuestionAnswering     | 8  |         0.923          |
|             BertForMaskedLM             | 64 |         0.922          |
|            MBartForCausalLM             | 16 |         0.9194         |
|          DistilBertForMaskedLM          | 16 |         0.9137         |
| BlenderbotSmallForConditionalGeneration | 32 |         0.913          |
|            YituTechConvBert             | 1  |         0.9068         |
|            PLBartForCausalLM            | 16 |         0.903          |
|             OPTForCausalLM              | 4  |         0.898          |
|           RobertaForCausalLM            | 4  |         0.8927         |
|         Speech2Text2ForCausalLM         | 64 |         0.889          |
|                CamemBert                | 1  |         0.8656         |
|       BlenderbotSmallForCausalLM        | 64 |         0.8452         |
|          MobileBertForMaskedLM          | 16 |         0.8035         |
|         MegatronBertForCausalLM         | 2  |         0.7066         |
|           ElectraForCausalLM            | 1  |         0.7024         |
|     MobileBertForQuestionAnswering      | 32 |         0.6097         |
|       MT5ForConditionalGeneration       | 2  |         0.5416         |
|          AllenaiLongformerBase          | 0  |          nan           |
+-----------------------------------------+----+------------------------+
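This comment does not define how the peak memory compression ratio is computed. A plausible reading, treated here purely as an assumption, is eager peak memory divided by compiled peak memory, so that values above 1 mean the compiled run peaked lower and values below 1 mean it peaked higher. The function name and the numbers below are hypothetical.

```python
def compression_ratio(eager_peak, compiled_peak):
    """Assumed definition: eager peak memory / compiled peak memory."""
    if compiled_peak <= 0:
        # No usable measurement (for example, the model failed to run).
        return float("nan")
    return eager_peak / compiled_peak

# Hypothetical peak-memory readings in GB, not taken from the tables.
ratio = compression_ratio(eager_peak=10.0, compiled_peak=8.0)
print(f"{ratio:.4f}")
```

Under this reading, a row such as MT5ForConditionalGeneration at 0.5416 would mean the compiled run used nearly twice the peak memory of eager.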

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+------------------------+
|              name               | bs  | inductor_no_cudagraphs |
+---------------------------------+-----+------------------------+
|          ghostnet_100           | 128 |         1.7407         |
|            lcnet_050            | 128 |         1.6645         |
|         coat_lite_mini          | 128 |         1.6009         |
|        tnt_s_patch16_224        | 64  |         1.5205         |
|           dm_nfnet_f0           | 128 |         1.4216         |
|           volo_d1_224           | 64  |         1.3664         |
|         mobilenetv2_100         | 128 |         1.3539         |
|      mobilenetv3_large_100      | 128 |         1.3468         |
|      xcit_large_24_p8_224       |  5  |         1.3348         |
|             dla102              | 64  |         1.3235         |
|          gmixer_24_224          | 64  |         1.3207         |
|            nfnet_l0             | 64  |         1.3138         |
|        adv_inception_v3         | 128 |         1.3083         |
|       gluon_inception_v3        | 128 |         1.3077         |
|          inception_v3           | 128 |         1.3074         |
|          cspdarknet53           | 64  |         1.3038         |
|         crossvit_9_240          | 64  |          1.3           |
|            fbnetv3_b            | 128 |         1.2955         |
|           mnasnet_100           | 128 |         1.2817         |
|        sebotnet33ts_256         | 64  |         1.2795         |
|          botnet26t_256          | 128 |         1.2675         |
|       tf_efficientnet_b0        | 128 |         1.264          |
|           fbnetc_100            | 128 |         1.2639         |
|           resnest101e           | 32  |         1.2607         |
|          spnasnet_100           | 128 |         1.2556         |
|          jx_nest_base           | 32  |         1.2507         |
|           regnety_002           | 128 |         1.2382         |
|           convit_base           | 32  |         1.2372         |
|           selecsls42b           | 128 |         1.231          |
|        ese_vovnet19b_dw         | 128 |         1.2247         |
|       eca_botnext26ts_256       | 64  |         1.222          |
|           rexnet_100            | 128 |         1.2202         |
|        eca_halonext26ts         | 64  |         1.2158         |
|            pit_b_224            | 64  |         1.2154         |
|            tinynet_a            | 128 |         1.2002         |
|          convnext_base          | 32  |         1.1952         |
|          pnasnet5large          | 16  |         1.1934         |
|             dpn107              | 32  |         1.1914         |
|        res2net101_26w_4s        | 64  |         1.191          |
|        twins_pcpvt_base         | 32  |         1.1799         |
|            repvgg_a2            | 128 |         1.1677         |
|          cait_m36_384           |  2  |         1.1573         |
|           tf_mixnet_l           | 64  |         1.1473         |
|         poolformer_m36          | 64  |         1.1472         |
|            hrnet_w18            |  2  |         1.141          |
|  swin_base_patch4_window7_224   | 64  |         1.1349         |
|          gmlp_s16_224           | 64  |         1.1345         |
|           mobilevit_s           | 32  |         1.131          |
|            mixnet_l             | 64  |         1.1293         |
|        res2net50_14w_8s         |  2  |         1.1143         |
|      beit_base_patch16_224      | 64  |         1.1089         |
|           res2next50            |  2  |         1.1049         |
| deit_base_distilled_patch16_224 | 64  |         1.0897         |
|      vit_base_patch16_224       | 64  |         1.0779         |
|        gluon_xception65         | 32  |         1.0753         |
|        convmixer_768_32         | 32  |         1.0737         |
|     swsl_resnext101_32x16d      | 32  |         1.0725         |
|            gernet_l             | 128 |         1.072          |
|          mixer_b16_224          | 64  |         1.0324         |
|         visformer_small         | 128 |         1.0173         |
|          resmlp_12_224          | 128 |         0.9751         |
+---------------------------------+-----+------------------------+

Accuracy

+---------------------------------+----+------------------------+
|              name               | bs | inductor_no_cudagraphs |
+---------------------------------+----+------------------------+
|        adv_inception_v3         | 2  |          pass          |
|      beit_base_patch16_224      | 2  |          pass          |
|         mobilenetv2_100         | 2  |          pass          |
|      mobilenetv3_large_100      | 2  |          pass          |
|           mobilevit_s           | 2  |          pass          |
|            nfnet_l0             | 2  |          pass          |
|            pit_b_224            | 2  |          pass          |
|          pnasnet5large          | 2  |          pass          |
|         poolformer_m36          | 2  |          pass          |
|           regnety_002           | 2  |          pass          |
|            repvgg_a2            | 2  |          pass          |
|        res2net101_26w_4s        | 2  |          pass          |
|        res2net50_14w_8s         | 2  |          pass          |
|           res2next50            | 2  |          pass          |
|          resmlp_12_224          | 2  |          pass          |
|           rexnet_100            | 2  |          pass          |
|        sebotnet33ts_256         | 2  |          pass          |
|           selecsls42b           | 2  |          pass          |
|          spnasnet_100           | 2  |          pass          |
|  swin_base_patch4_window7_224   | 2  |          pass          |
|     swsl_resnext101_32x16d      | 2  |          pass          |
|       tf_efficientnet_b0        | 2  |          pass          |
|           tf_mixnet_l           | 2  |          pass          |
|            tinynet_a            | 2  |          pass          |
|        tnt_s_patch16_224        | 2  |          pass          |
|        twins_pcpvt_base         | 2  |          pass          |
|         visformer_small         | 2  |          pass          |
|      vit_base_patch16_224       | 2  |          pass          |
|           volo_d1_224           | 2  |          pass          |
|           mnasnet_100           | 2  |          pass          |
|            mixnet_l             | 2  |          pass          |
|          mixer_b16_224          | 2  |          pass          |
|            lcnet_050            | 2  |          pass          |
|          botnet26t_256          | 2  |          pass          |
|          cait_m36_384           | 2  |          pass          |
|         coat_lite_mini          | 2  |          pass          |
|           convit_base           | 2  |          pass          |
|        convmixer_768_32         | 2  |          pass          |
|          convnext_base          | 2  |          pass          |
|         crossvit_9_240          | 2  |          pass          |
|          cspdarknet53           | 2  |          pass          |
|             dla102              | 2  |          pass          |
|           dm_nfnet_f0           | 2  |          pass          |
|             dpn107              | 2  |          pass          |
|       eca_botnext26ts_256       | 2  |          pass          |
|        eca_halonext26ts         | 2  |          pass          |
|        ese_vovnet19b_dw         | 2  |          pass          |
|           fbnetc_100            | 2  |          pass          |
|            gernet_l             | 2  |          pass          |
|          ghostnet_100           | 2  |          pass          |
|       gluon_inception_v3        | 2  |          pass          |
|        gluon_xception65         | 2  |          pass          |
|          gmixer_24_224          | 2  |          pass          |
|          gmlp_s16_224           | 2  |          pass          |
|            hrnet_w18            | 2  |          pass          |
|          inception_v3           | 2  |          pass          |
|          jx_nest_base           | 2  |          pass          |
|      xcit_large_24_p8_224       | 2  |          pass          |
|           resnest101e           | 2  |     fail_accuracy      |
|            fbnetv3_b            | 2  |     fail_accuracy      |
| deit_base_distilled_patch16_224 | 2  |     fail_accuracy      |
+---------------------------------+----+------------------------+

Compilation latency (sec)

+---------------------------------+-----+------------------------+
|              name               | bs  | inductor_no_cudagraphs |
+---------------------------------+-----+------------------------+
|        twins_pcpvt_base         | 32  |        555.6571        |
|         coat_lite_mini          | 128 |        359.4672        |
|           mobilevit_s           | 32  |        340.027         |
|       eca_botnext26ts_256       | 64  |        282.2192        |
|        eca_halonext26ts         | 64  |        259.4407        |
|          convnext_base          | 32  |        228.0653        |
|  swin_base_patch4_window7_224   | 64  |        172.1811        |
|          cait_m36_384           |  2  |        164.114         |
|      xcit_large_24_p8_224       |  5  |        163.5473        |
|         crossvit_9_240          | 64  |        154.7684        |
|          jx_nest_base           | 32  |        134.0668        |
|        sebotnet33ts_256         | 64  |        129.4275        |
|           resnest101e           | 32  |        104.9281        |
|          botnet26t_256          | 128 |        102.8117        |
|          gmlp_s16_224           | 64  |        92.6649         |
|            hrnet_w18            |  2  |        84.8777         |
|           convit_base           | 32  |        81.1096         |
|           volo_d1_224           | 64  |        72.6098         |
|          gmixer_24_224          | 64  |        69.2979         |
|         visformer_small         | 128 |        68.4914         |
|          pnasnet5large          | 16  |        68.4373         |
|            pit_b_224            | 64  |        63.5178         |
|        tnt_s_patch16_224        | 64  |        61.7856         |
|        res2net101_26w_4s        | 64  |         51.385         |
|        res2net50_14w_8s         |  2  |        43.1844         |
|         poolformer_m36          | 64  |        43.1124         |
|          mixer_b16_224          | 64  |        38.2836         |
|             dpn107              | 32  |         36.985         |
|          resmlp_12_224          | 128 |        36.3702         |
| deit_base_distilled_patch16_224 | 64  |        32.9804         |
|            fbnetv3_b            | 128 |        32.3981         |
|        adv_inception_v3         | 128 |        31.3202         |
|       gluon_inception_v3        | 128 |        30.7326         |
|          inception_v3           | 128 |        30.5649         |
|           tf_mixnet_l           | 64  |        30.4171         |
|        gluon_xception65         | 32  |        30.2167         |
|          ghostnet_100           | 128 |        29.5224         |
|             dla102              | 64  |        29.5182         |
|            mixnet_l             | 64  |        29.4533         |
|      beit_base_patch16_224      | 64  |        29.1199         |
|           dm_nfnet_f0           | 128 |        25.1402         |
|     swsl_resnext101_32x16d      | 32  |        24.8712         |
|           rexnet_100            | 128 |         24.167         |
|           res2next50            |  2  |        24.1215         |
|            tinynet_a            | 128 |        23.0467         |
|      vit_base_patch16_224       | 64  |        22.4942         |
|       tf_efficientnet_b0        | 128 |        21.1541         |
|          cspdarknet53           | 64  |        21.0723         |
|            nfnet_l0             | 64  |         21.023         |
|           fbnetc_100            | 128 |        19.5932         |
|          spnasnet_100           | 128 |        19.2389         |
|        convmixer_768_32         | 32  |        18.1042         |
|      mobilenetv3_large_100      | 128 |        17.3766         |
|           regnety_002           | 128 |         16.596         |
|         mobilenetv2_100         | 128 |        16.3604         |
|            repvgg_a2            | 128 |         16.16          |
|            gernet_l             | 128 |        16.0605         |
|           mnasnet_100           | 128 |        15.9917         |
|           selecsls42b           | 128 |        14.7163         |
|        ese_vovnet19b_dw         | 128 |        11.7094         |
|            lcnet_050            | 128 |        11.0843         |
+---------------------------------+-----+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+------------------------+
|              name               | bs  | inductor_no_cudagraphs |
+---------------------------------+-----+------------------------+
|          gmixer_24_224          | 64  |         1.4405         |
|            tinynet_a            | 128 |         1.3692         |
|          pnasnet5large          | 16  |         1.3282         |
|            nfnet_l0             | 64  |         1.3209         |
|           rexnet_100            | 128 |         1.2765         |
|           convit_base           | 32  |         1.2244         |
|       eca_botnext26ts_256       | 64  |         1.2041         |
|        eca_halonext26ts         | 64  |         1.2034         |
|       tf_efficientnet_b0        | 128 |         1.199          |
|           mobilevit_s           | 32  |         1.1987         |
|         mobilenetv2_100         | 128 |         1.1104         |
|          cait_m36_384           |  2  |         1.0986         |
|          ghostnet_100           | 128 |         1.0963         |
|           tf_mixnet_l           | 64  |         1.0815         |
|         poolformer_m36          | 64  |         1.069          |
|             dla102              | 64  |         1.0544         |
|           dm_nfnet_f0           | 128 |         1.0495         |
|           selecsls42b           | 128 |         1.0324         |
|            mixnet_l             | 64  |         1.0059         |
|      xcit_large_24_p8_224       |  5  |         1.0039         |
|           resnest101e           | 32  |         1.002          |
|        ese_vovnet19b_dw         | 128 |         0.9967         |
|      vit_base_patch16_224       | 64  |         0.9873         |
|  swin_base_patch4_window7_224   | 64  |         0.9871         |
|            pit_b_224            | 64  |         0.9866         |
|        tnt_s_patch16_224        | 64  |         0.986          |
|        convmixer_768_32         | 32  |         0.9853         |
|          mixer_b16_224          | 64  |         0.9851         |
|         coat_lite_mini          | 128 |         0.9838         |
| deit_base_distilled_patch16_224 | 64  |         0.9831         |
|      beit_base_patch16_224      | 64  |         0.982          |
|            fbnetv3_b            | 128 |         0.977          |
|          jx_nest_base           | 32  |         0.9714         |
|        sebotnet33ts_256         | 64  |         0.9712         |
|            hrnet_w18            |  2  |         0.9689         |
|        twins_pcpvt_base         | 32  |         0.9634         |
|             dpn107              | 32  |         0.9562         |
|        res2net101_26w_4s        | 64  |         0.9547         |
|         visformer_small         | 128 |         0.951          |
|         crossvit_9_240          | 64  |         0.944          |
|        gluon_xception65         | 32  |         0.9376         |
|          gmlp_s16_224           | 64  |         0.9324         |
|        res2net50_14w_8s         |  2  |         0.9317         |
|           res2next50            |  2  |         0.9281         |
|     swsl_resnext101_32x16d      | 32  |         0.9249         |
|          convnext_base          | 32  |         0.9239         |
|            lcnet_050            | 128 |         0.923          |
|           volo_d1_224           | 64  |         0.9172         |
|          spnasnet_100           | 128 |         0.9157         |
|      mobilenetv3_large_100      | 128 |         0.9126         |
|           mnasnet_100           | 128 |         0.9077         |
|        adv_inception_v3         | 128 |         0.9073         |
|       gluon_inception_v3        | 128 |         0.9073         |
|          inception_v3           | 128 |         0.9073         |
|           regnety_002           | 128 |         0.8993         |
|          cspdarknet53           | 64  |         0.8875         |
|          botnet26t_256          | 128 |         0.8702         |
|           fbnetc_100            | 128 |         0.8498         |
|          resmlp_12_224          | 128 |         0.8253         |
|            gernet_l             | 128 |         0.8234         |
|            repvgg_a2            | 128 |         0.8011         |
+---------------------------------+-----+------------------------+

Performance graphs

bench_logs/huggingface_float32.png :

bench_logs/timm_models_float32.png :

bench_logs/torchbench_float32.png :

@anijain2305
Contributor Author

Performance Dashboard for float32 precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and the gradients by comparing them against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. The experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

Models that fail the accuracy check are excluded when we compute speedup, compilation latency and memory footprint compression.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 89%, 49/55 | 98%, 43/44  | 100%, 61/61 |
|       aot_eager        | 87%, 48/55 | 98%, 43/44  | 90%, 55/61  |
|     aot_cudagraphs     | 73%, 40/55 | 57%, 25/44  | 56%, 34/61  |
|      aot_nvfuser       | 58%, 32/55 |  2%, 1/44   | 82%, 50/61  |
|        inductor        | 87%, 48/55 | 93%, 41/44  | 97%, 59/61  |
| inductor_no_cudagraphs | 89%, 49/55 | 93%, 41/44  | 95%, 58/61  |
+------------------------+------------+-------------+-------------+

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.02x    |    1.00x    |
|       aot_eager        |   1.01x    |    1.00x    |    1.00x    |
|     aot_cudagraphs     |   1.09x    |    1.14x    |    1.07x    |
|      aot_nvfuser       |   1.13x    |    1.12x    |    1.12x    |
|        inductor        |   1.49x    |    1.64x    |    1.34x    |
| inductor_no_cudagraphs |   1.23x    |    1.32x    |    1.24x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    1.73    |    2.18     |    1.87     |
|       aot_eager        |    6.15    |    9.08     |    8.21     |
|     aot_cudagraphs     |    6.35    |    11.31    |    16.66    |
|      aot_nvfuser       |   20.10    |    9.46     |    48.56    |
|        inductor        |   58.49    |    50.41    |    80.71    |
| inductor_no_cudagraphs |   25.61    |    23.48    |    27.66    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.96x    |    0.98x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.89x    |    0.87x    |
|     aot_cudagraphs     |   0.39x    |    0.36x    |    0.32x    |
|      aot_nvfuser       |   0.83x    |    1.08x    |    0.84x    |
|        inductor        |   0.84x    |    0.77x    |    0.95x    |
| inductor_no_cudagraphs |   0.98x    |    0.95x    |    1.03x    |
+------------------------+------------+-------------+-------------+

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|           BERT_pytorch            |  16  | 1.0194 |   0.879   |      0.0       |     0.0     |  1.9623  |         1.9497         |
|             hf_Albert             |  8   | 1.0014 |  0.9975   |     0.7449     |     0.0     |  1.6638  |         1.6536         |
|            hf_T5_large            |  2   | 1.0247 |   0.898   |      0.0       |     0.0     |  1.622   |         1.5824         |
|               hf_T5               |  8   | 1.0013 |  0.9935   |      0.0       |     0.0     |   0.0    |         1.5529         |
|         timm_efficientdet         |  1   | 0.9857 |  0.8892   |      0.0       |     0.0     |  4.3437  |         1.5417         |
|        speech_transformer         |  32  | 1.0172 |  0.8978   |      0.0       |     0.0     |  1.5546  |         1.5389         |
|           timm_resnest            |  32  | 0.9993 |  1.0018   |     0.8046     |   1.1825    |  1.522   |         1.4524         |
|              hf_GPT2              |  4   | 1.0053 |  0.9782   |     0.7229     |     0.0     |  1.4991  |         1.4387         |
|            timm_nfnet             | 128  | 0.9996 |  0.9997   |      0.0       |   1.2121    |  1.4686  |         1.422          |
|      timm_vision_transformer      |  8   | 1.0039 |  0.9315   |     1.5194     |    1.347    |  2.6183  |         1.4119         |
|           mobilenet_v2            |  96  |  1.0   |  1.0001   |     0.7309     |   1.0396    |  1.4291  |         1.4057         |
|           hf_GPT2_large           |  4   | 1.0003 |  0.9804   |      0.0       |     0.0     |   0.0    |         1.3823         |
|        mobilenet_v3_large         |  32  | 1.0038 |  1.1065   |     1.0272     |   1.3768    |  1.9995  |         1.3618         |
|        shufflenet_v2_x1_0         | 128  | 1.0008 |  1.0621   |     0.807      |   1.1947    |  1.5531  |         1.3466         |
|           fastNLP_Bert            |  6   | 0.9989 |  0.9768   |     0.7537     |     0.0     |  1.3708  |         1.3464         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9976 |  0.9401   |     1.2727     |   1.1856    |  1.8012  |         1.3174         |
|            densenet121            |  4   | 1.0008 |  1.0079   |     2.3471     |   1.4375    |  5.2124  |         1.3048         |
|            mnasnet1_0             |  32  | 1.002  |  1.0959   |     0.8595     |   1.2986    |  1.4694  |         1.2804         |
|           squeezenet1_1           |  32  | 1.001  |  0.9982   |     1.0557     |   1.1602    |  1.8634  |         1.2722         |
|       functorch_dp_cifar10        |  64  | 1.0016 |  0.9726   |     2.1894     |   1.1956    |  3.7277  |         1.2509         |
|             resnet18              |  16  | 1.0067 |  1.0992   |     1.1606     |   1.3949    |  1.9712  |         1.2497         |
|          LearningToPaint          |  96  | 1.0031 |  1.0649   |     0.8611     |   1.2372    |  1.2607  |         1.2119         |
|         timm_efficientnet         |  32  | 0.9576 |   0.813   |     0.6983     |   1.0842    |  1.3303  |         1.2046         |
|          resnext50_32x4d          |  8   | 1.0011 |  1.0877   |     1.2336     |   1.3743    |  2.1691  |         1.1915         |
|           pytorch_unet            |  1   | 0.9999 |  0.9981   |     0.8445     |   1.0754    |  1.2019  |         1.1862         |
|              hf_Bart              |  4   | 1.0134 |  0.9734   |     0.7282     |     0.0     |  1.1777  |         1.1847         |
|              hf_Bert              |  4   | 1.0249 |  0.9955   |     0.7295     |     0.0     |  1.2511  |         1.1761         |
|             resnet50              |  32  | 0.9993 |  0.9926   |     0.758      |   1.1617    |  1.2047  |         1.1684         |
|               vgg16               |  64  | 0.9999 |  0.9991   |     0.8593     |    0.997    |  1.1746  |         1.168          |
|            Super_SloMo            |  6   | 1.0003 |  0.9977   |     0.8666     |     0.0     |  1.1792  |         1.1641         |
|              alexnet              | 128  | 0.9993 |  0.9984   |     0.8022     |   1.0004    |  1.1621  |         1.1636         |
|           hf_DistilBert           |  8   | 1.0007 |  0.9545   |     0.6706     |     0.0     |  1.1563  |         1.1574         |
|          pytorch_struct           | 200  | 0.9892 |  0.7368   |     0.8781     |   0.8824    |  1.8205  |         1.1475         |
|            hf_Reformer            |  4   | 0.9968 |    0.0    |     0.9269     |     0.0     |   1.11   |         1.1335         |
|        Background_Matting         |  4   | 1.0003 |  1.0217   |     0.8652     |   1.0811    |  1.1144  |         1.1065         |
|            timm_regnet            |  32  | 0.9652 |  0.9632   |     0.7803     |   1.0936    |  1.1265  |         1.093          |
|          pytorch_stargan          |  16  | 0.9989 |  0.9838   |     0.866      |   0.9884    |  1.1214  |         1.091          |
|               dcgan               |  32  | 0.9877 |  1.0056   |     1.279      |    1.155    |  1.6411  |         1.0791         |
|                drq                |  1   | 1.0139 |  0.8405   |     1.719      |   1.0412    |  2.4854  |         1.0706         |
|              yolov3               |  16  | 1.0001 |  0.9952   |     0.7899     |   1.1839    |  1.079   |         1.0643         |
|   timm_vision_transformer_large   |  8   | 1.0001 |   0.99    |      0.0       |   0.9783    |  1.0461  |         1.0377         |
| attention_is_all_you_need_pytorch | 256  | 1.0001 |  0.9731   |      0.0       |     0.0     |  1.0437  |         1.032          |
|            tts_angular            |  64  | 0.9883 |  0.9665   |     0.9902     |   0.9945    |  1.0042  |         1.0127         |
|            hf_BigBird             |  2   | 0.9921 |   0.947   |     0.957      |     0.0     |  1.0979  |         1.0012         |
|              demucs               |  4   | 1.0001 |  1.0001   |     0.9991     |   0.9999    |  0.9995  |         1.0003         |
|            timm_vovnet            |  32  | 0.9078 |  0.9041   |     0.7121     |   0.9791    |  0.9903  |         0.9985         |
|         soft_actor_critic         | 256  | 0.9977 |   0.754   |     1.0691     |   0.9931    |  1.452   |         0.974          |
|      nvidia_deeprecommender       | 256  | 0.9989 |  0.9628   |     0.5844     |   0.9429    |  0.9043  |         0.9643         |
|           lennard_jones           | 1000 | 0.9648 |  0.8214   |     1.038      |   1.0215    |  1.8383  |         0.9574         |
|               dlrm                | 2048 |  0.0   |    0.0    |      0.0       |     0.0     |  0.9587  |          0.0           |
|           hf_Longformer           |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
|    mobilenet_v2_quantized_qat     |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
|               moco                |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
|      resnet50_quantized_qat       |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |  aot_cudagraphs  |   aot_nvfuser    |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        Background_Matting         |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |          pass          |
|            Super_SloMo            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           fastNLP_Bert            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|             hf_Albert             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              hf_Bart              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              hf_Bert              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|            hf_BigBird             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           hf_DistilBert           |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|            hf_Reformer            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              yolov3               |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           BERT_pytorch            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|               dlrm                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|               hf_T5               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|            hf_T5_base             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|        speech_transformer         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|         timm_efficientdet         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|           hf_Longformer           |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|    mobilenet_v2_quantized_qat     |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|               moco                |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|      resnet50_quantized_qat       |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|             tacotron2             |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|          vision_maskrcnn          |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
|              yolov3               |  16  | 2.6946  |  8.0738   |    11.2944     |   43.3593   | 421.2781 |        390.1045        |
|            hf_T5_large            |  2   | 12.8459 |  41.0039  |      nan       |     nan     | 208.1634 |        98.7696         |
|         timm_efficientdet         |  1   | 19.5214 |  36.4621  |      nan       |     nan     | 472.193  |        78.5717         |
|           hf_GPT2_large           |  4   | 4.7965  |  18.7494  |      nan       |     nan     |   nan    |        60.3479         |
|            densenet121            |  4   |  1.863  |  12.3631  |    18.7436     |   88.0004   | 43.8402  |        42.6252         |
|   timm_vision_transformer_large   |  8   | 2.1831  |  13.7783  |      nan       |   24.2048   | 109.1333 |        37.7403         |
|            timm_nfnet             | 128  | 1.7778  |   7.224   |      nan       |   29.3511   | 29.2828  |        25.6315         |
|           BERT_pytorch            |  16  |  1.408  |  7.3531   |      nan       |     nan     | 93.2451  |        25.0133         |
|        speech_transformer         |  32  | 1.5601  |  8.0467   |      nan       |     nan     | 155.8826 |        24.9263         |
|            hf_BigBird             |  2   | 7.1877  |  13.3445  |    28.8858     |     nan     | 40.5724  |        24.4433         |
|              hf_Bart              |  4   | 1.3634  |  7.7226   |    11.7663     |     nan     | 49.8071  |        23.1735         |
|               hf_T5               |  8   | 1.9849  |   8.891   |      nan       |     nan     |   nan    |         22.038         |
|          pytorch_struct           | 200  | 0.2299  |   0.786   |     1.3498     |   4.0654    | 77.2184  |        21.6208         |
|           fastNLP_Bert            |  6   | 1.4528  |  6.6596   |    10.1622     |     nan     | 65.9835  |        21.0276         |
| attention_is_all_you_need_pytorch | 256  | 1.0727  |  7.0829   |      nan       |     nan     | 138.0688 |        20.7995         |
|            timm_regnet            |  32  | 2.1438  |   7.87    |    20.7961     |   46.7355   | 20.7534  |        19.5922         |
|              hf_Bert              |  4   | 1.3338  |   6.111   |     8.8726     |     nan     | 30.6003  |        18.6261         |
|         timm_efficientnet         |  32  |  1.597  |  6.3786   |    15.6253     |   52.3706   | 19.6752  |        18.5284         |
|        shufflenet_v2_x1_0         | 128  | 0.8135  |   4.952   |     7.0916     |   26.2354   | 17.7916  |        17.8256         |
|              hf_GPT2              |  4   | 1.2638  |  5.9182   |     9.0291     |     nan     | 63.8222  |        17.4366         |
|        mobilenet_v3_large         |  32  | 0.7653  |  4.4271   |     6.2051     |   52.4655   |  29.959  |        16.0336         |
|            Super_SloMo            |  6   | 0.9414  |  4.7808   |     6.4957     |     nan     | 17.1196  |        16.0166         |
|        Background_Matting         |  4   | 0.6134  |  4.1258   |     6.477      |   28.9642   | 16.4429  |        15.4289         |
|           mobilenet_v2            |  96  | 0.7034  |  4.1268   |     6.2732     |   36.5168   |  16.279  |        15.3145         |
|            mnasnet1_0             |  32  | 0.7014  |  4.1117   |     5.8817     |   30.2405   | 29.9614  |        14.7027         |
|            timm_vovnet            |  32  | 1.4177  |  4.3209   |    10.1196     |   23.2123   | 15.4898  |        14.3583         |
|          resnext50_32x4d          |  8   | 0.7751  |  4.5259   |     6.3593     |   28.1888   | 26.9254  |        14.1106         |
|             hf_Albert             |  8   | 0.9434  |  5.5883   |     8.2863     |     nan     |  41.154  |        13.9319         |
|             resnet50              |  32  | 0.7658  |  4.5668   |     6.5562     |   31.6857   | 15.0163  |        13.8694         |
|            hf_Reformer            |  4   | 2.3633  |    nan    |     9.2796     |     nan     |  35.777  |        13.2993         |
|      timm_vision_transformer      |  8   | 0.7335  |  4.2715   |     5.6705     |   9.1399    | 143.3979 |        11.9762         |
|           timm_resnest            |  32  | 0.4944  |  2.5086   |     3.5361     |   35.1848   | 134.3393 |         10.357         |
|           hf_DistilBert           |  8   |  0.451  |  2.8942   |     5.7847     |     nan     |  18.879  |         9.3411         |
|       functorch_dp_cifar10        |  64  | 0.3339  |  1.8771   |     2.7056     |   5.4024    | 25.1969  |         8.6672         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.3516  |  2.1508   |     2.893      |   3.7342    |  8.3594  |         7.7951         |
|           pytorch_unet            |  1   | 0.3872  |  1.8887   |     2.8571     |   19.446    |  8.3426  |         7.6875         |
|             resnet18              |  16  |  0.367  |  1.7235   |     2.4475     |   17.4241   | 22.5813  |         7.0437         |
|          LearningToPaint          |  96  | 0.3862  |  1.7658   |     2.5783     |   24.1451   |  6.9601  |         6.8803         |
|          pytorch_stargan          |  16  | 0.3661  |  2.2699   |     3.0764     |   3.7532    | 105.5884 |         6.3037         |
|           squeezenet1_1           |  32  | 0.2253  |  0.9562   |     1.4211     |    4.543    |  4.0946  |         3.705          |
|                drq                |  1   | 0.1383  |   0.447   |     0.7869     |   3.4175    |  3.7932  |         3.2698         |
|               vgg16               |  64  | 0.1761  |  0.6425   |     1.0353     |   2.4605    |  3.6554  |         3.0491         |
|         soft_actor_critic         | 256  | 0.1967  |  0.3309   |     0.5678     |   1.5231    |  3.3987  |         2.6718         |
|              alexnet              | 128  | 0.1492  |  0.4065   |     0.6674     |   2.3697    |  3.0556  |         2.5189         |
|               dcgan               |  32  | 0.1616  |  0.4168   |      0.62      |   3.7183    |  2.6623  |         2.376          |
|      nvidia_deeprecommender       | 256  | 0.1959  |  0.4069   |     0.6608     |   2.3903    |  4.1141  |         2.0136         |
|           lennard_jones           | 1000 | 0.1369  |  0.2851   |     0.4879     |    1.055    |  2.0382  |         1.6666         |
|            tts_angular            |  64  | 0.2064  |  0.2619   |     0.3892     |   0.9866    |  2.0799  |         1.5386         |
|              demucs               |  4   | 0.2932  |  0.2986   |     0.3119     |   0.2959    |  0.2013  |         0.2004         |
|               dlrm                | 2048 |   nan   |    nan    |      nan       |     nan     |  3.5672  |          nan           |
|           hf_Longformer           |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|    mobilenet_v2_quantized_qat     |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|               moco                |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|      resnet50_quantized_qat       |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|         timm_efficientdet         |  1   | 1.0111 |   0.823   |      nan       |     nan     |  1.1165  |         1.4096         |
|         timm_efficientnet         |  32  | 0.9937 |  0.7666   |     0.2635     |   0.7837    |  1.3106  |         1.3377         |
|             hf_Albert             |  8   | 0.9333 |  0.9333   |     0.2822     |     nan     |  0.8804  |         1.2559         |
|            Super_SloMo            |  6   | 1.0024 |  0.9527   |     0.363      |     nan     |  1.1857  |         1.1914         |
|           fastNLP_Bert            |  6   | 1.0011 |  0.9152   |     0.3384     |     nan     |  0.8343  |         1.1671         |
|               hf_T5               |  8   | 0.9527 |  0.9445   |      nan       |     nan     |   nan    |         1.1507         |
|              hf_Bart              |  4   | 0.9617 |   0.879   |     0.3244     |     nan     |  0.853   |         1.1395         |
|           squeezenet1_1           |  32  | 0.9749 |  0.8159   |     0.3372     |   0.9742    |  1.0823  |         1.1267         |
|            hf_BigBird             |  2   | 0.9604 |  0.9604   |     0.4302     |     nan     |  0.8205  |         1.1123         |
|           mobilenet_v2            |  96  | 0.9928 |  0.7624   |     0.3062     |   0.7638    |  1.1005  |         1.1105         |
|              hf_GPT2              |  4   | 0.9548 |   0.887   |     0.353      |     nan     |  0.9505  |         1.1071         |
|           BERT_pytorch            |  16  |  1.0   |  0.8995   |      nan       |     nan     |  0.825   |         1.1056         |
|           hf_GPT2_large           |  4   | 0.936  |  0.8768   |      nan       |     nan     |   nan    |         1.0941         |
|            timm_nfnet             | 128  | 0.9358 |  0.8936   |      nan       |   0.9478    |  1.0219  |         1.0495         |
|        speech_transformer         |  32  | 0.9982 |  0.9159   |      nan       |     nan     |  0.8959  |         1.0442         |
|        shufflenet_v2_x1_0         | 128  | 0.9739 |  0.8944   |      0.35      |   0.8662    |  0.9791  |         1.0072         |
|         soft_actor_critic         | 256  | 0.9997 |  0.9637   |     0.4355     |   0.9555    |   0.75   |         0.9991         |
|           lennard_jones           | 1000 | 0.9995 |  0.9995   |     0.3711     |   1.0947    |  0.5646  |         0.9989         |
|      timm_vision_transformer      |  8   | 0.9943 |  0.8835   |     0.3305     |   0.8104    |  0.712   |         0.9952         |
|          pytorch_stargan          |  16  | 0.9975 |  1.0179   |     0.4129     |   1.0085    |  0.9023  |         0.9928         |
|   timm_vision_transformer_large   |  8   | 0.9997 |  0.8415   |      nan       |    0.801    |  0.8284  |         0.9907         |
|              demucs               |  4   | 0.9886 |  0.9886   |     0.9886     |   0.9886    |  0.9886  |         0.9886         |
|            tts_angular            |  64  | 0.9884 |  0.9884   |     0.9829     |   0.9884    |  0.983   |         0.9884         |
|            hf_Reformer            |  4   | 0.3011 |    nan    |     0.2397     |     nan     |  0.299   |         0.9878         |
|          pytorch_struct           | 200  |  1.0   |  0.5079   |     0.4824     |   0.5079    |  0.4222  |         0.9692         |
|              hf_Bert              |  4   | 0.9683 |  0.8952   |     0.3395     |     nan     |  0.8564  |         0.9684         |
|           timm_resnest            |  32  | 0.9935 |   0.88    |     0.3236     |   0.8024    |  0.8974  |         0.9679         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9986 |  0.9149   |     0.3919     |   0.9141    |  0.8848  |         0.9646         |
| attention_is_all_you_need_pytorch | 256  | 0.9476 |  0.9243   |      nan       |     nan     |  0.816   |         0.9429         |
|            timm_regnet            |  32  | 0.9985 |  0.8614   |     0.3327     |   0.8784    |  0.9284  |         0.9323         |
|            densenet121            |  4   | 0.9904 |  0.8812   |     0.3439     |   0.8551    |  0.857   |         0.9307         |
|          resnext50_32x4d          |  8   | 0.9954 |  0.8671   |     0.3595     |   0.8203    |  0.8303  |         0.9303         |
|              yolov3               |  16  | 0.9957 |   0.844   |     0.334      |   0.8814    |  0.9231  |         0.9271         |
|            hf_T5_large            |  2   | 0.922  |  0.8722   |      nan       |     nan     |  0.8737  |         0.922          |
|           hf_DistilBert           |  8   | 0.9211 |  0.9047   |     0.2989     |     nan     |  0.7841  |         0.9208         |
|        Background_Matting         |  4   | 0.9998 |  0.9492   |     0.3595     |   0.9749    |  0.9139  |         0.9164         |
|        mobilenet_v3_large         |  32  | 0.9878 |  0.8563   |     0.3277     |   0.8681    |  0.8829  |         0.9148         |
|            mnasnet1_0             |  32  | 0.9869 |  0.8985   |     0.333      |   0.8263    |  0.8531  |         0.9097         |
|             resnet50              |  32  | 0.9942 |  0.8719   |     0.3367     |    0.797    |  0.8565  |         0.8913         |
|                drq                |  1   | 0.987  |  0.8777   |     0.4252     |   0.8772    |  0.7632  |         0.8778         |
|       functorch_dp_cifar10        |  64  | 0.9961 |  0.8224   |     0.4456     |   0.8227    |  0.4056  |         0.871          |
|           pytorch_unet            |  1   | 0.9985 |  0.8521   |     0.3441     |   0.8496    |  0.859   |         0.8608         |
|             resnet18              |  16  | 0.9831 |  0.7792   |     0.3591     |   0.6971    |  0.6902  |         0.8401         |
|              alexnet              | 128  | 0.9542 |   0.745   |     0.4163     |   0.7455    |  0.743   |         0.8332         |
|            timm_vovnet            |  32  | 0.9933 |  0.7603   |     0.3202     |   0.7741    |  0.8251  |         0.8316         |
|               dcgan               |  32  | 0.9754 |  0.7634   |     0.4581     |   0.7634    |  0.767   |         0.7903         |
|          LearningToPaint          |  96  | 0.9442 |  0.6896   |     0.3385     |   0.6515    |  0.6882  |         0.7462         |
|               vgg16               |  64  | 0.9944 |  0.6638   |     0.3214     |   0.6639    |  0.6471  |         0.6497         |
|      nvidia_deeprecommender       | 256  | 0.5598 |  0.5598   |     0.4624     |   0.5598    |  0.5598  |         0.5598         |
|               dlrm                | 2048 |  nan   |    nan    |      nan       |     nan     |  0.7035  |          nan           |
|           hf_Longformer           |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|    mobilenet_v2_quantized_qat     |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|               moco                |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|      resnet50_quantized_qat       |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
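The comment does not define how the ratio above is computed. A minimal sketch of one way to read it, assuming the ratio is eager peak memory divided by the compiled backend's peak memory (so values above 1.0 would mean the backend used less memory than eager). The function name `compression_ratio` and the byte counts are hypothetical; the three sample ratios are copied from the `inductor_no_cudagraphs` column of the table.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Assumed definition: eager peak memory / compiled peak memory."""
    return eager_peak_bytes / compiled_peak_bytes


# Ratios copied from the inductor_no_cudagraphs column above.
ratios = {
    "timm_efficientdet": 1.4096,
    "hf_Bert": 0.9684,
    "vgg16": 0.6497,
}

# Under the assumed definition, a ratio above 1.0 means less peak memory than eager.
uses_less_memory = [name for name, ratio in ratios.items() if ratio > 1.0]
uses_more_memory = [name for name, ratio in ratios.items() if ratio < 1.0]

print("less memory than eager:", uses_less_memory)
print("more memory than eager:", uses_more_memory)
```

If the benchmark harness defines the ratio the other way round, the two lists swap meaning.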

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+
|       MT5ForConditionalGeneration       | 2  | 1.0262 |  0.9254   |      0.0       |     0.0     |  4.6504  |         2.2192         |
|               DistillGPT2               | 1  | 1.0344 |   0.944   |     1.1266     |     0.0     |  2.0795  |         1.8463         |
|             OPTForCausalLM              | 4  | 1.0129 |  0.8998   |     1.6026     |     0.0     |  2.7097  |         1.6882         |
|      GPT2ForSequenceClassification      | 4  | 1.0001 |  0.9784   |      0.0       |     0.0     |  1.6741  |         1.663          |
|           RobertaForCausalLM            | 4  | 1.0464 |  0.9449   |     1.4924     |     0.0     |  2.7268  |         1.6084         |
|    MegatronBertForQuestionAnswering     | 8  | 1.0444 |  0.9429   |     1.0235     |     0.0     |  1.6361  |         1.5772         |
|         MegatronBertForCausalLM         | 2  | 1.0421 |  0.9443   |     1.5697     |     0.0     |  3.0768  |         1.5678         |
|     MobileBertForQuestionAnswering      | 32 | 1.0187 |  0.9249   |      0.0       |     0.0     |  2.9281  |         1.5577         |
|     PLBartForConditionalGeneration      | 8  | 1.0116 |  0.9033   |     1.0868     |     0.0     |  1.794   |         1.5538         |
|           ElectraForCausalLM            | 1  | 1.0438 |  0.9492   |     2.0902     |     0.0     |  4.815   |         1.5503         |
|             XGLMForCausalLM             | 1  | 1.012  |  0.8772   |      0.0       |     0.0     |  2.5999  |         1.5381         |
|          MobileBertForMaskedLM          | 16 | 1.0256 |  0.9199   |      0.0       |     0.0     |  2.7678  |         1.5312         |
|                CamemBert                | 1  | 1.0543 |   0.956   |     1.5293     |     0.0     |  2.4364  |         1.507          |
|      MBartForConditionalGeneration      | 8  | 1.0158 |   0.922   |     0.9469     |     0.0     |  1.5609  |         1.4932         |
|     M2M100ForConditionalGeneration      | 2  | 1.0698 |   0.878   |     1.3876     |     0.0     |  2.6513  |         1.4921         |
|     PegasusForConditionalGeneration     | 4  | 1.0145 |  0.9036   |     1.2401     |     0.0     |  2.298   |         1.4806         |
|            YituTechConvBert             | 1  | 1.0247 |  0.9368   |      0.0       |     0.0     |  3.7787  |         1.4538         |
|       T5ForConditionalGeneration        | 4  | 1.0008 |  0.9687   |      0.0       |     0.0     |  1.4298  |         1.4341         |
|            XLNetLMHeadModel             | 4  | 0.9997 |  0.9662   |      0.0       |     0.0     |  1.4338  |         1.4235         |
|       ElectraForQuestionAnswering       | 64 | 1.0004 |  0.9856   |      0.0       |     0.0     |  1.3582  |         1.3443         |
|       AlbertForQuestionAnswering        | 2  | 1.0011 |  1.0024   |      0.0       |     0.0     |  1.2968  |         1.2884         |
|            AlbertForMaskedLM            | 2  | 0.9984 |  1.0013   |      0.0       |     0.0     |  1.2904  |         1.2863         |
|            TrOCRForCausalLM             | 8  | 1.0177 |  0.9459   |     0.8088     |     0.0     |  1.2555  |         1.2853         |
|         Speech2Text2ForCausalLM         | 64 | 1.0042 |  0.9432   |     0.7366     |     0.0     |  1.2531  |         1.2828         |
|    LayoutLMForSequenceClassification    | 16 |  1.0   |  0.9885   |     0.7382     |     0.0     |  1.2524  |         1.2394         |
|           PegasusForCausalLM            | 8  | 1.014  |  0.9212   |     0.8235     |     0.0     |  1.2405  |         1.2246         |
|               GoogleFnet                | 1  | 1.0018 |   0.817   |     0.9941     |    1.117    |  1.9193  |         1.2034         |
|      BartForConditionalGeneration       | 1  | 1.016  |  0.9945   |      0.0       |     0.0     |  1.2881  |         1.1963         |
|     DistilBertForQuestionAnswering      | 32 | 1.032  |  0.9885   |     0.722      |     0.0     |  1.192   |         1.1848         |
|       DebertaForQuestionAnswering       | 4  | 0.9263 |  0.7387   |     0.9323     |     0.0     |  1.2979  |         1.1844         |
| BlenderbotSmallForConditionalGeneration | 32 | 1.0135 |  0.9441   |     0.7582     |     0.0     |  1.1961  |         1.1774         |
|          DistilBertForMaskedLM          | 16 | 1.0236 |  0.9824   |     0.7573     |     0.0     |  1.1816  |         1.1765         |
|           LayoutLMForMaskedLM           | 16 | 1.0001 |  0.9693   |      0.0       |     0.0     |  1.1699  |         1.1763         |
|                 T5Small                 | 1  | 1.0266 |  0.9533   |      0.0       |     0.0     |  1.2799  |         1.1665         |
|            PLBartForCausalLM            | 16 | 1.0127 |  0.9512   |     0.7907     |     0.0     |  1.1304  |         1.1497         |
|             BartForCausalLM             | 2  | 0.9992 |  0.9665   |     0.7293     |     0.0     |  1.1075  |         1.1121         |
|       RobertaForQuestionAnswering       | 64 | 1.0001 |  0.9838   |     0.7409     |     0.0     |  1.089   |         1.0798         |
|        BertForQuestionAnswering         | 64 | 1.0004 |  0.9719   |     0.736      |     0.0     |  1.0932  |         1.0748         |
|           DebertaForMaskedLM            | 4  | 0.9354 |  0.8021   |     0.7343     |     0.0     |  1.0867  |         1.0674         |
|            MBartForCausalLM             | 16 | 1.0087 |  0.9641   |     0.7246     |     0.0     |  1.0582  |         1.064          |
|             BertForMaskedLM             | 64 |  1.0   |  0.9636   |     0.7181     |     0.0     |  1.0357  |         1.0402         |
|       BlenderbotSmallForCausalLM        | 64 | 1.001  |  0.9101   |     0.6521     |     0.0     |  1.0062  |         1.0381         |
|                 BigBird                 | 1  | 0.9927 |  0.9363   |     1.0063     |     0.0     |  1.1008  |         1.0055         |
|          AllenaiLongformerBase          | 0  |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+
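To summarize a speedup table like the one above, one option is to parse the ASCII layout and take a geometric mean per backend. The sketch below is not part of the dashboard tooling; `parse_ascii_table` and `geomean_speedup` are hypothetical helper names, and it assumes that a `0.0` or `nan` entry marks a run that produced no measurement and should be skipped. The sample rows are copied from the table.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the '+----+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean_speedup(records, backend):
    """Geometric mean of the positive, non-nan speedups in one backend column."""
    values = [float(record[backend]) for record in records]
    values = [v for v in values if not math.isnan(v) and v > 0.0]
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))


sample = """
+-----------------------+----+----------+------------------------+
|         name          | bs | inductor | inductor_no_cudagraphs |
+-----------------------+----+----------+------------------------+
|      GoogleFnet       | 1  |  1.9193  |         1.2034         |
|        BigBird        | 1  |  1.1008  |         1.0055         |
| AllenaiLongformerBase | 0  |   0.0    |          0.0           |
+-----------------------+----+----------+------------------------+
"""

records = parse_ascii_table(sample)
for backend in ("inductor", "inductor_no_cudagraphs"):
    print(backend, round(geomean_speedup(records, backend), 4))
```

Skipping the zero entries keeps failed runs from collapsing the mean to zero, but it also hides them, so the number of skipped models is worth reporting alongside the mean.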

Accuracy

+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+------------------------+
|                  name                   | bs |    eager    |  aot_eager  | aot_cudagraphs | aot_nvfuser |  inductor   | inductor_no_cudagraphs |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+------------------------+
|               GoogleFnet                | 1  |    pass     |    pass     |      pass      |    pass     |    pass     |          pass          |
|             BartForCausalLM             | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|             BertForMaskedLM             | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|        BertForQuestionAnswering         | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|                 BigBird                 | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|                CamemBert                | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|          DistilBertForMaskedLM          | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|     DistilBertForQuestionAnswering      | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|               DistillGPT2               | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|           ElectraForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|       ElectraForQuestionAnswering       | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|           LayoutLMForMaskedLM           | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|    LayoutLMForSequenceClassification    | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|     M2M100ForConditionalGeneration      | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|            MBartForCausalLM             | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|         MegatronBertForCausalLM         | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|             OPTForCausalLM              | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|            PLBartForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|           PegasusForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|     PegasusForConditionalGeneration     | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|           RobertaForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|       RobertaForQuestionAnswering       | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|         Speech2Text2ForCausalLM         | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|            AlbertForMaskedLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       AlbertForQuestionAnswering        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|      BartForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       DebertaForQuestionAnswering       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|      GPT2ForSequenceClassification      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       MT5ForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|          MobileBertForMaskedLM          | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|     MobileBertForQuestionAnswering      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       T5ForConditionalGeneration        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|                 T5Small                 | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            TrOCRForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|             XGLMForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            XLNetLMHeadModel             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            YituTechConvBert             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|           DebertaForMaskedLM            | 1  |    pass     |    pass     | fail_accuracy  | fail_to_run |    pass     |          pass          |
|     PLBartForConditionalGeneration      | 1  |    pass     |    pass     |      pass      | fail_to_run | fail_to_run |      fail_to_run       |
|      MBartForConditionalGeneration      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run | fail_to_run |      fail_to_run       |
|          AllenaiLongformerBase          | 1  | fail_to_run | fail_to_run |  fail_to_run   | fail_to_run | fail_to_run |      fail_to_run       |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+
|            XLNetLMHeadModel             | 4  | 3.8563 |  20.1291  |      nan       |     nan     | 154.9396 |        55.0385         |
|          MobileBertForMaskedLM          | 16 | 7.9926 |  27.645   |      nan       |     nan     | 95.1586  |        53.2011         |
|     MobileBertForQuestionAnswering      | 32 | 7.778  |  27.6918  |      nan       |     nan     |  80.211  |         53.144         |
|     M2M100ForConditionalGeneration      | 2  | 2.6347 |  15.4058  |    25.0478     |     nan     | 88.6459  |        41.5232         |
|      MBartForConditionalGeneration      | 8  | 2.842  |  14.9887  |    24.0733     |     nan     | 57.7763  |        38.9562         |
|     PegasusForConditionalGeneration     | 4  | 2.7425 |  14.5211  |    23.2205     |     nan     | 74.3495  |        38.7677         |
|         MegatronBertForCausalLM         | 2  | 3.3214 |  12.9395  |    18.8926     |     nan     | 66.4161  |         37.599         |
|    MegatronBertForQuestionAnswering     | 8  | 3.1717 |  12.6607  |    18.6026     |     nan     | 55.4472  |        36.7751         |
|             XGLMForCausalLM             | 1  | 2.2243 |  11.9095  |      nan       |     nan     | 62.0082  |        36.4739         |
|      BartForConditionalGeneration       | 1  | 2.8137 |  14.8852  |      nan       |     nan     | 46.9336  |        36.3428         |
|       MT5ForConditionalGeneration       | 2  | 3.2756 |  13.4151  |      nan       |     nan     | 104.5949 |        32.0014         |
|           DebertaForMaskedLM            | 4  | 4.7627 |  10.9781  |    44.9229     |     nan     | 120.725  |        31.1942         |
|       DebertaForQuestionAnswering       | 4  | 4.7076 |  11.2242  |     44.558     |     nan     | 94.8567  |        29.4266         |
| BlenderbotSmallForConditionalGeneration | 32 | 1.7185 |  9.5893   |    14.6735     |     nan     | 50.7147  |        25.7547         |
|            YituTechConvBert             | 1  | 2.0965 |  9.6719   |      nan       |     nan     | 115.6858 |        25.4287         |
|       T5ForConditionalGeneration        | 4  | 1.9998 |  8.9302   |      nan       |     nan     | 58.5377  |        24.9269         |
|     PLBartForConditionalGeneration      | 8  | 1.3877 |  7.8161   |    11.2448     |     nan     | 57.5267  |        24.1229         |
|                 BigBird                 | 1  | 7.1363 |  13.4667  |    28.6926     |     nan     |  41.151  |        24.0747         |
|                 T5Small                 | 1  | 2.0038 |  9.0658   |      nan       |     nan     | 54.3517  |        20.5314         |
|           RobertaForCausalLM            | 4  | 1.4215 |  6.2233   |     8.733      |     nan     | 59.7399  |        19.1079         |
|       ElectraForQuestionAnswering       | 64 | 1.3196 |  6.2054   |      nan       |     nan     | 32.0659  |        18.9681         |
|           LayoutLMForMaskedLM           | 16 | 1.4595 |  6.4635   |      nan       |     nan     | 32.0362  |        18.9255         |
|             BertForMaskedLM             | 64 | 1.3182 |  6.1585   |     9.3256     |     nan     | 32.4413  |        18.8293         |
|           ElectraForCausalLM            | 1  | 1.4112 |  6.3391   |     8.6323     |     nan     | 21.4781  |        18.4255         |
|                CamemBert                | 1  | 1.3998 |   6.083   |     8.8757     |     nan     |  21.526  |        17.9811         |
|    LayoutLMForSequenceClassification    | 16 | 1.4734 |  6.4139   |     9.7085     |     nan     | 49.8072  |        17.9259         |
|      GPT2ForSequenceClassification      | 4  | 1.2966 |   5.902   |      nan       |     nan     | 31.4584  |        17.5774         |
|       RobertaForQuestionAnswering       | 64 | 1.361  |  6.2194   |     8.907      |     nan     | 19.7272  |        16.9387         |
|        BertForQuestionAnswering         | 64 | 1.3724 |  6.1077   |     9.0325     |     nan     | 19.9996  |        16.7182         |
|           PegasusForCausalLM            | 8  | 1.0348 |  5.5356   |     8.5779     |     nan     | 40.7253  |         16.453         |
|            MBartForCausalLM             | 16 | 0.9629 |  5.5287   |     8.2172     |     nan     | 29.3925  |        16.1161         |
|             OPTForCausalLM              | 4  | 1.0786 |   5.867   |     13.239     |     nan     | 35.6128  |        15.8746         |
|            TrOCRForCausalLM             | 8  | 1.0393 |  5.7252   |     7.8765     |     nan     | 21.9189  |         15.303         |
|             BartForCausalLM             | 2  | 1.0117 |  5.5461   |     8.2662     |     nan     | 28.6964  |        15.2956         |
|       AlbertForQuestionAnswering        | 2  | 1.2416 |   5.819   |      nan       |     nan     |  16.702  |        13.6368         |
|            AlbertForMaskedLM            | 2  | 1.1347 |  5.7649   |      nan       |     nan     | 28.2303  |        13.5234         |
|               GoogleFnet                | 1  | 0.7819 |  3.1961   |    10.0079     |   9.4636    | 23.6876  |        11.9908         |
|       BlenderbotSmallForCausalLM        | 64 | 0.6073 |   3.708   |     5.7262     |     nan     | 23.1824  |        11.4275         |
|         Speech2Text2ForCausalLM         | 64 | 0.5623 |  3.0034   |     4.5045     |     nan     | 24.8957  |        10.2719         |
|            PLBartForCausalLM            | 16 | 0.5053 |  2.9211   |     4.2612     |     nan     | 23.9939  |         9.922          |
|     DistilBertForQuestionAnswering      | 32 | 0.4751 |  2.9444   |     6.1878     |     nan     | 30.3674  |         9.8744         |
|          DistilBertForMaskedLM          | 16 | 0.4653 |  2.9409   |     5.7567     |     nan     | 20.4187  |         9.7413         |
|               DistillGPT2               | 1  | 0.6851 |  2.9742   |     4.3883     |     nan     |  34.166  |         9.4494         |
|          AllenaiLongformerBase          | 0  |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+
|       DebertaForQuestionAnswering       | 4  | 0.9845 |  1.0525   |     0.3309     |     nan     |  0.3569  |          1.13          |
|               GoogleFnet                | 1  | 0.9983 |  0.9453   |     0.3714     |   1.0813    |  0.7687  |         1.1065         |
|       T5ForConditionalGeneration        | 4  |  1.0   |  0.9597   |      nan       |     nan     |  0.8215  |         1.1049         |
|      GPT2ForSequenceClassification      | 4  | 0.9343 |  0.9093   |      nan       |     nan     |  1.0318  |         1.0912         |
|                 T5Small                 | 1  |  1.0   |  0.9325   |      nan       |     nan     |  0.8564  |         1.087          |
|                 BigBird                 | 1  | 0.999  |  0.9542   |     0.4213     |     nan     |  0.822   |         1.062          |
|           DebertaForMaskedLM            | 4  |  1.0   |  0.9851   |     0.3554     |     nan     |  0.4265  |         1.0346         |
|     DistilBertForQuestionAnswering      | 32 |  1.0   |  0.9046   |     0.3328     |     nan     |  0.8394  |         1.0048         |
|     M2M100ForConditionalGeneration      | 2  | 0.9977 |  0.9857   |     0.4249     |     nan     |  0.7197  |         1.0045         |
|    LayoutLMForSequenceClassification    | 16 |  1.0   |  0.9348   |     0.3324     |     nan     |  0.9339  |         1.004          |
|        BertForQuestionAnswering         | 64 |  1.0   |  0.9467   |     0.332      |     nan     |  0.9354  |         1.0032         |
|       RobertaForQuestionAnswering       | 64 |  1.0   |  0.9467   |     0.3319     |     nan     |  0.9354  |         1.0032         |
|       ElectraForQuestionAnswering       | 64 |  1.0   |  0.9524   |      nan       |     nan     |  0.9361  |         1.0025         |
|             XGLMForCausalLM             | 1  | 0.9974 |  0.9999   |      nan       |     nan     |  0.8528  |         0.9999         |
|               DistillGPT2               | 1  | 0.9984 |  0.7704   |     0.3571     |     nan     |  0.8184  |         0.9933         |
|     PegasusForConditionalGeneration     | 4  | 0.9993 |  0.9002   |     0.3809     |     nan     |  0.7318  |         0.9895         |
|      BartForConditionalGeneration       | 1  |  1.0   |  0.8465   |      nan       |     nan     |  0.8244  |         0.9819         |
|            XLNetLMHeadModel             | 4  | 1.0001 |  0.8976   |      nan       |     nan     |  0.9717  |         0.9807         |
|            YituTechConvBert             | 1  | 0.9858 |  0.7923   |      nan       |     nan     |  0.8025  |         0.9784         |
|                CamemBert                | 1  | 0.998  |  0.7977   |     0.3504     |     nan     |  0.8088  |         0.9708         |
|       AlbertForQuestionAnswering        | 2  |  1.0   |  0.9369   |      nan       |     nan     |  0.6763  |         0.9674         |
|           PegasusForCausalLM            | 8  | 0.9778 |  0.9323   |     0.4075     |     nan     |  0.802   |         0.9625         |
|     PLBartForConditionalGeneration      | 8  |  1.0   |  0.8221   |     0.3314     |     nan     |  0.7548  |         0.9608         |
|            AlbertForMaskedLM            | 2  | 0.9999 |  0.9172   |      nan       |     nan     |  0.6633  |         0.9567         |
|            TrOCRForCausalLM             | 8  |  1.0   |  0.8048   |     0.3624     |     nan     |  0.7873  |         0.9427         |
|      MBartForConditionalGeneration      | 8  |  1.0   |  0.8136   |     0.342      |     nan     |  0.7949  |         0.9411         |
|           LayoutLMForMaskedLM           | 16 |  1.0   |  0.9409   |      nan       |     nan     |  0.888   |         0.9409         |
|             BartForCausalLM             | 2  |  1.0   |  0.8847   |     0.3484     |     nan     |  0.8389  |         0.9329         |
|    MegatronBertForQuestionAnswering     | 8  | 0.923  |  0.8265   |     0.3609     |     nan     |  0.7975  |         0.923          |
|             BertForMaskedLM             | 64 |  1.0   |  0.9219   |     0.3433     |     nan     |  0.8321  |         0.922          |
|            MBartForCausalLM             | 16 |  1.0   |  0.8629   |     0.352      |     nan     |  0.8181  |         0.9194         |
|          DistilBertForMaskedLM          | 16 | 0.9998 |  0.9138   |     0.3377     |     nan     |  0.8055  |         0.9137         |
| BlenderbotSmallForConditionalGeneration | 32 |  1.0   |  0.9036   |     0.3443     |     nan     |  0.7612  |         0.913          |
|             OPTForCausalLM              | 4  | 0.9979 |  0.7508   |     0.3322     |     nan     |  0.763   |         0.9125         |
|           ElectraForCausalLM            | 1  |  1.0   |  0.9107   |     0.3556     |     nan     |  0.6123  |         0.9107         |
|           RobertaForCausalLM            | 4  | 0.9058 |  0.7778   |     0.3513     |     nan     |  0.7882  |         0.9058         |
|            PLBartForCausalLM            | 16 |  1.0   |  0.8805   |     0.3568     |     nan     |  0.8028  |         0.9029         |
|         Speech2Text2ForCausalLM         | 64 | 0.9565 |  0.8462   |     0.3538     |     nan     |  0.7768  |         0.889          |
|       BlenderbotSmallForCausalLM        | 64 |  1.0   |  0.8401   |     0.3578     |     nan     |  0.7277  |         0.8452         |
|          MobileBertForMaskedLM          | 16 | 0.9997 |  0.9179   |      nan       |     nan     |  0.5861  |         0.8035         |
|         MegatronBertForCausalLM         | 2  | 0.7066 |  0.7066   |     0.3654     |     nan     |  0.7066  |         0.7066         |
|       MT5ForConditionalGeneration       | 2  | 0.6173 |  0.6173   |      nan       |     nan     |  0.6173  |         0.6173         |
|     MobileBertForQuestionAnswering      | 32 |  1.0   |  0.9716   |      nan       |     nan     |  0.4668  |         0.6097         |
|          AllenaiLongformerBase          | 0  |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|          ghostnet_100           | 128 | 0.9994 |  0.9955   |     0.8411     |   1.2485    |  1.7846  |         1.7371         |
|            lcnet_050            | 128 | 0.9569 |  0.9502   |     0.7668     |   1.4959    |  1.6591  |         1.6188         |
|         coat_lite_mini          | 128 | 0.9999 |  0.9967   |     0.8443     |   1.0555    |  1.6093  |         1.6033         |
|        tnt_s_patch16_224        | 64  | 0.9999 |  0.9984   |      0.0       |   1.5989    |  1.5295  |         1.5135         |
|           dm_nfnet_f0           | 128 | 0.9994 |    1.0    |      0.0       |   1.2117    |  1.4719  |         1.4226         |
|        twins_pcpvt_base         | 32  | 1.0074 |  0.9883   |     0.9602     |   1.3673    |  1.4916  |         1.4143         |
|      xcit_large_24_p8_224       |  5  | 1.0026 |  0.9913   |      0.0       |     0.0     |  1.4488  |         1.4108         |
|         crossvit_9_240          | 64  | 1.0061 |  1.0046   |      0.0       |   1.0509    |  1.4149  |         1.3907         |
|           volo_d1_224           | 64  |  1.0   |  0.9962   |      0.0       |   1.1313    |  1.3925  |         1.3679         |
|         mobilenetv2_100         | 128 | 0.9662 |   0.964   |     0.7057     |   1.0098    |  1.3132  |         1.3536         |
|      mobilenetv3_large_100      | 128 | 0.9665 |   0.963   |     0.7637     |   1.1669    |  1.3366  |         1.3474         |
|           regnety_002           | 128 | 0.9796 |  0.9859   |     0.8561     |    1.363    |  1.4971  |         1.3317         |
|             dla102              | 64  | 0.9998 |  0.9962   |     0.8017     |   1.2856    |  1.3449  |         1.3235         |
|          gmixer_24_224          | 64  |  1.0   |  0.8414   |      0.0       |   0.9868    |  1.3497  |         1.3184         |
|           resnest101e           | 32  | 1.0041 |  1.0368   |      0.78      |   1.2022    |  1.3703  |         1.3125         |
|            nfnet_l0             | 64  | 0.9994 |  0.7985   |     0.6964     |    1.049    |  1.3728  |         1.3091         |
|        adv_inception_v3         | 128 | 0.9999 |  0.9974   |      0.0       |   1.1251    |  1.3265  |         1.3083         |
|          inception_v3           | 128 |  1.0   |  0.9989   |      0.0       |   1.1253    |  1.3272  |         1.3063         |
|       gluon_inception_v3        | 128 | 0.9999 |  0.9992   |      0.0       |   1.1249    |  1.327   |         1.3057         |
|            hrnet_w18            |  2  | 1.0089 |  1.1011   |     2.0045     |    1.482    |  4.8677  |         1.3036         |
|            fbnetv3_b            | 128 | 0.965  |  0.9579   |     0.7605     |   1.1304    |  1.2831  |         1.2958         |
|           mnasnet_100           | 128 | 0.9671 |  0.9616   |     0.7862     |   1.1562    |  1.2632  |         1.2813         |
|        sebotnet33ts_256         | 64  | 0.9765 |  0.8072   |      0.0       |   1.0534    |  1.2639  |         1.2697         |
|       tf_efficientnet_b0        | 128 | 0.9771 |  0.7837   |      0.0       |   0.9852    |  1.2599  |         1.2668         |
|          botnet26t_256          | 128 | 0.9859 |  0.9854   |     0.7842     |   1.2265    |  1.2641  |         1.2629         |
|           fbnetc_100            | 128 | 0.9665 |  0.9632   |     0.7903     |   1.1875    |  1.2504  |         1.2627         |
|          spnasnet_100           | 128 | 0.9622 |  0.9551   |     0.7732     |    1.132    |  1.2355  |         1.2529         |
|        res2net50_14w_8s         |  2  | 1.0019 |  1.0164   |     2.0615     |   1.4393    |  5.4203  |         1.2519         |
|          jx_nest_base           | 32  | 0.9999 |  0.9947   |      0.0       |   1.2103    |  1.2754  |         1.2519         |
|          cspdarknet53           | 64  | 0.9579 |  0.9539   |     0.7351     |   1.1851    |  1.2264  |         1.2348         |
|           res2next50            |  2  | 1.004  |  1.0434   |     2.2471     |   1.3813    |  4.6815  |         1.2326         |
|           selecsls42b           | 128 | 1.0001 |  0.9967   |     0.8147     |   1.2083    |  1.2453  |         1.231          |
|        ese_vovnet19b_dw         | 128 | 0.9795 |  0.9766   |     0.7421     |    1.145    |  1.2239  |         1.2267         |
|           rexnet_100            | 128 | 0.9731 |  0.8163   |      0.0       |   0.9831    |  1.2133  |         1.219          |
|            pit_b_224            | 64  | 1.0001 |  0.9987   |      0.0       |   1.0539    |  1.2269  |         1.2159         |
|       eca_botnext26ts_256       | 64  | 0.9745 |  0.7701   |     0.6216     |   1.0176    |  1.2397  |         1.2148         |
|        eca_halonext26ts         | 64  | 0.9737 |  0.7751   |     0.6286     |    1.017    |  1.2316  |         1.2133         |
|            tinynet_a            | 128 | 0.9663 |  0.7751   |     0.6201     |   0.9716    |  1.1908  |         1.1995         |
|           mobilevit_s           | 32  | 0.9761 |  0.7692   |     0.5952     |   0.9701    |  1.1937  |         1.1987         |
|          pnasnet5large          | 16  | 0.9996 |  0.9984   |      0.0       |   1.0835    |  1.2099  |         1.1943         |
|             dpn107              | 32  | 0.9588 |  0.9512   |     0.7798     |   1.0297    |  1.1798  |         1.1934         |
|        res2net101_26w_4s        | 64  | 0.9998 |  0.9971   |     0.7704     |   1.1677    |  1.2288  |         1.1893         |
|           convit_base           | 32  | 0.9996 |  0.9957   |      0.0       |   1.1925    |  1.2487  |         1.1868         |
|            repvgg_a2            | 128 | 0.9644 |  0.9625   |     0.8269     |   1.1224    |  1.1697  |         1.1663         |
|          cait_m36_384           |  2  | 0.9998 |  0.9945   |      0.0       |   1.0967    |  1.205   |         1.1567         |
|          convnext_base          | 32  | 0.9999 |  0.9978   |      0.0       |   1.0457    |  1.1874  |         1.1521         |
|         poolformer_m36          | 64  |  1.0   |   0.999   |      0.0       |     0.0     |  1.1679  |         1.1469         |
|           tf_mixnet_l           | 64  | 0.9719 |  0.8764   |     0.7242     |   1.0063    |  1.143   |         1.1456         |
|  swin_base_patch4_window7_224   | 64  |  1.0   |  0.9795   |      0.0       |   0.9928    |  1.1439  |         1.1346         |
|            mixnet_l             | 64  | 0.9714 |  0.8725   |     0.7118     |   1.0065    |  1.1316  |         1.1289         |
|      beit_base_patch16_224      | 64  |  1.0   |  0.9821   |      0.0       |   0.9529    |  1.1189  |         1.1074         |
|          gmlp_s16_224           | 64  |  1.0   |  0.9921   |      0.0       |   0.9955    |  1.1041  |          1.09          |
| deit_base_distilled_patch16_224 | 64  | 0.9996 |  0.9987   |     0.7703     |   1.0129    |  1.0985  |         1.0865         |
|      vit_base_patch16_224       | 64  |  1.0   |  0.9985   |     0.7705     |   0.9738    |  1.0899  |         1.0785         |
|        convmixer_768_32         | 32  | 0.9999 |  1.0001   |      0.0       |   1.0621    |  1.0772  |         1.0747         |
|        gluon_xception65         | 32  | 0.9999 |  0.9965   |      0.0       |   1.0408    |  1.0876  |         1.0725         |
|     swsl_resnext101_32x16d      | 32  | 0.9998 |  1.0001   |      0.0       |   1.1075    |  1.1072  |         1.0717         |
|            gernet_l             | 128 | 0.9747 |  0.9727   |     0.8215     |   1.0985    |  1.0764  |         1.0706         |
|          mixer_b16_224          | 64  | 1.0001 |  0.9983   |     0.7607     |   0.9786    |  1.0637  |         1.0319         |
|         visformer_small         | 128 | 1.0001 |  1.0027   |     0.7976     |   1.0215    |  1.0485  |         1.0136         |
|          resmlp_12_224          | 128 | 1.0005 |  1.0001   |     0.6958     |     0.0     |  1.0159  |         0.9953         |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
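
Because the tables are plain text, summarizing a column requires a little parsing first. The sketch below is not part of the benchmark harness: the helper names `parse_ascii_table` and `geomean_speedup` are hypothetical, and it assumes that `0.0` and `nan` entries mark runs that produced no measurement, which the tables do not state explicitly. The sample is a three-row, four-column excerpt of the speedup table above.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geomean_speedup(rows, column):
    """Geometric mean of a numeric column.

    Assumption: nan and 0.0 mean "no measurement" and are skipped.
    """
    values = []
    for row in rows:
        value = float(row[column])
        if math.isnan(value) or value <= 0.0:
            continue
        values.append(value)
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Excerpt of the timm_models speedup table (subset of columns).
sample = """
+-------------------+-----+----------------+----------+
|       name        | bs  | aot_cudagraphs | inductor |
+-------------------+-----+----------------+----------+
|   ghostnet_100    | 128 |     0.8411     |  1.7846  |
| tnt_s_patch16_224 | 64  |      0.0       |  1.5295  |
|    dm_nfnet_f0    | 128 |      0.0       |  1.4719  |
+-------------------+-----+----------------+----------+
"""

rows = parse_ascii_table(sample)
for column in ("aot_cudagraphs", "inductor"):
    print(column, round(geomean_speedup(rows, column), 4))
```

A geometric mean is used here because the values are ratios against eager; skipping failed entries makes the summary optimistic for backends with many failures, so the number of skipped rows is worth reporting alongside it.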

Accuracy

+---------------------------------+----+-------+---------------+----------------+---------------+---------------+------------------------+
|              name               | bs | eager |   aot_eager   | aot_cudagraphs |  aot_nvfuser  |   inductor    | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+------------------------+
|        adv_inception_v3         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          botnet26t_256          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        convmixer_768_32         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          cspdarknet53           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|             dla102              | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|             dpn107              | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|       eca_botnext26ts_256       | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        eca_halonext26ts         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        ese_vovnet19b_dw         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           fbnetc_100            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            gernet_l             | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          ghostnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|       gluon_inception_v3        | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            hrnet_w18            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          inception_v3           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            lcnet_050            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          mixer_b16_224          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            mixnet_l             | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           mnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|         mobilenetv2_100         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|      mobilenetv3_large_100      | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           mobilevit_s           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            nfnet_l0             | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          pnasnet5large          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           regnety_002           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            repvgg_a2            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        res2net101_26w_4s        | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        res2net50_14w_8s         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           res2next50            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           rexnet_100            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        sebotnet33ts_256         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           selecsls42b           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          spnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|     swsl_resnext101_32x16d      | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|       tf_efficientnet_b0        | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           tf_mixnet_l           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            tinynet_a            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|         visformer_small         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|      vit_base_patch16_224       | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|      beit_base_patch16_224      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|          convnext_base          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|         crossvit_9_240          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|           dm_nfnet_f0           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|          gmixer_24_224          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|          gmlp_s16_224           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|  swin_base_patch4_window7_224   | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|        tnt_s_patch16_224        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|           volo_d1_224           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|          resmlp_12_224          | 2  | pass  |     pass      |      pass      |  fail_to_run  |     pass      |          pass          |
|           convit_base           | 2  | pass  |     pass      |  fail_to_run   |  fail_to_run  |     pass      |          pass          |
|      xcit_large_24_p8_224       | 2  | pass  | fail_accuracy |  fail_to_run   |  fail_to_run  |     pass      |          pass          |
|        gluon_xception65         | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |          pass          |
|         poolformer_m36          | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |          pass          |
|          cait_m36_384           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |          pass          |
|          jx_nest_base           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |          pass          |
|         coat_lite_mini          | 2  | pass  | fail_accuracy | fail_accuracy  | fail_accuracy |     pass      |          pass          |
|            pit_b_224            | 2  | pass  | fail_accuracy | fail_accuracy  | fail_accuracy |     pass      |          pass          |
|        twins_pcpvt_base         | 2  | pass  | fail_accuracy | fail_accuracy  | fail_accuracy |     pass      |          pass          |
| deit_base_distilled_patch16_224 | 2  | pass  |     pass      |      pass      |     pass      |     pass      |     fail_accuracy      |
|            fbnetv3_b            | 2  | pass  |     pass      |      pass      |     pass      | fail_accuracy |     fail_accuracy      |
|           resnest101e           | 2  | pass  |     pass      |      pass      | fail_accuracy | fail_accuracy |     fail_accuracy      |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|            hrnet_w18            |  2  | 5.5514 |  31.4987  |    52.5308     |  194.6411   | 99.5082  |        81.9802         |
|          pnasnet5large          | 16  | 4.2069 |  21.6242  |      nan       |  122.4201   | 71.2949  |        67.2041         |
|           mobilevit_s           | 32  | 1.6977 |   7.171   |     14.752     |   41.4298   | 341.3652 |        62.3666         |
|      xcit_large_24_p8_224       |  5  | 2.5437 |  16.6772  |      nan       |     nan     | 172.8777 |        57.0318         |
|        twins_pcpvt_base         | 32  | 2.1953 |  13.2558  |    22.4707     |   45.1882   | 507.7757 |        54.2039         |
|          cait_m36_384           |  2  | 2.7258 |  17.8566  |      nan       |   45.7956   | 164.8171 |        50.2666         |
|        res2net101_26w_4s        | 64  | 2.6877 |  16.0726  |    26.8113     |   80.9679   | 54.4775  |        49.5062         |
|  swin_base_patch4_window7_224   | 64  | 2.5935 |  12.3844  |      nan       |   58.733    | 177.294  |        45.0007         |
|           resnest101e           | 32  | 2.8543 |  15.8323  |     25.573     |   74.2882   | 111.9496 |        44.8647         |
|         poolformer_m36          | 64  | 1.7189 |  9.1069   |      nan       |     nan     |  46.021  |        43.0803         |
|        res2net50_14w_8s         |  2  | 2.5392 |  14.5878  |    23.6502     |   68.3297   | 45.8749  |        41.8925         |
|          convnext_base          | 32  | 1.2758 |  6.1953   |      nan       |   20.9357   | 202.7457 |        38.1635         |
|             dpn107              | 32  | 3.9591 |  13.8539  |    45.5921     |   76.0043   | 40.3708  |        36.3608         |
|          jx_nest_base           | 32  | 1.6763 |   9.356   |      nan       |   58.9394   | 135.9959 |         33.881         |
|            fbnetv3_b            | 128 | 2.9517 |  10.5374  |    30.2312     |   77.0478   | 35.3708  |        31.9534         |
|        adv_inception_v3         | 128 | 1.4958 |  8.6965   |      nan       |   67.2085   | 34.0825  |        31.6169         |
|       gluon_inception_v3        | 128 | 1.5314 |  8.5064   |      nan       |   66.9439   |  33.61   |        30.8318         |
|        gluon_xception65         | 32  | 1.5286 |  10.1092  |      nan       |   40.8043   |  32.654  |        30.0885         |
|          inception_v3           | 128 | 1.4983 |  8.2858   |      nan       |   66.6613   | 33.5144  |        29.7959         |
|           tf_mixnet_l           | 64  | 5.767  |  12.5234  |    27.2068     |   61.5578   | 33.8544  |        29.6373         |
|          ghostnet_100           | 128 | 2.4536 |  8.8736   |    13.0483     |   58.6961   | 32.3006  |        29.3573         |
|             dla102              | 64  | 1.5262 |  9.7746   |    14.3492     |   61.3674   | 32.7156  |        29.0289         |
|            mixnet_l             | 64  | 5.297  |  12.0806  |    27.5547     |   60.6602   |  31.97   |        28.5199         |
|        tnt_s_patch16_224        | 64  | 1.6033 |  10.3956  |      nan       |   23.6034   | 63.9747  |        28.1925         |
|           dm_nfnet_f0           | 128 | 1.9599 |  7.3416   |      nan       |   29.5113   | 28.1901  |         25.49          |
|           volo_d1_224           | 64  | 1.2527 |  7.5688   |      nan       |   28.5353   | 74.6738  |        25.3803         |
|          gmlp_s16_224           | 64  | 1.0091 |   6.239   |      nan       |   13.523    | 96.9474  |        25.2203         |
|           res2next50            |  2  | 1.5169 |  8.0974   |     12.088     |   41.4751   | 26.7041  |        24.3022         |
|        eca_halonext26ts         | 64  | 1.3714 |  5.2827   |    11.2345     |   50.3344   |  262.43  |        24.1671         |
|     swsl_resnext101_32x16d      | 32  | 1.6673 |   9.166   |      nan       |   38.5961   |  26.624  |        24.1502         |
|           rexnet_100            | 128 | 1.8333 |  7.0955   |      nan       |  101.6305   | 26.3318  |        23.9861         |
|        sebotnet33ts_256         | 64  | 1.6193 |  6.2319   |      nan       |   50.6361   | 132.0687 |         23.882         |
|         coat_lite_mini          | 128 | 0.9143 |  5.3518   |     7.7361     |   14.5681   | 364.9581 |        23.8478         |
|         crossvit_9_240          | 64  | 1.3631 |   8.165   |      nan       |   26.4481   | 159.6489 |        23.2843         |
|            tinynet_a            | 128 | 2.0295 |  7.5769   |    19.9529     |   60.7375   | 25.5819  |        22.6985         |
|       tf_efficientnet_b0        | 128 | 1.7698 |  6.7346   |      nan       |   60.6858   | 22.6338  |         20.872         |
|          cspdarknet53           | 64  | 2.2005 |  7.5986   |    19.8213     |   48.6604   | 22.7589  |         20.778         |
|          gmixer_24_224          | 64  | 1.2353 |  7.2435   |      nan       |   16.3341   | 71.6232  |        20.4959         |
|       eca_botnext26ts_256       | 64  | 1.3287 |  4.7948   |    10.7476     |   48.2999   | 290.2904 |        19.8298         |
|           fbnetc_100            | 128 | 1.9158 |  6.5018   |    18.4877     |   45.035    | 21.5277  |        19.3838         |
|          spnasnet_100           | 128 | 1.9538 |  6.5388   |    17.2195     |   44.0026   | 21.5531  |         18.809         |
|            nfnet_l0             | 64  | 1.6932 |  7.1748   |    10.8573     |   27.1068   | 24.2819  |        18.6725         |
|          botnet26t_256          | 128 | 1.3044 |  4.5699   |    10.9116     |   40.9092   | 105.3566 |        18.3653         |
|           convit_base           | 32  | 0.9673 |  6.1172   |      nan       |   18.0735   | 84.0672  |        18.0342         |
|      mobilenetv3_large_100      | 128 | 1.5015 |   5.219   |    12.9894     |   64.1074   | 19.7471  |        17.3543         |
|         mobilenetv2_100         | 128 | 1.6053 |  5.0752   |    13.1196     |   37.684    |  19.109  |        16.1893         |
|           mnasnet_100           | 128 | 1.5679 |  5.5386   |    13.3421     |   37.6931   | 18.4296  |        15.7992         |
|            gernet_l             | 128 | 1.9688 |  6.1371   |    16.0066     |   35.7462   | 18.0897  |        15.7917         |
|           regnety_002           | 128 | 1.5589 |  5.5672   |    13.4735     |   47.1773   | 17.6702  |        15.7589         |
|            repvgg_a2            | 128 | 1.9613 |  5.9791   |    15.6637     |   43.6358   | 17.5441  |        15.7292         |
|        convmixer_768_32         | 32  | 1.1394 |  5.9151   |      nan       |   13.3328   | 19.8959  |        15.7286         |
|      beit_base_patch16_224      | 64  | 1.0517 |  5.1695   |      nan       |   13.9188   | 32.1445  |        15.4759         |
|          resmlp_12_224          | 128 | 0.5144 |  2.7602   |     5.4746     |     nan     | 38.6184  |        14.8683         |
|         visformer_small         | 128 | 0.8595 |  4.1596   |     5.8685     |   23.9823   | 70.7521  |        14.6137         |
|           selecsls42b           | 128 | 0.6354 |  3.8347   |     5.5048     |   39.064    | 16.4175  |        14.4533         |
|            pit_b_224            | 64  | 0.9357 |  4.9349   |      nan       |   12.4917   | 67.5584  |        14.3303         |
| deit_base_distilled_patch16_224 | 64  | 0.7137 |  4.6555   |     6.4036     |   10.2768   | 36.0881  |         13.538         |
|      vit_base_patch16_224       | 64  | 0.7021 |  4.1004   |     6.3604     |   9.4905    | 25.0169  |        12.8291         |
|          mixer_b16_224          | 64  | 0.5101 |  3.0639   |     5.353      |   10.5805   | 41.5714  |        12.5959         |
|        ese_vovnet19b_dw         | 128 | 0.9625 |  3.1605   |     7.5135     |   30.8903   | 13.0066  |        11.8343         |
|            lcnet_050            | 128 | 0.9262 |  3.1307   |     6.8576     |   30.9579   | 13.0753  |        11.0715         |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|          gmixer_24_224          | 64  | 0.9952 |  0.9645   |      nan       |   0.9825    |  1.3808  |         1.5001         |
|            tinynet_a            | 128 | 0.9942 |  0.7796   |     0.2617     |   0.7823    |  1.351   |         1.3692         |
|          pnasnet5large          | 16  | 1.069  |   1.011   |      nan       |   1.2062    |  1.1774  |         1.3282         |
|            nfnet_l0             | 64  | 0.9948 |  0.8256   |     0.2664     |    0.813    |  1.2558  |         1.3209         |
|           rexnet_100            | 128 | 0.9935 |  0.7843   |      nan       |   0.8682    |  1.2619  |         1.2765         |
|           convit_base           | 32  | 0.9977 |  0.8861   |      nan       |   0.9501    |  1.068   |         1.2569         |
|        sebotnet33ts_256         | 64  | 0.9952 |  0.7084   |      nan       |   0.6831    |  0.841   |         1.2472         |
|       eca_botnext26ts_256       | 64  | 0.9938 |  0.7669   |     0.258      |   0.7642    |  1.1318  |         1.2041         |
|        eca_halonext26ts         | 64  | 0.9938 |   0.768   |     0.2589     |   0.7694    |  1.1317  |         1.2034         |
|       tf_efficientnet_b0        | 128 | 0.9935 |  0.7688   |      nan       |   0.8401    |  1.1889  |         1.199          |
|           mobilevit_s           | 32  | 0.9959 |  0.7668   |     0.258      |    0.741    |  1.141   |         1.1989         |
|          cait_m36_384           |  2  | 0.9998 |   0.902   |      nan       |   0.9203    |  1.011   |         1.139          |
|         mobilenetv2_100         | 128 | 0.9925 |  0.7621   |     0.3063     |   0.7635    |  1.1003  |         1.1104         |
|          ghostnet_100           | 128 | 0.9865 |  0.8768   |     0.3273     |   0.9345    |  1.0353  |         1.0963         |
|           tf_mixnet_l           | 64  | 0.9956 |  0.8577   |     0.2851     |   0.8572    |  0.9695  |         1.0815         |
|         poolformer_m36          | 64  | 0.998  |  0.9512   |      nan       |     nan     |  1.0527  |         1.069          |
|             dla102              | 64  | 0.9841 |  0.9148   |     0.3339     |   0.9504    |  1.0492  |         1.0544         |
|           dm_nfnet_f0           | 128 | 0.9358 |  0.8936   |      nan       |   0.9479    |  1.0219  |         1.0495         |
|           selecsls42b           | 128 | 0.9883 |  0.8896   |     0.337      |   0.8954    |  0.9913  |         1.0324         |
|      xcit_large_24_p8_224       |  5  | 0.9981 |  0.9194   |      nan       |     nan     |  0.9124  |         1.0084         |
|            mixnet_l             | 64  | 0.995  |  0.8449   |     0.2684     |   0.7907    |  0.8995  |         1.0059         |
|      beit_base_patch16_224      | 64  | 0.9966 |  0.9545   |      nan       |   0.8606    |  0.9272  |         1.0027         |
|        tnt_s_patch16_224        | 64  | 0.9963 |  0.9715   |      nan       |   0.8518    |  0.9131  |         1.0027         |
|           resnest101e           | 32  | 0.9972 |  0.9434   |     0.3271     |   0.9425    |  0.9914  |         1.002          |
|          mixer_b16_224          | 64  | 0.9956 |  0.9574   |     0.3405     |   0.8644    |  0.9357  |         0.997          |
|        ese_vovnet19b_dw         | 128 | 0.9923 |  0.8877   |     0.3261     |   0.9302    |  0.9886  |         0.9967         |
|      vit_base_patch16_224       | 64  | 0.9963 |  0.9434   |     0.3153     |   0.8229    |  0.915   |         0.9937         |
|            pit_b_224            | 64  | 0.9968 |  0.7947   |      nan       |   0.6417    |  0.792   |         0.9913         |
|          resmlp_12_224          | 128 | 0.9893 |   0.943   |     0.2472     |     nan     |  0.8169  |         0.9911         |
| deit_base_distilled_patch16_224 | 64  | 0.9964 |  0.9442   |     0.3138     |   0.8242    |  0.9095  |         0.9895         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9288   |      nan       |    0.83     |  0.8585  |         0.989          |
|         crossvit_9_240          | 64  | 0.9886 |  0.8633   |      nan       |    0.729    |  0.8063  |         0.9877         |
|        twins_pcpvt_base         | 32  | 0.9971 |  0.9101   |     0.3178     |   0.8351    |  0.8722  |         0.9876         |
|        convmixer_768_32         | 32  | 0.9986 |  0.9854   |      nan       |   0.9793    |  0.9836  |         0.9853         |
|          gmlp_s16_224           | 64  | 0.9958 |  0.9727   |      nan       |    0.966    |  0.9267  |         0.9838         |
|         coat_lite_mini          | 128 | 1.0049 |  0.8777   |     0.3262     |   0.7873    |  0.7899  |         0.9836         |
|            fbnetv3_b            | 128 | 0.9932 |  0.7828   |     0.3095     |    0.784    |  0.9696  |         0.977          |
|          jx_nest_base           | 32  | 1.0002 |  0.8966   |      nan       |   0.7112    |  0.8575  |         0.9712         |
|            hrnet_w18            |  2  | 0.9947 |  0.8779   |     0.4003     |   0.8833    |  0.6657  |         0.9689         |
|          convnext_base          | 32  | 0.998  |  0.9059   |      nan       |   0.7678    |  0.8761  |         0.9606         |
|             dpn107              | 32  | 0.9985 |  0.9271   |     0.3392     |   0.8941    |  0.9056  |         0.9562         |
|        res2net101_26w_4s        | 64  | 0.9968 |  0.9278   |     0.3243     |   0.8932    |  0.9269  |         0.9548         |
|         visformer_small         | 128 | 0.9943 |  0.9381   |     0.3293     |   0.9475    |  0.9005  |         0.951          |
|        gluon_xception65         | 32  | 0.9975 |  0.9365   |      nan       |   0.8982    |  0.9351  |         0.9376         |
|        res2net50_14w_8s         |  2  | 0.9976 |   0.837   |     0.3866     |   0.8458    |  0.8293  |         0.9317         |
|           res2next50            |  2  | 0.9972 |  0.8331   |     0.3813     |    0.841    |   0.82   |         0.9281         |
|     swsl_resnext101_32x16d      | 32  | 0.9991 |  0.8972   |      nan       |   0.8675    |  0.8931  |         0.9249         |
|            lcnet_050            | 128 | 0.9672 |  0.7521   |     0.3171     |   0.7524    |  0.8921  |         0.923          |
|           volo_d1_224           | 64  | 0.996  |  0.9213   |      nan       |   0.7472    |  0.9124  |         0.9171         |
|          spnasnet_100           | 128 | 0.989  |  0.9109   |     0.3309     |   0.8412    |  0.9047  |         0.9157         |
|      mobilenetv3_large_100      | 128 | 0.9876 |  0.8589   |     0.3244     |   0.8745    |  0.9007  |         0.9126         |
|           mnasnet_100           | 128 | 0.9877 |  0.9019   |     0.3306     |   0.8279    |  0.8961  |         0.9077         |
|        adv_inception_v3         | 128 | 0.9901 |  0.8617   |      nan       |   0.8724    |  0.8983  |         0.9073         |
|       gluon_inception_v3        | 128 | 0.9901 |  0.8617   |      nan       |   0.8724    |  0.8983  |         0.9073         |
|          inception_v3           | 128 | 0.9901 |  0.8617   |      nan       |   0.8724    |  0.8983  |         0.9073         |
|           regnety_002           | 128 | 0.9717 |  0.8104   |     0.3283     |   0.7599    |  0.8617  |         0.8993         |
|          cspdarknet53           | 64  | 0.9954 |  0.8528   |     0.316      |   0.8762    |  0.8835  |         0.8875         |
|          botnet26t_256          | 128 | 0.9915 |  0.8434   |     0.3165     |    0.745    |  0.8605  |         0.8702         |
|           fbnetc_100            | 128 | 0.9891 |  0.8518   |     0.3236     |   0.7446    |  0.8416  |         0.8498         |
|            gernet_l             | 128 | 0.9884 |  0.7892   |      0.32      |   0.7938    |  0.7928  |         0.8234         |
|            repvgg_a2            | 128 | 0.9867 |  0.8054   |     0.3277     |   0.6573    |  0.7684  |         0.8011         |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Performance graphs

bench_logs/timm_models_float32.png
bench_logs/huggingface_float32.png
bench_logs/torchbench_float32.png

@anijain2305
Contributor Author

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm (listed as timm_models in the tables below). All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. Speedup is normalized against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

When measuring performance, compilation latency and memory footprint reduction, we exclude the models that fail accuracy checks.
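
As a rough illustration of the filtering and normalization described above, the sketch below computes a geometric mean speedup over only the models that pass the accuracy check. The record layout (`eager_s`, `backend_s`, `accuracy`), the function name, the choice to treat any status beginning with `pass` as passing, and the 0.0 fallback when nothing passes are all assumptions made for this example, not the dashboard's actual script.

```python
import math


def geomean_speedup(results):
    """Geometric mean of eager_time / backend_time over passing models.

    `results` is a list of dicts with hypothetical keys:
      'eager_s'   - native PyTorch time in seconds
      'backend_s' - time with the backend under test, in seconds
      'accuracy'  - status string such as 'pass' or 'fail_to_run'
    """
    # Assumption: any status starting with 'pass' counts as passing.
    speedups = [
        r["eager_s"] / r["backend_s"]
        for r in results
        if r["accuracy"].startswith("pass")
    ]
    if not speedups:
        # Assumption: report 0.0 when no model passes for this backend.
        return 0.0
    # Geometric mean via the mean of logs.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


sample = [
    {"eager_s": 2.0, "backend_s": 1.0, "accuracy": "pass"},         # 2x
    {"eager_s": 8.0, "backend_s": 1.0, "accuracy": "pass"},         # 8x
    {"eager_s": 1.0, "backend_s": 0.1, "accuracy": "fail_to_run"},  # excluded
]
print(f"{geomean_speedup(sample):.2f}x")
```

A geometric mean is used here because speedups are ratios; an arithmetic mean would let a single large speedup dominate the summary.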

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 94%, 50/53 | 98%, 42/43  | 100%, 61/61 |
|       aot_eager        | 94%, 50/53 | 98%, 42/43  | 90%, 55/61  |
|     aot_cudagraphs     | 74%, 39/53 | 53%, 23/43  | 75%, 46/61  |
|      aot_nvfuser       | 60%, 32/53 |  0%, 0/43   | 75%, 46/61  |
|        inductor        | 85%, 45/53 | 93%, 40/43  | 93%, 57/61  |
| inductor_no_cudagraphs | 87%, 46/53 | 93%, 40/43  | 93%, 57/61  |
+------------------------+------------+-------------+-------------+

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.01x    |    1.01x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|     aot_cudagraphs     |   1.19x    |    1.29x    |    1.06x    |
|      aot_nvfuser       |   1.16x    |    0.0x     |    1.20x    |
|        inductor        |   1.84x    |    2.30x    |    1.56x    |
| inductor_no_cudagraphs |   1.37x    |    1.64x    |    1.36x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    1.86    |    2.54     |    2.09     |
|       aot_eager        |    7.63    |    12.68    |    10.35    |
|     aot_cudagraphs     |    7.76    |    16.14    |    19.75    |
|      aot_nvfuser       |   26.75    |     0.0     |    69.96    |
|        inductor        |   56.04    |    56.96    |    94.80    |
| inductor_no_cudagraphs |   28.04    |    29.28    |    32.82    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.96x    |    0.98x    |    0.99x    |
|       aot_eager        |   0.85x    |    0.87x    |    0.87x    |
|     aot_cudagraphs     |   0.42x    |    0.40x    |    0.33x    |
|      aot_nvfuser       |   0.83x    |    0.0x     |    0.85x    |
|        inductor        |   0.83x    |    0.86x    |    0.94x    |
| inductor_no_cudagraphs |   1.00x    |    1.05x    |    1.03x    |
+------------------------+------------+-------------+-------------+

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|           BERT_pytorch            |  16  | 1.0084 |  0.8302   |      0.0       |     0.0     |  3.2325  |         2.4027         |
|             hf_Albert             |  8   | 1.0011 |  0.9532   |     0.7722     |     0.0     |  2.3807  |         2.3221         |
|            hf_T5_large            |  2   | 1.0191 |  0.8479   |      0.0       |     0.0     |  2.478   |         2.1536         |
|              hf_GPT2              |  4   | 1.0209 |  0.9839   |     0.8075     |     0.0     |  1.8627  |         1.847          |
|               hf_T5               |  8   | 0.9992 |   0.945   |      0.0       |     0.0     |  1.8351  |         1.8404         |
|              hf_Bert              |  4   | 1.0396 |  0.8545   |     0.928      |     0.0     |  2.0098  |         1.7521         |
|           hf_GPT2_large           |  4   | 1.0002 |  0.9903   |      0.0       |     0.0     |   0.0    |         1.7504         |
|              hf_Bart              |  4   | 1.009  |  0.8248   |      0.0       |     0.0     |  1.8018  |         1.7338         |
|        speech_transformer         |  32  | 1.002  |  0.8354   |      0.0       |     0.0     |  1.7114  |         1.6954         |
|           timm_resnest            |  32  | 1.0045 |  1.0195   |     0.8318     |   1.3124    |  1.9303  |         1.6815         |
|      timm_vision_transformer      |  8   |  1.01  |  0.8464   |     1.7343     |   1.3458    |  3.2299  |         1.562          |
|         timm_efficientdet         |  1   | 0.9799 |  0.8051   |      0.0       |     0.0     |  4.7029  |         1.5402         |
|           mobilenet_v2            |  96  | 0.9994 |  0.9885   |     0.7643     |   0.9252    |  1.5653  |         1.5189         |
| attention_is_all_you_need_pytorch | 256  | 1.0066 |  0.9055   |      0.0       |     0.0     |  1.5082  |         1.4711         |
|           hf_DistilBert           |  8   | 1.0024 |   0.973   |     0.7311     |     0.0     |  1.4674  |         1.4463         |
|        shufflenet_v2_x1_0         | 128  | 1.0002 |  1.0086   |     0.9708     |   1.3407    |  1.6819  |         1.4435         |
|        mobilenet_v3_large         |  32  | 1.0035 |  1.0114   |     1.6301     |    1.418    |  3.029   |         1.4347         |
|            timm_nfnet             | 128  | 0.9991 |  1.0006   |      0.0       |   1.1738    |  1.501   |         1.4297         |
|           fastNLP_Bert            |  6   | 0.9989 |  0.8938   |     0.7673     |     0.0     |  1.4755  |         1.4176         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9983 |  0.9198   |     1.4546     |   1.2123    |  2.1261  |         1.3981         |
|       functorch_dp_cifar10        |  64  | 1.0013 |  0.9154   |     2.3988     |   1.1866    |  4.9158  |         1.3896         |
|            mnasnet1_0             |  32  | 0.9979 |  1.0035   |     1.2502     |   1.3985    |  2.6088  |         1.3719         |
|             resnet50              |  32  | 1.0026 |   1.006   |     1.0265     |   1.3597    |  1.7778  |         1.3676         |
|            densenet121            |  4   | 1.0025 |  0.9068   |     2.5239     |   1.3635    |  6.634   |         1.3397         |
|           pytorch_unet            |  1   | 0.9995 |  0.9927   |     0.863      |   1.1545    |  1.3442  |         1.3154         |
|          LearningToPaint          |  96  | 1.0019 |  0.9981   |     1.1493     |    1.359    |  1.8583  |         1.309          |
|           squeezenet1_1           |  32  | 1.003  |  0.9426   |     1.4482     |   1.1759    |  2.4444  |          1.29          |
|         timm_efficientnet         |  32  | 0.9584 |  0.8099   |     1.0738     |   1.1802    |  2.1108  |         1.2825         |
|          resnext50_32x4d          |  8   | 1.0019 |  0.9491   |     1.8428     |   1.3362    |  3.5086  |         1.269          |
|               vgg16               |  64  | 0.9996 |  0.9981   |     0.8582     |   0.9949    |  1.2709  |         1.2657         |
|          pytorch_stargan          |  16  | 0.9944 |  1.0237   |     0.9666     |   1.0868    |  1.3488  |         1.2632         |
|          pytorch_struct           | 200  | 0.9891 |  0.7371   |     1.0074     |    1.051    |  2.1098  |         1.2601         |
|            Super_SloMo            |  6   | 1.0001 |  0.9954   |     0.887      |     0.0     |  1.2879  |         1.2577         |
|             resnet18              |  16  | 1.0014 |  0.9898   |     1.5783     |   1.3409    |  2.954   |         1.2486         |
|            timm_regnet            |  32  | 0.9833 |  0.9338   |     0.8864     |   1.1838    |  1.2927  |         1.2336         |
|              alexnet              | 128  | 0.9987 |  0.9981   |     0.8156     |   1.0033    |  1.2128  |         1.2096         |
|        Background_Matting         |  4   | 0.9995 |   1.016   |     0.895      |   1.1126    |  1.2216  |         1.2055         |
|                drq                |  1   | 1.0052 |  0.8044   |     1.6843     |   1.1378    |  3.0048  |         1.1715         |
|            hf_Reformer            |  4   | 0.9954 |   0.999   |     0.9446     |     0.0     |  1.1587  |         1.1538         |
|            timm_vovnet            |  32  | 0.9189 |  0.8867   |     0.8556     |   1.1162    |  1.2846  |         1.1406         |
|   timm_vision_transformer_large   |  8   |  1.0   |  0.9905   |      0.0       |   0.9928    |  1.1564  |         1.1334         |
|              yolov3               |  16  | 1.0004 |  0.9903   |     0.8038     |   0.9306    |  1.1044  |         1.0799         |
|           lennard_jones           | 1000 | 0.9714 |  0.7434   |     1.2671     |   1.0511    |  2.088   |         1.0736         |
|               dcgan               |  32  | 0.9851 |  0.9098   |     1.6229     |    0.727    |  2.5611  |         1.0728         |
|         soft_actor_critic         | 256  | 0.9908 |  0.7429   |     1.3292     |   1.0647    |  1.7305  |         1.0396         |
|            hf_BigBird             |  2   | 0.9936 |  0.9157   |     1.0555     |     0.0     |  1.1515  |         1.0361         |
|      nvidia_deeprecommender       | 256  | 0.9988 |  0.9962   |     0.6971     |    0.979    |  0.9895  |         1.0298         |
|            tts_angular            |  64  | 0.9591 |  0.9356   |     0.9872     |   0.9955    |  1.0077  |         1.0252         |
|              demucs               |  4   | 1.0001 |  1.0008   |     1.0017     |   1.0039    |  1.0004  |         0.9973         |
|               dlrm                | 2048 | 1.1136 |  1.0706   |      0.0       |     0.0     |   0.0    |         0.9776         |
|           hf_Longformer           |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
|               moco                |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |  aot_cudagraphs  |   aot_nvfuser    |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        Background_Matting         |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |          pass          |
|            Super_SloMo            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           fastNLP_Bert            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|             hf_Albert             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              hf_Bart              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              hf_Bert              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|            hf_BigBird             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           hf_DistilBert           |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|            hf_Reformer            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              yolov3               |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           BERT_pytorch            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|               hf_T5               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|            hf_T5_base             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|        speech_transformer         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|               dlrm                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|          vision_maskrcnn          |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|           hf_Longformer           |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|               moco                |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|             tacotron2             |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        mobilenet_v3_large         |  2  |       pass       |       pass       |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
|         timm_efficientdet         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |  fail_accuracy   |     fail_accuracy      |
|            tts_angular            |  2  |       pass       |       pass       |       pass       |       pass       |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
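
To read per-backend totals off an accuracy table like the one above, a small parser can tally the status strings in each column. This is a minimal sketch, not part of the benchmark tooling: `parse_ascii_table` and `tally_status` are hypothetical helper names, the sample rows are made up for illustration, and it assumes the header row is included and uses the `name`/`bs` column labels seen in the other tables.

```python
from collections import Counter


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts.

    Border lines (starting with '+') are skipped; the first '|' line is
    treated as the header row.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def tally_status(records, skip=("name", "bs")):
    """Count the status strings seen in each backend column."""
    tallies = {}
    for rec in records:
        for column, value in rec.items():
            if column in skip:
                continue
            tallies.setdefault(column, Counter())[value] += 1
    return tallies


# Illustrative rows only (hypothetical model names, not dashboard data).
sample = """
+---------+----+-------+-------------+
|  name   | bs | eager | aot_nvfuser |
+---------+----+-------+-------------+
| model_a | 2  | pass  |    pass     |
| model_b | 2  | pass  | fail_to_run |
+---------+----+-------+-------------+
"""

tallies = tally_status(parse_ascii_table(sample))
for backend, counts in tallies.items():
    print(backend, dict(counts))
```

Any cell that is not a status string (for example a numeric value) is counted under its own key, so unexpected entries show up in the tally and are not silently dropped.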

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
|              yolov3               |  16  |  2.923  |  9.7089   |     13.516     |   39.5674   | 442.3717 |        402.8973        |
|            hf_T5_large            |  2   | 12.9718 |  48.602   |      nan       |     nan     | 228.4119 |        115.6282        |
|         timm_efficientdet         |  1   | 19.6178 |  42.7358  |      nan       |     nan     | 465.5501 |        92.4418         |
|           hf_GPT2_large           |  4   | 5.3147  |  24.5289  |      nan       |     nan     |   nan    |        69.9105         |
|   timm_vision_transformer_large   |  8   | 2.7963  |  19.039   |      nan       |   38.1465   | 127.5698 |        52.2793         |
|            densenet121            |  4   | 2.1156  |  15.3847  |    22.9485     |  124.2562   | 51.3158  |        49.6844         |
|            timm_nfnet             | 128  | 1.9394  |  8.3928   |      nan       |   37.4136   | 31.9275  |        30.3265         |
|        speech_transformer         |  32  | 1.8686  |  11.1471  |      nan       |     nan     | 153.5341 |        29.0602         |
|            hf_BigBird             |  2   |  7.995  |  16.5311  |    37.0042     |     nan     | 48.5517  |        28.6978         |
|           BERT_pytorch            |  16  |  1.681  |  9.8188   |      nan       |     nan     | 104.329  |        28.6499         |
|              hf_Bart              |  4   | 1.7165  |  11.0239  |      nan       |     nan     | 56.2143  |        26.5941         |
|               hf_T5               |  8   | 2.1584  |  10.6262  |      nan       |     nan     |  52.43   |        25.8487         |
|           fastNLP_Bert            |  6   | 1.7279  |  8.9622   |    13.5539     |     nan     | 72.7087  |        24.9241         |
|            timm_regnet            |  32  | 2.2887  |  10.0729  |    24.6471     |   59.6018   | 26.1219  |        24.8306         |
| attention_is_all_you_need_pytorch | 256  | 1.2898  |   9.299   |      nan       |     nan     | 151.7044 |        23.7492         |
|         timm_efficientnet         |  32  | 1.7559  |  7.8059   |    18.1243     |   68.4276   | 24.1677  |        22.8523         |
|              hf_Bert              |  4   |  1.575  |  8.7208   |    12.1287     |     nan     | 34.2626  |        22.2991         |
|              hf_GPT2              |  4   | 1.4288  |  7.8781   |    11.3481     |     nan     | 67.9973  |         20.003         |
|        shufflenet_v2_x1_0         | 128  | 0.9449  |  6.2344   |     8.8631     |   37.1219   |  20.981  |        19.7286         |
|        mobilenet_v3_large         |  32  | 0.8985  |  6.0447   |     8.2574     |   71.7018   |  33.405  |        19.7087         |
|        Background_Matting         |  4   | 0.8973  |  5.7766   |     7.9597     |   42.6476   | 19.9451  |         18.947         |
|            Super_SloMo            |  6   | 1.0495  |  5.9627   |     7.8978     |     nan     | 19.6864  |        18.8643         |
|           mobilenet_v2            |  96  | 0.7945  |  5.3465   |     7.8946     |   40.0719   | 19.8713  |         18.229         |
|             hf_Albert             |  8   | 1.3084  |  8.1272   |    11.9952     |     nan     | 48.3702  |        17.2142         |
|            mnasnet1_0             |  32  | 0.8019  |  5.3211   |     7.4705     |   42.8673   | 31.4894  |        17.2027         |
|            timm_vovnet            |  32  | 1.4771  |  5.3249   |    11.6258     |   30.1357   | 18.0646  |        17.0872         |
|             resnet50              |  32  | 0.8485  |  5.7615   |     8.0429     |   40.4635   | 17.6785  |        16.6779         |
|          resnext50_32x4d          |  8   | 0.8793  |  5.7376   |     7.9595     |   35.6892   | 30.5601  |        16.3652         |
|      timm_vision_transformer      |  8   | 0.9322  |   5.829   |     7.7951     |   13.8935   | 132.5076 |        16.2786         |
|            hf_Reformer            |  4   | 2.4679  |  5.2911   |     9.7317     |     nan     | 36.0017  |        13.9363         |
|           timm_resnest            |  32  | 0.5771  |  3.2137   |     4.4248     |   42.3999   | 122.2678 |        11.9753         |
|           hf_DistilBert           |  8   | 0.5984  |  4.0834   |      8.17      |     nan     | 20.8504  |        11.6312         |
|       functorch_dp_cifar10        |  64  |  0.373  |  2.3811   |     3.266      |   6.1885    |  27.488  |         9.6611         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.4183  |  2.7507   |     3.5918     |   4.6509    |  9.2553  |         8.877          |
|           pytorch_unet            |  1   | 0.4465  |  2.5453   |     3.5729     |   25.8944   |  9.434   |         8.7277         |
|             resnet18              |  16  | 0.4111  |  2.2097   |      3.3       |   22.8777   | 22.4984  |         7.833          |
|          LearningToPaint          |  96  | 0.4329  |  2.2982   |     3.2991     |   30.2571   |  8.2768  |         7.6915         |
|          pytorch_stargan          |  16  | 0.3986  |   2.634   |     3.5008     |   6.7975    | 102.4642 |         6.8949         |
|           squeezenet1_1           |  32  | 0.2615  |  1.3857   |     1.9036     |   6.4523    |  4.9767  |         4.7825         |
|               vgg16               |  64  | 0.2087  |  0.9465   |     1.4011     |   3.4915    |  4.3809  |         3.6518         |
|                drq                |  1   | 0.1568  |  0.6445   |     1.0458     |   4.3191    |  4.1117  |         3.4548         |
|          pytorch_struct           | 200  | 0.2711  |  1.1176   |     1.7622     |   5.3086    |  99.742  |         3.3771         |
|               dlrm                | 2048 |  0.471  |   1.018   |      nan       |     nan     |   nan    |         3.1767         |
|              alexnet              | 128  | 0.1657  |  0.6086   |     0.8834     |   3.1333    |  3.4741  |         2.9053         |
|         soft_actor_critic         | 256  | 0.2101  |  0.4298   |     0.6681     |   1.9993    |  3.4273  |         2.7294         |
|               dcgan               |  32  |  0.17   |  0.5153   |     0.7625     |   4.1727    |   3.03   |          2.5           |
|      nvidia_deeprecommender       | 256  | 0.2071  |  0.6086   |     0.8996     |   2.8956    |  4.6821  |         2.2187         |
|           lennard_jones           | 1000 | 0.1599  |  0.4314   |     0.6112     |   1.4535    |  2.1934  |         1.9312         |
|            tts_angular            |  64  | 0.2324  |  0.2957   |     0.4182     |   1.0314    |  1.9835  |         1.4594         |
|              demucs               |  4   | 0.3603  |  0.3485   |     0.3495     |   0.3509    |  0.2726  |         0.2585         |
|           hf_Longformer           |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|               moco                |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|             hf_Albert             |  8   | 0.9814 |   0.936   |     0.3273     |     nan     |  1.1576  |         1.5688         |
|         timm_efficientdet         |  1   | 1.028  |  0.8404   |      nan       |     nan     |  1.0226  |         1.4663         |
|         timm_efficientnet         |  32  | 0.988  |  0.7698   |     0.2716     |   0.7887    |  1.2758  |         1.3156         |
|               hf_T5               |  8   | 0.9678 |  0.9371   |      nan       |     nan     |  0.9309  |         1.2564         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8822   |      nan       |     nan     |  0.9728  |         1.2033         |
| attention_is_all_you_need_pytorch | 256  | 0.9979 |   0.94    |      nan       |     nan     |  0.9829  |         1.1666         |
|              hf_GPT2              |  4   | 0.9706 |  0.8625   |     0.3688     |     nan     |  0.9648  |         1.153          |
|            hf_BigBird             |  2   | 0.9837 |  0.9784   |     0.4541     |     nan     |  0.8098  |         1.1522         |
|           fastNLP_Bert            |  6   | 1.0012 |  0.8966   |     0.3702     |     nan     |  0.8661  |         1.1505         |
|            Super_SloMo            |  6   | 1.0024 |  0.9645   |     0.3843     |     nan     |  1.0536  |         1.1475         |
|           hf_GPT2_large           |  4   | 0.9582 |  0.8645   |      nan       |     nan     |   nan    |         1.1364         |
|            timm_nfnet             | 128  | 0.9693 |  0.8982   |      nan       |   0.9445    |  1.0337  |         1.1245         |
|        speech_transformer         |  32  | 1.0048 |  0.9174   |      nan       |     nan     |  0.9066  |         1.118          |
|      timm_vision_transformer      |  8   | 0.9952 |  0.8826   |     0.3921     |   0.8871    |  0.7151  |         1.0538         |
|           timm_resnest            |  32  | 0.9868 |  0.8711   |     0.3483     |   0.8623    |  0.8756  |         1.053          |
|         soft_actor_critic         | 256  | 0.9998 |  0.9149   |     0.4736     |   0.9149    |  0.7295  |         1.0367         |
|       functorch_dp_cifar10        |  64  | 0.9964 |  0.8107   |     0.4465     |   0.8452    |  0.4478  |         1.0327         |
|   timm_vision_transformer_large   |  8   | 0.9973 |  0.8358   |      nan       |   0.8494    |  0.879   |         1.0239         |
|           mobilenet_v2            |  96  | 0.9857 |  0.7639   |     0.3119     |   0.9117    |  1.0074  |         1.0232         |
|              hf_Bert              |  4   | 0.9844 |  0.8677   |     0.3806     |     nan     |  0.9017  |         1.0046         |
|            tts_angular            |  64  | 1.0002 |  1.0002   |     0.9853     |   1.0002    |  0.9895  |         1.0002         |
|           lennard_jones           | 1000 | 0.9995 |  0.9997   |     0.3734     |   1.0967    |  0.564   |         0.9991         |
|            hf_Reformer            |  4   | 0.3764 |  0.9847   |     0.3481     |     nan     |  0.3629  |         0.9874         |
|              demucs               |  4   | 0.9872 |  0.9872   |     0.9872     |   0.9872    |  0.9872  |         0.9872         |
|              hf_Bart              |  4   | 0.9102 |  0.8321   |      nan       |     nan     |  0.8137  |         0.9871         |
|          pytorch_stargan          |  16  | 0.9929 |  0.9742   |     0.4253     |   0.8882    |  0.7783  |         0.9807         |
|          pytorch_struct           | 200  |  1.0   |  0.5081   |     0.4858     |   0.5082    |  0.4235  |         0.9726         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.0   |  0.8735   |     0.4234     |   0.8441    |  0.8863  |         0.9721         |
|           hf_DistilBert           |  8   | 0.9505 |  0.8806   |     0.3229     |     nan     |  0.8387  |         0.9717         |
|            densenet121            |  4   | 0.9857 |  0.8678   |     0.3667     |   0.8376    |  0.8753  |         0.9535         |
|          resnext50_32x4d          |  8   | 0.9932 |  0.8549   |     0.3882     |   0.8176    |  0.7644  |         0.9518         |
|        mobilenet_v3_large         |  32  | 0.9776 |  0.8499   |     0.3446     |    0.866    |  0.7918  |         0.9343         |
|            timm_regnet            |  32  | 0.9953 |  0.8446   |     0.3492     |    0.85     |  0.9249  |         0.9292         |
|        Background_Matting         |  4   | 1.0146 |  0.9624   |     0.3723     |   0.9813    |  0.9245  |         0.9292         |
|                drq                |  1   | 0.9877 |  0.8312   |     0.4769     |   0.8308    |  0.752   |         0.9256         |
|              yolov3               |  16  | 0.9908 |  0.8381   |     0.3536     |   0.8244    |  0.9059  |         0.9109         |
|              alexnet              | 128  | 0.951  |  0.7753   |     0.4793     |   0.7753    |  0.7974  |         0.9099         |
|             resnet50              |  32  | 0.9907 |  0.8629   |     0.3563     |   0.7995    |  0.865   |         0.9026         |
|           squeezenet1_1           |  32  | 0.9604 |  0.7958   |     0.3459     |   0.7589    |  0.8611  |         0.8951         |
|        shufflenet_v2_x1_0         | 128  | 0.956  |  0.8401   |     0.3573     |   0.8503    |  0.856   |         0.8927         |
|            mnasnet1_0             |  32  | 0.9785 |  0.8621   |     0.341      |   0.8207    |  0.7541  |         0.8749         |
|               dcgan               |  32  | 0.9698 |  0.7838   |     0.4994     |   0.7073    |  0.8283  |         0.8738         |
|           pytorch_unet            |  1   | 0.9968 |  0.8653   |     0.3572     |   0.8496    |  0.8678  |         0.8715         |
|             resnet18              |  16  | 0.9779 |  0.7727   |     0.3941     |   0.7276    |  0.6102  |         0.8568         |
|            hf_T5_large            |  2   | 0.8541 |  0.8541   |      nan       |     nan     |  0.8541  |         0.8541         |
|            timm_vovnet            |  32  | 0.9903 |  0.7678   |     0.3407     |   0.7742    |  0.8352  |         0.8469         |
|          LearningToPaint          |  96  | 0.9252 |  0.7196   |     0.3827     |   0.6722    |  0.7295  |         0.8017         |
|               vgg16               |  64  | 0.9924 |  0.7339   |     0.3776     |   0.7172    |  0.7491  |         0.7534         |
|               dlrm                | 2048 | 0.7302 |  0.7306   |      nan       |     nan     |   nan    |         0.7306         |
|      nvidia_deeprecommender       | 256  | 0.5596 |  0.5596   |     0.5125     |   0.5596    |  0.5596  |         0.5596         |
|           hf_Longformer           |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|               moco                |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+
|       MT5ForConditionalGeneration       | 2  | 1.0212 |  0.8537   |      0.0       |     0.0     |  5.8401  |         2.4169         |
|      GPT2ForSequenceClassification      | 4  | 1.0005 |   0.977   |      0.0       |     0.0     |  2.1528  |         2.1163         |
|               DistillGPT2               | 1  | 1.0292 |  0.8786   |     1.2513     |     0.0     |  2.8593  |         2.0686         |
|             OPTForCausalLM              | 4  | 1.0041 |  0.8281   |     1.801      |     0.0     |  4.0683  |         1.9383         |
|       ElectraForQuestionAnswering       | 64 | 1.0004 |   0.98    |     0.7683     |     0.0     |  1.9566  |         1.9104         |
|     PLBartForConditionalGeneration      | 8  | 1.0152 |   0.831   |     1.485      |     0.0     |  2.9117  |         1.8791         |
|    MegatronBertForQuestionAnswering     | 8  | 1.0325 |  0.8479   |     1.4804     |     0.0     |  2.7143  |         1.8681         |
|      MBartForConditionalGeneration      | 8  | 1.0133 |   0.833   |      1.29      |     0.0     |  2.425   |         1.8204         |
|            TrOCRForCausalLM             | 8  | 1.0122 |  0.8299   |      0.0       |     0.0     |  2.0315  |         1.7879         |
|         Speech2Text2ForCausalLM         | 64 | 1.0089 |  0.8016   |     0.9604     |     0.0     |  2.2801  |         1.7876         |
|                CamemBert                | 1  | 1.0359 |  0.8456   |     1.7367     |     0.0     |  3.5883  |         1.7784         |
|      BartForConditionalGeneration       | 1  | 1.0127 |  0.8451   |      0.0       |     0.0     |  1.835   |         1.7714         |
|             XGLMForCausalLM             | 1  | 1.0128 |  0.8061   |      0.0       |     0.0     |  3.2516  |         1.7645         |
|           RobertaForCausalLM            | 4  | 1.037  |  0.8408   |     1.9854     |     0.0     |  4.0998  |         1.7565         |
|         MegatronBertForCausalLM         | 2  | 1.0316 |   0.851   |     2.1788     |     0.0     |  4.2892  |         1.7409         |
|     MobileBertForQuestionAnswering      | 32 | 1.0206 |  0.8216   |      0.0       |     0.0     |  5.4787  |         1.7407         |
|          DistilBertForMaskedLM          | 16 | 1.0314 |  0.8586   |     1.035      |     0.0     |  2.3368  |         1.7378         |
|          MobileBertForMaskedLM          | 16 | 1.017  |  0.8202   |      0.0       |     0.0     |  5.9124  |         1.7318         |
|     DistilBertForQuestionAnswering      | 32 | 1.0318 |  0.8387   |     0.8926     |     0.0     |  1.8261  |         1.7123         |
|           ElectraForCausalLM            | 1  | 1.0333 |  0.8518   |     2.6494     |     0.0     |  6.9167  |         1.7048         |
|    LayoutLMForSequenceClassification    | 16 | 1.0003 |  0.9807   |     0.7766     |     0.0     |  1.7372  |         1.6987         |
|     PegasusForConditionalGeneration     | 4  | 1.0149 |  0.8276   |     1.6544     |     0.0     |  3.3423  |         1.6961         |
| BlenderbotSmallForConditionalGeneration | 32 | 1.0129 |  0.8855   |      0.0       |     0.0     |  1.8344  |         1.6928         |
|           PegasusForCausalLM            | 8  | 1.0125 |  0.8143   |     1.0511     |     0.0     |  1.9151  |         1.6918         |
|            YituTechConvBert             | 1  | 1.0227 |  0.8902   |      0.0       |     0.0     |  5.5641  |         1.6775         |
|       AlbertForQuestionAnswering        | 2  | 1.0007 |  0.8084   |      0.0       |     0.0     |  1.6676  |         1.6491         |
|            AlbertForMaskedLM            | 2  | 1.0004 |  0.8083   |      0.0       |     0.0     |  1.6609  |         1.6461         |
|            XLNetLMHeadModel             | 4  | 0.9982 |  0.9679   |      0.0       |     0.0     |  1.6195  |         1.6285         |
|     M2M100ForConditionalGeneration      | 2  | 1.0099 |  0.8836   |     2.0714     |     0.0     |  3.9054  |         1.615          |
|       T5ForConditionalGeneration        | 4  | 1.0029 |  0.9399   |      0.0       |     0.0     |  1.6013  |         1.5996         |
|           LayoutLMForMaskedLM           | 16 | 1.0004 |  0.9711   |     0.7533     |     0.0     |  1.5855  |         1.5656         |
|            PLBartForCausalLM            | 16 | 1.0135 |  0.9475   |     0.927      |     0.0     |  1.5381  |         1.5062         |
|             BartForCausalLM             | 2  | 1.0018 |  0.9647   |     0.7407     |     0.0     |  1.499   |         1.4516         |
|                 T5Small                 | 1  | 1.0245 |  0.8784   |      0.0       |     0.0     |  1.7858  |         1.4468         |
|            MBartForCausalLM             | 16 | 1.0152 |  0.9072   |      0.0       |     0.0     |  1.4077  |         1.4182         |
|        BertForQuestionAnswering         | 64 | 1.0015 |  0.9675   |     0.7679     |     0.0     |  1.4346  |         1.3999         |
|       RobertaForQuestionAnswering       | 64 |  1.0   |  0.9709   |      0.77      |     0.0     |  1.4356  |         1.3965         |
|             BertForMaskedLM             | 64 | 0.9999 |  0.9578   |     0.7351     |     0.0     |  1.3295  |         1.317          |
|       BlenderbotSmallForCausalLM        | 64 | 1.0016 |  0.9217   |     0.7015     |     0.0     |  1.3074  |         1.312          |
|           DebertaForMaskedLM            | 4  | 0.9364 |   0.737   |     0.8416     |     0.0     |  1.245   |         1.2378         |
|       DebertaForQuestionAnswering       | 4  | 0.9309 |  0.7348   |     0.9413     |     0.0     |  1.4976  |         1.2299         |
|                 BigBird                 | 1  | 0.9913 |  0.9116   |     1.0399     |     0.0     |  1.1587  |         1.0251         |
|          AllenaiLongformerBase          | 0  |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+
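
One common way to summarize a speedup column is a geometric mean over the models that ran. The sketch below assumes that a `0.0` entry marks a run that failed or was skipped; that reading is an assumption, as is the hypothetical helper name `geomean_speedup`.

```python
import math


def geomean_speedup(values):
    """Geometric mean of the positive entries in a speedup column.

    Entries <= 0.0 are dropped, on the assumption that 0.0 denotes a
    failed or skipped run. Returns NaN if nothing remains.
    """
    ok = [v for v in values if v > 0.0]
    if not ok:
        return float("nan")
    return math.exp(sum(math.log(v) for v in ok) / len(ok))


# Last three inductor entries from the table above, including one 0.0 row.
inductor_tail = [1.4976, 1.1587, 0.0]
print(f"geomean over passing models: {geomean_speedup(inductor_tail):.4f}")
```

Dropping the failed rows keeps one `0.0` from collapsing the mean to zero, but it also hides coverage gaps, so the number of excluded models is worth reporting alongside it.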

Accuracy

+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+------------------------+
|                  name                   | bs |    eager    |  aot_eager  | aot_cudagraphs | aot_nvfuser |  inductor   | inductor_no_cudagraphs |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+------------------------+
|            AlbertForMaskedLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|       AlbertForQuestionAnswering        | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|             BertForMaskedLM             | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|        BertForQuestionAnswering         | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|                 BigBird                 | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|                CamemBert                | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|          DistilBertForMaskedLM          | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|     DistilBertForQuestionAnswering      | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|               DistillGPT2               | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|           ElectraForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|       ElectraForQuestionAnswering       | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|           LayoutLMForMaskedLM           | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|    LayoutLMForSequenceClassification    | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|     M2M100ForConditionalGeneration      | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|         MegatronBertForCausalLM         | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|             OPTForCausalLM              | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|            PLBartForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|           PegasusForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|     PegasusForConditionalGeneration     | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|           RobertaForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|       RobertaForQuestionAnswering       | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|         Speech2Text2ForCausalLM         | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|            TrOCRForCausalLM             | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|             BartForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|      BartForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|           DebertaForMaskedLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|      GPT2ForSequenceClassification      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            MBartForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       MT5ForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|          MobileBertForMaskedLM          | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|     MobileBertForQuestionAnswering      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       T5ForConditionalGeneration        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|                 T5Small                 | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|             XGLMForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            XLNetLMHeadModel             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            YituTechConvBert             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       DebertaForQuestionAnswering       | 1  |    pass     |    pass     | fail_accuracy  | fail_to_run |    pass     |          pass          |
|      MBartForConditionalGeneration      | 1  |    pass     |    pass     |      pass      | fail_to_run | fail_to_run |      fail_to_run       |
|     PLBartForConditionalGeneration      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run | fail_to_run |      fail_to_run       |
|          AllenaiLongformerBase          | 1  | fail_to_run | fail_to_run |  fail_to_run   | fail_to_run | fail_to_run |      fail_to_run       |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+
|          MobileBertForMaskedLM          | 16 | 9.0195 |  42.6022  |      nan       |     nan     | 121.1753 |        76.0038         |
|     MobileBertForQuestionAnswering      | 32 | 8.9885 |  39.986   |      nan       |     nan     | 103.8671 |        74.0658         |
|            XLNetLMHeadModel             | 4  | 3.7491 |  24.4324  |      nan       |     nan     | 173.4133 |        59.9689         |
|      MBartForConditionalGeneration      | 8  | 3.4406 |  21.6843  |    33.6119     |     nan     | 69.3267  |        51.0072         |
|     PegasusForConditionalGeneration     | 4  | 3.2658 |  21.0318  |    33.0349     |     nan     | 84.6794  |        50.1654         |
|     M2M100ForConditionalGeneration      | 2  | 3.4775 |  20.0169  |    27.9012     |     nan     | 88.9362  |        50.0348         |
|         MegatronBertForCausalLM         | 2  | 3.4485 |  18.0977  |    26.8718     |     nan     | 74.6988  |        47.3342         |
|      BartForConditionalGeneration       | 1  | 3.364  |  21.4216  |      nan       |     nan     | 57.7421  |        46.8986         |
|    MegatronBertForQuestionAnswering     | 8  | 3.5021 |  17.8618  |    26.1531     |     nan     | 67.3498  |        44.8143         |
|             XGLMForCausalLM             | 1  | 2.7292 |  16.5557  |      nan       |     nan     | 69.2599  |        41.0403         |
|       MT5ForConditionalGeneration       | 2  | 3.3471 |  16.2423  |      nan       |     nan     | 104.9205 |        37.5933         |
|           DebertaForMaskedLM            | 4  | 4.8862 |  12.3834  |    54.0419     |     nan     | 130.5489 |        35.1018         |
|       DebertaForQuestionAnswering       | 4  | 4.9971 |  12.3246  |    47.0074     |     nan     | 99.3878  |        34.2262         |
| BlenderbotSmallForConditionalGeneration | 32 | 2.1737 |  14.0085  |      nan       |     nan     | 57.6438  |        33.8027         |
|            YituTechConvBert             | 1  | 2.4988 |  13.456   |      nan       |     nan     | 119.0895 |        31.7395         |
|                 BigBird                 | 1  | 7.9649 |  16.4584  |    37.3619     |     nan     | 48.9328  |        29.3896         |
|     PLBartForConditionalGeneration      | 8  | 1.7594 |  11.1994  |    15.7536     |     nan     | 63.1955  |        26.6755         |
|       T5ForConditionalGeneration        | 4  | 2.0978 |  10.6908  |      nan       |     nan     |  63.019  |        25.8627         |
|                 T5Small                 | 1  | 2.1081 |  10.7492  |      nan       |     nan     | 60.4685  |        25.1827         |
|    LayoutLMForSequenceClassification    | 16 | 1.7653 |  9.3441   |    12.9331     |     nan     | 47.9953  |        23.7189         |
|           LayoutLMForMaskedLM           | 16 | 1.8121 |  9.0929   |    13.1247     |     nan     | 37.3037  |        22.7907         |
|           RobertaForCausalLM            | 4  | 1.6561 |  8.9255   |    12.2233     |     nan     | 62.9074  |         22.65          |
|       ElectraForQuestionAnswering       | 64 | 1.6063 |  8.8847   |    12.3916     |     nan     |  37.068  |        22.3866         |
|             BertForMaskedLM             | 64 | 1.5424 |  8.6723   |    12.2396     |     nan     | 39.9349  |        22.0051         |
|           ElectraForCausalLM            | 1  | 1.6779 |  8.8162   |    11.9966     |     nan     | 25.5091  |        21.8899         |
|                CamemBert                | 1  | 1.6319 |  8.7693   |    12.0634     |     nan     | 25.0396  |        21.4368         |
|        BertForQuestionAnswering         | 64 | 1.6921 |  8.6988   |    12.1721     |     nan     | 23.0439  |        21.2348         |
|           PegasusForCausalLM            | 8  | 1.2966 |  7.9649   |    11.9687     |     nan     | 44.3083  |        21.1057         |
|            MBartForCausalLM             | 16 | 1.2299 |  7.8781   |      nan       |     nan     | 33.5959  |        20.9377         |
|       RobertaForQuestionAnswering       | 64 | 1.6109 |  8.9447   |    12.3986     |     nan     | 23.1424  |        20.6543         |
|      GPT2ForSequenceClassification      | 4  | 1.4262 |  8.0483   |      nan       |     nan     | 34.5897  |         20.232         |
|            AlbertForMaskedLM            | 2  | 1.4638 |  8.4092   |      nan       |     nan     | 34.9008  |        19.8435         |
|             BartForCausalLM             | 2  | 1.2531 |  7.9796   |    11.8826     |     nan     | 32.8441  |        19.6195         |
|            TrOCRForCausalLM             | 8  | 1.2602 |  8.0204   |      nan       |     nan     | 25.7557  |        19.4415         |
|             OPTForCausalLM              | 4  | 1.3439 |  8.1321   |     20.022     |     nan     | 39.4153  |        19.0927         |
|       AlbertForQuestionAnswering        | 2  | 1.4795 |  8.3655   |      nan       |     nan     | 18.0915  |        17.1831         |
|       BlenderbotSmallForCausalLM        | 64 | 0.7727 |  5.7176   |     7.7626     |     nan     | 27.7887  |        14.1211         |
|          DistilBertForMaskedLM          | 16 | 0.6189 |   4.172   |     8.2951     |     nan     | 22.6199  |        12.0578         |
|     DistilBertForQuestionAnswering      | 32 | 0.6611 |  4.2243   |     8.2315     |     nan     | 34.9369  |        11.7694         |
|         Speech2Text2ForCausalLM         | 64 | 0.6848 |  4.1527   |     6.7412     |     nan     | 27.6725  |        11.7505         |
|            PLBartForCausalLM            | 16 | 0.6324 |  4.1172   |     6.1244     |     nan     | 26.6818  |        11.4105         |
|               DistillGPT2               | 1  | 0.7388 |  3.9171   |     5.6422     |     nan     | 28.1078  |        10.5911         |
|          AllenaiLongformerBase          | 0  |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+
|       AlbertForQuestionAnswering        | 2  |  1.0   |  0.6451   |      nan       |     nan     |  0.9124  |          1.44          |
|            AlbertForMaskedLM            | 2  |  1.0   |  0.6364   |      nan       |     nan     |  0.8977  |         1.4235         |
|       T5ForConditionalGeneration        | 4  | 0.9996 |  0.9594   |      nan       |     nan     |  0.995   |         1.2292         |
|                 T5Small                 | 1  |  1.0   |  0.9124   |      nan       |     nan     |  0.9874  |         1.1703         |
|      GPT2ForSequenceClassification      | 4  | 0.9675 |  0.9164   |      nan       |     nan     |  1.0779  |         1.1635         |
|       DebertaForQuestionAnswering       | 4  | 0.9792 |  1.0574   |     0.3598     |     nan     |  0.3761  |         1.1472         |
|      BartForConditionalGeneration       | 1  |  1.0   |  0.8619   |      nan       |     nan     |  0.9894  |         1.1415         |
|             BartForCausalLM             | 2  |  1.0   |  0.8769   |     0.3797     |     nan     |  1.0442  |         1.1204         |
|                 BigBird                 | 1  | 1.0008 |  0.9547   |     0.4481     |     nan     |  0.835   |         1.1192         |
|           DebertaForMaskedLM            | 4  | 0.9982 |  0.9824   |     0.3623     |     nan     |  0.4498  |         1.1125         |
| BlenderbotSmallForConditionalGeneration | 32 | 0.9998 |  0.8996   |      nan       |     nan     |  0.9557  |         1.1008         |
|         Speech2Text2ForCausalLM         | 64 | 0.969  |  0.8488   |     0.3578     |     nan     |  0.9452  |         1.075          |
|       ElectraForQuestionAnswering       | 64 | 1.0016 |  0.9538   |     0.3384     |     nan     |  0.9938  |         1.0704         |
|          MobileBertForMaskedLM          | 16 | 0.9985 |  0.8983   |      nan       |     nan     |  0.6948  |         1.0683         |
|      MBartForConditionalGeneration      | 8  | 0.9999 |  0.8187   |     0.4121     |     nan     |  0.8861  |         1.0626         |
|       RobertaForQuestionAnswering       | 64 | 0.9996 |  0.9315   |     0.3686     |     nan     |  0.9946  |         1.0621         |
|        BertForQuestionAnswering         | 64 | 0.9995 |  0.9315   |     0.3686     |     nan     |  0.9946  |         1.0621         |
|    LayoutLMForSequenceClassification    | 16 | 1.004  |  0.9325   |     0.3632     |     nan     |  1.0056  |         1.0614         |
|     DistilBertForQuestionAnswering      | 32 | 0.9992 |  0.8965   |     0.376      |     nan     |  0.8639  |         1.0584         |
|               DistillGPT2               | 1  | 0.9963 |  0.7527   |     0.3884     |     nan     |  0.8288  |         1.0545         |
|            MBartForCausalLM             | 16 |  1.0   |  0.8398   |      nan       |     nan     |  0.9567  |         1.0451         |
|       BlenderbotSmallForCausalLM        | 64 | 0.9996 |  0.8172   |     0.3597     |     nan     |  0.9269  |         1.0441         |
|           PegasusForCausalLM            | 8  | 0.999  |  0.9444   |     0.4647     |     nan     |  0.8445  |         1.0404         |
|             BertForMaskedLM             | 64 | 0.9996 |   0.899   |     0.3629     |     nan     |  0.9811  |         1.0365         |
|     PegasusForConditionalGeneration     | 4  | 0.9994 |  0.9194   |     0.4621     |     nan     |  0.7686  |         1.0358         |
|           LayoutLMForMaskedLM           | 16 | 0.9999 |  0.9238   |     0.3549     |     nan     |  0.9871  |         1.0264         |
|     PLBartForConditionalGeneration      | 8  | 0.9975 |  0.8294   |     0.3984     |     nan     |  0.8438  |         1.0221         |
|             OPTForCausalLM              | 4  | 0.9974 |   0.75    |     0.3898     |     nan     |  0.8483  |         1.019          |
|            TrOCRForCausalLM             | 8  |  1.0   |  0.7955   |      nan       |     nan     |  0.8774  |         1.0171         |
|          DistilBertForMaskedLM          | 16 | 0.9986 |  0.8686   |     0.3662     |     nan     |  0.9164  |         1.0168         |
|            PLBartForCausalLM            | 16 | 1.0001 |  0.8666   |     0.3854     |     nan     |  0.9395  |         1.013          |
|           ElectraForCausalLM            | 1  | 0.9993 |  0.8955   |     0.3766     |     nan     |  0.6701  |         1.011          |
|            XLNetLMHeadModel             | 4  | 0.9912 |  0.8791   |      nan       |     nan     |  1.0109  |         1.0109         |
|                CamemBert                | 1  | 0.9989 |  0.7872   |     0.4083     |     nan     |  0.8654  |         1.0095         |
|     M2M100ForConditionalGeneration      | 2  | 0.9997 |  0.9659   |     0.5099     |     nan     |  0.7118  |         1.0048         |
|             XGLMForCausalLM             | 1  |  1.0   |   0.999   |      nan       |     nan     |  0.7913  |          1.0           |
|            YituTechConvBert             | 1  | 0.9718 |  0.7819   |      nan       |     nan     |  0.8618  |         0.9718         |
|           RobertaForCausalLM            | 4  | 0.9237 |  0.7741   |     0.4183     |     nan     |  0.8574  |         0.9237         |
|    MegatronBertForQuestionAnswering     | 8  | 0.9051 |  0.8218   |     0.4331     |     nan     |  0.8434  |         0.9051         |
|     MobileBertForQuestionAnswering      | 32 | 1.0142 |  0.9796   |      nan       |     nan     |  0.6265  |         0.8395         |
|         MegatronBertForCausalLM         | 2  | 0.7726 |  0.7726   |     0.4464     |     nan     |  0.7726  |         0.7726         |
|       MT5ForConditionalGeneration       | 2  | 0.6019 |  0.6019   |      nan       |     nan     |  0.6019  |         0.6019         |
|          AllenaiLongformerBase          | 0  |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------------+----+--------+-----------+----------------+-------------+----------+------------------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|        tnt_s_patch16_224        | 64  | 0.9998 |   0.996   |      0.0       |   1.8927    |  2.0669  |         2.0166         |
|      xcit_large_24_p8_224       |  5  | 1.0001 |    0.0    |      0.0       |     0.0     |  2.1732  |         1.7693         |
|          cait_m36_384           |  2  | 1.0031 |  0.8444   |      0.0       |   1.3666    |  2.5088  |         1.7267         |
|          ghostnet_100           | 128 | 1.0023 |  0.9967   |     0.8918     |   1.5678    |  2.0488  |         1.7165         |
|            lcnet_050            | 128 | 0.9697 |  0.9507   |     0.8689     |   1.6016    |  2.0068  |         1.6265         |
|            nfnet_l0             | 64  | 1.008  |  0.8316   |     0.8087     |   1.1327    |  1.7737  |         1.6092         |
|          gmixer_24_224          | 64  | 0.9991 |  0.8834   |      0.0       |   1.0025    |  1.665   |         1.6079         |
|           resnest101e           | 32  | 1.0029 |  0.9792   |     1.1035     |    1.427    |  2.3567  |         1.5816         |
|           volo_d1_224           | 64  | 0.9999 |  0.9942   |      0.0       |   1.1417    |  1.6044  |         1.5643         |
|        twins_pcpvt_base         | 32  | 1.0042 |  0.8954   |     1.3418     |    1.342    |  2.4281  |         1.5564         |
|         crossvit_9_240          | 64  | 1.0069 |   0.961   |     0.8776     |   1.1151    |  1.5854  |         1.5195         |
|  swin_base_patch4_window7_224   | 64  | 1.0001 |   0.961   |      0.0       |   1.0368    |  1.5142  |         1.5033         |
|             dla102              | 64  | 1.0002 |  0.9865   |     0.8131     |   1.3833    |  1.5367  |         1.4816         |
|       gluon_inception_v3        | 128 | 0.9999 |  0.9963   |     0.8525     |   1.1958    |  1.5059  |         1.473          |
|          inception_v3           | 128 |  1.0   |  0.9966   |     0.8527     |   1.1968    |  1.4976  |         1.4686         |
|        adv_inception_v3         | 128 |  1.0   |  0.9966   |     0.8534     |   1.1962    |  1.5057  |         1.4672         |
|           mnasnet_100           | 128 | 0.9546 |  0.9419   |     0.787      |   1.3716    |  1.4338  |         1.4598         |
|           regnety_002           | 128 | 0.9774 |  0.9311   |     1.1255     |   1.3889    |  2.0652  |         1.4516         |
|      mobilenetv3_large_100      | 128 | 0.9562 |   0.945   |     0.7833     |   1.3447    |  1.4623  |         1.4415         |
|         mobilenetv2_100         | 128 | 0.9514 |  0.9415   |     0.7209     |   0.8657    |  1.3996  |         1.4347         |
|           dm_nfnet_f0           | 128 | 0.9979 |  0.9994   |      0.0       |   1.1787    |  1.5001  |         1.425          |
|           mobilevit_s           | 32  | 0.9757 |  0.8152   |     0.7906     |   1.2165    |  1.6643  |         1.4127         |
|            fbnetv3_b            | 128 | 0.9531 |  0.9407   |     0.8011     |   1.2581    |  1.3993  |         1.4007         |
|          spnasnet_100           | 128 | 0.9474 |  0.9363   |     0.7762     |   1.3173    |  1.3748  |         1.3962         |
|           selecsls42b           | 128 | 0.9998 |  0.9957   |     0.8407     |   1.3536    |  1.4213  |         1.3928         |
|         coat_lite_mini          | 128 | 0.9999 |  0.9887   |     0.842      |   1.2191    |  1.4288  |         1.3914         |
|           fbnetc_100            | 128 | 0.9535 |  0.9438   |     0.7916     |   1.3691    |  1.3581  |         1.3752         |
|          resmlp_12_224          | 128 | 1.0001 |  0.9978   |     0.7826     |     0.0     |  1.4146  |         1.3686         |
|          jx_nest_base           | 32  | 0.9998 |  0.9932   |      0.0       |   1.2254    |  1.4003  |         1.3651         |
|        ese_vovnet19b_dw         | 128 | 0.9703 |  0.9622   |     0.7669     |   1.2466    |  1.3586  |         1.3599         |
|       tf_efficientnet_b0        | 128 | 0.966  |  0.8081   |     0.6673     |   1.0976    |  1.3531  |         1.3554         |
|            pit_b_224            | 64  | 0.9998 |  0.9949   |     0.8216     |   1.0623    |  1.3594  |         1.3516         |
|        res2net101_26w_4s        | 64  | 1.0038 |  0.9889   |     0.9597     |   1.4095    |  1.6442  |         1.3485         |
|          cspdarknet53           | 64  | 0.9431 |  0.9342   |     0.7564     |   0.9036    |  1.3312  |         1.3469         |
|          botnet26t_256          | 128 |  0.98  |  0.9733   |     0.8127     |   1.3475    |  1.316   |         1.3277         |
|            hrnet_w18            |  2  | 1.0054 |  0.9669   |     2.3282     |   1.3989    |  5.0848  |         1.3048         |
|           res2next50            |  2  | 1.0025 |  0.9057   |     2.2905     |   1.3259    |  5.4981  |         1.3024         |
|        res2net50_14w_8s         |  2  | 1.0013 |  0.9122   |     2.3278     |   1.3861    |  5.8804  |         1.2998         |
|         poolformer_m36          | 64  | 0.9999 |  0.9981   |     0.8065     |     0.0     |  1.3293  |         1.2964         |
|           convit_base           | 32  | 0.9997 |  0.9929   |      0.0       |     0.0     |   1.34   |         1.2899         |
|           rexnet_100            | 128 | 0.9651 |  0.8514   |     0.6903     |   1.0315    |  1.2792  |         1.2774         |
|            tinynet_a            | 128 | 0.9716 |  0.8026   |     0.6506     |   1.0894    |  1.2557  |         1.2721         |
|          pnasnet5large          | 16  | 1.0049 |  1.0268   |     0.8387     |   1.1277    |  1.2934  |         1.2655         |
|      beit_base_patch16_224      | 64  | 1.0001 |  0.9792   |      0.0       |   1.0445    |  1.2859  |         1.2652         |
| deit_base_distilled_patch16_224 | 64  | 0.9999 |  0.9926   |     0.7957     |   1.0626    |  1.2828  |         1.2638         |
|          mixer_b16_224          | 64  | 1.0002 |  0.9921   |     0.7967     |   0.9572    |  1.285   |         1.2435         |
|       eca_botnext26ts_256       | 64  | 0.962  |  0.8009   |     0.658      |   1.1046    |  1.2516  |         1.2415         |
|        sebotnet33ts_256         | 64  | 0.9666 |  0.8371   |     0.6809     |   1.1174    |  1.2055  |         1.2087         |
|            mixnet_l             | 64  | 0.9815 |  0.8872   |     0.8056     |   1.1028    |  1.2378  |         1.1824         |
|           tf_mixnet_l           | 64  | 0.9839 |  0.9022   |     0.7936     |   1.0722    |  1.2388  |         1.1788         |
|      vit_base_patch16_224       | 64  |  1.0   |   0.994   |     0.8351     |   0.9939    |  1.196   |         1.1779         |
|         visformer_small         | 128 | 0.9999 |  1.0019   |     0.8429     |   1.0838    |  1.2356  |         1.1765         |
|        eca_halonext26ts         | 64  | 0.9638 |  0.8036   |     0.6645     |    1.102    |   0.0    |         1.1754         |
|             dpn107              | 32  | 0.9366 |  0.9289   |     0.7469     |   0.9873    |  1.1501  |          1.17          |
|            repvgg_a2            | 128 | 0.9439 |  0.9357   |     0.7981     |   1.1328    |  1.142   |         1.1575         |
|          gmlp_s16_224           | 64  |  1.0   |  0.9848   |      0.0       |   1.0463    |  1.3405  |         1.1304         |
|        gluon_xception65         | 32  | 1.0001 |  0.9898   |     0.7537     |    1.065    |  1.1607  |         1.1273         |
|            gernet_l             | 128 | 0.947  |   0.936   |     0.7687     |    1.143    |  1.0697  |         1.0773         |
|     swsl_resnext101_32x16d      | 32  | 0.9996 |  0.9811   |     0.8072     |   1.0762    |  1.1379  |         1.0588         |
|        convmixer_768_32         | 32  | 0.9999 |  0.9983   |     0.923      |   1.0533    |  1.0553  |         1.051          |
|          convnext_base          | 32  | 1.0087 |  0.9445   |      0.0       |   1.3707    |  0.7366  |         0.7146         |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
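To compare backends across a whole suite rather than row by row, one option is to collapse each speedup column into a geometric mean. The sketch below parses a small excerpt of the table above (three rows, three backend columns) and computes that summary. It treats a `0.0` entry as a run that produced no measurement and skips it. That reading is an assumption based on the `0.0` cells lining up with `fail_to_run` entries in the accuracy table, not something the dashboard states. The helper names `parse_table` and `geomean_speedup` are hypothetical, and the geometric mean is just one reasonable aggregate, not necessarily the one the dashboard reports.

```python
import math

# Excerpt of the speedup table above (values copied verbatim, columns trimmed).
TABLE = """\
+----------------------+----+--------+-----------+----------+
|         name         | bs | eager  | aot_eager | inductor |
+----------------------+----+--------+-----------+----------+
|  tnt_s_patch16_224   | 64 | 0.9998 |   0.996   |  2.0669  |
| xcit_large_24_p8_224 | 5  | 1.0001 |    0.0    |  2.1732  |
|   eca_halonext26ts   | 64 | 0.9638 |  0.8036   |   0.0    |
+----------------------+----+--------+-----------+----------+
"""


def parse_table(text):
    """Turn a '+---+' bordered ASCII table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the '+---+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean_speedup(records, backend):
    """Geometric mean of one backend column, skipping 0.0 and nan entries.

    Assumption: 0.0 (and nan) mark runs without a usable measurement.
    """
    values = []
    for rec in records:
        value = float(rec[backend])
        if value > 0.0 and not math.isnan(value):
            values.append(value)
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))


records = parse_table(TABLE)
for backend in ("eager", "aot_eager", "inductor"):
    print(f"{backend}: {geomean_speedup(records, backend):.4f}")
```

Skipping failed runs makes a backend look better than it is when it fails on many models, so any such summary is best read alongside the accuracy table.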

Accuracy

+---------------------------------+----+-------+---------------+----------------+---------------+---------------+------------------------+
|              name               | bs | eager |   aot_eager   | aot_cudagraphs |  aot_nvfuser  |   inductor    | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+------------------------+
|        adv_inception_v3         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          botnet26t_256          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        convmixer_768_32         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|         crossvit_9_240          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          cspdarknet53           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
| deit_base_distilled_patch16_224 | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|             dla102              | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|             dpn107              | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|       eca_botnext26ts_256       | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        ese_vovnet19b_dw         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           fbnetc_100            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            gernet_l             | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          ghostnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|       gluon_inception_v3        | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            hrnet_w18            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          inception_v3           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            lcnet_050            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            mixnet_l             | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           mnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|         mobilenetv2_100         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|      mobilenetv3_large_100      | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           mobilevit_s           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            nfnet_l0             | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          pnasnet5large          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           regnety_002           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            repvgg_a2            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        res2net101_26w_4s        | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        res2net50_14w_8s         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           res2next50            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           rexnet_100            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        sebotnet33ts_256         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           selecsls42b           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|     swsl_resnext101_32x16d      | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|       tf_efficientnet_b0        | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           tf_mixnet_l           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            tinynet_a            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|         visformer_small         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|      vit_base_patch16_224       | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|      beit_base_patch16_224      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|          convnext_base          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|           dm_nfnet_f0           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|  swin_base_patch4_window7_224   | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|        tnt_s_patch16_224        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|           volo_d1_224           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|          resmlp_12_224          | 2  | pass  |     pass      |      pass      |  fail_to_run  |     pass      |          pass          |
|           convit_base           | 2  | pass  |     pass      |  fail_to_run   |  fail_to_run  |     pass      |          pass          |
|      xcit_large_24_p8_224       | 2  | pass  |  fail_to_run  |  fail_to_run   |  fail_to_run  |     pass      |          pass          |
|          mixer_b16_224          | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |          pass          |
|         poolformer_m36          | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |          pass          |
|           resnest101e           | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |          pass          |
|          gmixer_24_224          | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |          pass          |
|          gmlp_s16_224           | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |          pass          |
|          cait_m36_384           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |          pass          |
|          jx_nest_base           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |          pass          |
|         coat_lite_mini          | 2  | pass  | fail_accuracy | fail_accuracy  | fail_accuracy |     pass      |          pass          |
|            pit_b_224            | 2  | pass  | fail_accuracy | fail_accuracy  | fail_accuracy |     pass      |          pass          |
|        twins_pcpvt_base         | 2  | pass  | fail_accuracy | fail_accuracy  | fail_accuracy |     pass      |          pass          |
|        eca_halonext26ts         | 2  | pass  |     pass      |      pass      |     pass      |  fail_to_run  |      fail_to_run       |
|        gluon_xception65         | 2  | pass  |     pass      |      pass      |     pass      | fail_accuracy |     fail_accuracy      |
|            fbnetv3_b            | 2  | pass  |     pass      |      pass      | fail_accuracy | fail_accuracy |     fail_accuracy      |
|          spnasnet_100           | 2  | pass  |     pass      |      pass      | fail_accuracy | fail_accuracy |     fail_accuracy      |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+------------------------+
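A quick way to summarize an accuracy table like the one above is to count statuses per backend and derive a pass rate. The following is a minimal sketch over four rows copied from the table, restricted to three of the backend columns. The names `status_counts` and `pass_rate` are hypothetical helpers introduced for illustration, not part of any benchmark script.

```python
from collections import Counter

# (name, aot_eager, aot_nvfuser, inductor), copied from the accuracy table above.
ROWS = [
    ("adv_inception_v3", "pass", "pass", "pass"),
    ("xcit_large_24_p8_224", "fail_to_run", "fail_to_run", "pass"),
    ("coat_lite_mini", "fail_accuracy", "fail_accuracy", "pass"),
    ("gluon_xception65", "pass", "pass", "fail_accuracy"),
]
BACKENDS = ("aot_eager", "aot_nvfuser", "inductor")


def status_counts(rows, backends):
    """Count how often each status string appears in each backend column."""
    counts = {backend: Counter() for backend in backends}
    for _name, *statuses in rows:
        for backend, status in zip(backends, statuses):
            counts[backend][status] += 1
    return counts


def pass_rate(counter):
    """Fraction of entries equal to 'pass'; nan if the column is empty."""
    total = sum(counter.values())
    return counter["pass"] / total if total else float("nan")


counts = status_counts(ROWS, BACKENDS)
for backend in BACKENDS:
    print(backend, dict(counts[backend]), f"pass rate {pass_rate(counts[backend]):.2f}")
```

Keeping `fail_to_run` and `fail_accuracy` as separate counts preserves the distinction the table draws between a backend that crashes and one that runs but produces mismatched results.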

Compilation latency (sec)

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|        eca_halonext26ts         | 64  | 1.435  |  6.1509   |    12.4327     |   65.9608   |   nan    |        220.5118        |
|            hrnet_w18            |  2  | 6.1168 |  38.0167  |     66.361     |  371.0315   | 107.8157 |        116.4564        |
|          pnasnet5large          | 16  | 5.0919 |  27.3988  |    48.2146     |  189.0787   | 85.2776  |         80.574         |
|      xcit_large_24_p8_224       |  5  | 3.0668 |    nan    |      nan       |     nan     | 193.9676 |        68.8234         |
|          cait_m36_384           |  2  | 3.4055 |  24.6768  |      nan       |   61.1007   | 178.1442 |        62.3835         |
|        res2net101_26w_4s        | 64  | 2.9449 |  20.561   |    33.6122     |  118.2269   | 68.2382  |        60.6978         |
|           resnest101e           | 32  | 3.1821 |  20.4609  |    32.7293     |   101.181   | 142.8688 |        55.1254         |
|        twins_pcpvt_base         | 32  | 2.9751 |  19.4392  |    30.1407     |   72.0773   | 570.865  |        54.9535         |
|  swin_base_patch4_window7_224   | 64  | 2.9795 |  16.0702  |      nan       |   71.9607   | 184.5472 |        51.8449         |
|        res2net50_14w_8s         |  2  | 2.7966 |  18.1464  |    28.7877     |  103.2297   | 53.9375  |        50.1289         |
|         poolformer_m36          | 64  | 1.9704 |  10.502   |    16.1098     |     nan     | 50.2077  |        46.5752         |
|             dpn107              | 32  | 3.8719 |  16.8405  |    51.1831     |   99.5596   | 46.5371  |        43.1415         |
|           mobilevit_s           | 32  | 1.8834 |  9.0856   |    17.8444     |   57.3713   | 435.559  |        42.5975         |
|          convnext_base          | 32  | 1.5402 |   8.918   |      nan       |   36.8594   | 233.7388 |        42.2959         |
|          jx_nest_base           | 32  | 1.8299 |  11.9719  |      nan       |   50.7765   | 154.2262 |        40.9813         |
|            fbnetv3_b            | 128 | 3.1286 |  13.1082  |    35.9453     |   99.8172   | 42.4427  |        39.2793         |
|        adv_inception_v3         | 128 | 1.6951 |  10.7702  |    16.0534     |   98.6053   | 38.8197  |        36.9292         |
|        gluon_xception65         | 32  | 1.9051 |  13.0636  |    20.5942     |   63.092    | 39.6838  |        36.6068         |
|        tnt_s_patch16_224        | 64  | 1.9765 |  13.7438  |      nan       |   38.108    | 79.3178  |         36.349         |
|           tf_mixnet_l           | 64  | 6.0485 |  15.4409  |    31.3271     |   80.4989   | 39.2934  |        35.3401         |
|       gluon_inception_v3        | 128 | 1.6463 |  10.6789  |    16.1246     |   99.5953   | 38.7047  |        35.3094         |
|          inception_v3           | 128 | 1.6443 |  10.7318  |    16.0762     |  100.0787   | 40.7121  |        35.1945         |
|            mixnet_l             | 64  | 5.4554 |  14.3232  |    31.9029     |   80.7908   |  37.772  |        34.7109         |
|          ghostnet_100           | 128 | 2.8074 |  11.4277  |    16.0482     |   90.9446   | 37.1004  |         34.594         |
|             dla102              | 64  | 1.7372 |  11.9857  |    18.0077     |   85.2083   | 37.8686  |        34.4964         |
|          gmlp_s16_224           | 64  | 1.358  |  9.3349   |      nan       |   22.0471   | 94.7536  |        33.1031         |
|           volo_d1_224           | 64  | 1.4035 |  9.6409   |      nan       |   39.0205   | 99.4593  |        31.3828         |
|     swsl_resnext101_32x16d      | 32  | 1.8603 |  12.0047  |    17.7091     |   51.8964   | 33.0098  |        30.0827         |
|           dm_nfnet_f0           | 128 | 2.0123 |  8.5217   |      nan       |   37.1719   | 33.4232  |         29.788         |
|           res2next50            |  2  | 1.6902 |  10.5555  |    14.9018     |   58.4687   | 31.3378  |        29.1112         |
|         crossvit_9_240          | 64  | 1.7674 |  10.7968  |    15.9577     |   36.4229   | 177.7272 |        28.7388         |
|           rexnet_100            | 128 | 1.9778 |  8.8454   |     20.415     |  116.8786   | 30.9752  |        28.5034         |
|            tinynet_a            | 128 | 2.1854 |  9.7501   |    23.1673     |   78.7599   | 30.6411  |         27.887         |
|        sebotnet33ts_256         | 64  | 1.8121 |  7.2766   |    16.3418     |   67.8501   | 200.1795 |        27.2786         |
|          gmixer_24_224          | 64  | 1.543  |  10.4071  |      nan       |   28.2553   | 78.5994  |        24.9947         |
|          cspdarknet53           | 64  | 2.4964 |  8.9171   |    22.4576     |   39.8907   | 27.2623  |        24.7776         |
|       tf_efficientnet_b0        | 128 | 1.9292 |  8.2259   |    18.8205     |   77.8078   | 26.1743  |         23.675         |
|           fbnetc_100            | 128 | 2.1137 |  8.3903   |    20.2628     |   59.9796   | 25.3075  |        22.7453         |
|         coat_lite_mini          | 128 | 1.101  |    6.8    |    10.0239     |   32.4766   | 402.4657 |        22.6502         |
|          spnasnet_100           | 128 | 2.065  |  7.8044   |    20.1666     |   56.6263   | 24.9484  |        22.0166         |
|           convit_base           | 32  | 1.2693 |  7.6327   |      nan       |     nan     | 96.6373  |        21.9034         |
|       eca_botnext26ts_256       | 64  | 1.4337 |  5.6354   |    12.0384     |   63.0366   | 365.2177 |        21.8795         |
|      mobilenetv3_large_100      | 128 | 1.644  |  6.7536   |    15.1964     |   82.4359   | 23.6085  |        21.0345         |
|            nfnet_l0             | 64  | 1.7975 |  8.4924   |    12.6507     |   34.4242   | 26.6382  |        20.7187         |
|          botnet26t_256          | 128 | 1.3584 |  5.2753   |    11.1036     |   49.2952   | 189.7464 |        20.3718         |
|        convmixer_768_32         | 32  | 1.3199 |  7.9233   |    11.8415     |   17.8022   | 25.4867  |        19.8285         |
|           regnety_002           | 128 | 1.6717 |  7.1936   |    16.0403     |   56.1349   | 20.9255  |        19.1329         |
|         mobilenetv2_100         | 128 | 1.7791 |   6.384   |    15.2861     |   40.5512   | 21.9773  |        19.0758         |
|            gernet_l             | 128 | 2.0374 |   7.463   |     18.659     |   44.8025   | 20.9234  |        19.0706         |
|           mnasnet_100           | 128 | 1.6803 |  6.5951   |    15.5774     |   50.207    | 20.9103  |         18.743         |
|      beit_base_patch16_224      | 64  | 1.2479 |  7.0291   |      nan       |   18.1129   | 38.3457  |        18.6593         |
|            repvgg_a2            | 128 | 2.0414 |  7.0386   |    17.5579     |   61.292    | 20.4612  |        18.3748         |
|            pit_b_224            | 64  | 1.0952 |  6.9745   |    10.0976     |   25.4079   | 87.9129  |        18.2073         |
|         visformer_small         | 128 | 0.922  |  4.9434   |     7.4202     |   30.5163   | 93.8021  |        17.8298         |
| deit_base_distilled_patch16_224 | 64  | 0.8452 |  6.0886   |     8.8188     |   14.258    | 40.8733  |        17.5893         |
|          resmlp_12_224          | 128 | 0.6117 |  3.9529   |     7.3898     |     nan     |  41.942  |        17.1676         |
|           selecsls42b           | 128 | 0.7102 |  4.7117   |     6.9379     |   50.2083   |  19.065  |        16.9864         |
|      vit_base_patch16_224       | 64  | 0.9454 |  5.9644   |     8.425      |   13.8959   | 29.1978  |        16.8973         |
|          mixer_b16_224          | 64  | 0.6924 |  4.6145   |     7.8052     |   16.1793   | 43.3383  |        16.4071         |
|            lcnet_050            | 128 | 1.0362 |  3.9688   |     8.4529     |   38.2322   | 14.9745  |        13.6927         |
|        ese_vovnet19b_dw         | 128 | 1.026  |  3.9109   |     7.9181     |   38.5395   | 14.9488  |        13.1984         |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|            tinynet_a            | 128 | 0.9889 |  0.7884   |     0.2766     |   0.7887    |  1.3707  |         1.4015         |
|          gmixer_24_224          | 64  | 0.9922 |  0.9494   |      nan       |   0.8991    |  1.2587  |         1.3627         |
|          gmlp_s16_224           | 64  | 0.9939 |  0.9623   |      nan       |    0.92     |  1.2405  |         1.3565         |
|          pnasnet5large          | 16  | 1.0575 |  0.9913   |     0.3633     |   1.1722    |  1.1607  |         1.2789         |
|        sebotnet33ts_256         | 64  | 0.9928 |  0.7073   |     0.3213     |   0.7354    |  0.745   |         1.2732         |
|            pit_b_224            | 64  | 0.999  |  0.8053   |     0.326      |   0.8179    |  0.9746  |         1.2244         |
|           mobilevit_s           | 32  | 0.9926 |  0.7681   |     0.2757     |    0.787    |  1.1122  |         1.2217         |
|       eca_botnext26ts_256       | 64  | 0.989  |  0.7706   |     0.2697     |   0.7788    |  1.1084  |         1.2042         |
|        eca_halonext26ts         | 64  | 0.9885 |   0.775   |     0.2697     |   0.7792    |   nan    |         1.1991         |
|       tf_efficientnet_b0        | 128 | 0.9882 |  0.7693   |     0.2664     |   0.8392    |  1.173   |         1.1918         |
|           convit_base           | 32  | 0.9972 |  0.8582   |      nan       |     nan     |  1.0248  |         1.1823         |
|           rexnet_100            | 128 | 0.9885 |   0.785   |     0.2849     |   0.8648    |  1.1475  |         1.1687         |
|        tnt_s_patch16_224        | 64  | 0.9948 |  0.9668   |      nan       |   0.9431    |  1.0469  |          1.16          |
|           dm_nfnet_f0           | 128 | 0.969  |   0.898   |      nan       |   0.9443    |  1.0336  |         1.124          |
|         poolformer_m36          | 64  | 0.9979 |  0.9432   |     0.3413     |     nan     |  1.1022  |         1.1162         |
|         crossvit_9_240          | 64  | 0.9874 |  0.8698   |     0.3378     |   0.8854    |  0.7934  |         1.0957         |
|      beit_base_patch16_224      | 64  | 0.9952 |  0.9327   |      nan       |   0.9298    |  1.0004  |         1.0937         |
|        twins_pcpvt_base         | 32  | 0.9938 |  0.9046   |     0.3492     |   0.8007    |  0.9337  |         1.0923         |
|             dla102              | 64  | 0.9931 |  0.9487   |     0.3592     |   0.9751    |  1.079   |         1.0867         |
| deit_base_distilled_patch16_224 | 64  | 0.9944 |  0.9332   |     0.359      |   0.8794    |  0.8911  |         1.0785         |
|      vit_base_patch16_224       | 64  | 0.9955 |  0.9342   |     0.3593     |   0.8801    |  0.8916  |         1.0772         |
|           volo_d1_224           | 64  | 0.9965 |  0.9475   |      nan       |   0.8587    |  1.0138  |         1.0716         |
|            nfnet_l0             | 64  | 0.9884 |  0.8166   |     0.2786     |   0.8207    |  1.0034  |         1.0713         |
|         visformer_small         | 128 | 0.9899 |  0.9259   |     0.3469     |   0.8884    |  0.9382  |         1.0646         |
|           resnest101e           | 32  | 0.9955 |  0.9721   |     0.3558     |   0.9532    |  1.0272  |         1.058          |
|          ghostnet_100           | 128 | 0.9756 |   0.87    |     0.3371     |   0.9026    |  0.9897  |         1.0571         |
|         coat_lite_mini          | 128 | 1.0338 |  0.9202   |     0.3514     |   0.6593    |  0.7962  |         1.0496         |
|           tf_mixnet_l           | 64  | 0.9903 |  0.8556   |     0.2894     |   0.8366    |  0.9291  |         1.0459         |
|         mobilenetv2_100         | 128 | 0.9863 |  0.7642   |     0.3109     |   0.9129    |  1.0048  |         1.021          |
|           selecsls42b           | 128 | 0.9789 |   0.876   |     0.3528     |   0.8772    |  0.9715  |         1.0173         |
|      xcit_large_24_p8_224       |  5  | 0.9975 |    nan    |      nan       |     nan     |  0.9289  |         1.014          |
|          resmlp_12_224          | 128 | 0.9827 |  0.9508   |     0.2624     |     nan     |  0.8092  |         1.0011         |
|          cait_m36_384           |  2  | 0.9993 |  0.8803   |      nan       |    0.903    |  0.8949  |         0.9997         |
|          mixer_b16_224          | 64  | 0.9929 |  0.9361   |     0.3571     |   0.7726    |  0.8978  |         0.9895         |
|        convmixer_768_32         | 32  | 0.9972 |  0.9788   |     0.3455     |   0.9714    |  0.9746  |         0.9846         |
|        res2net50_14w_8s         |  2  | 0.9968 |   0.824   |     0.4257     |   0.8169    |  0.8228  |         0.9804         |
|           res2next50            |  2  | 0.9976 |  0.8277   |     0.4221     |   0.8198    |  0.8231  |         0.979          |
|            fbnetv3_b            | 128 | 0.9872 |  0.7836   |     0.3151     |    0.79     |  0.9645  |         0.9776         |
|        ese_vovnet19b_dw         | 128 | 0.9858 |  0.8566   |     0.3273     |   0.9146    |  0.9605  |         0.9746         |
|            mixnet_l             | 64  |  0.99  |  0.8439   |     0.2738     |   0.7742    |  0.8647  |         0.9708         |
|          convnext_base          | 32  | 1.0034 |  0.9053   |      nan       |   0.7521    |  0.8848  |         0.9666         |
|             dpn107              | 32  | 0.997  |  0.9097   |     0.3531     |   0.8814    |  0.9075  |         0.9593         |
|     swsl_resnext101_32x16d      | 32  | 0.9989 |   0.879   |     0.3676     |   0.8487    |  0.9112  |         0.9354         |
|  swin_base_patch4_window7_224   | 64  | 0.9966 |  0.9203   |      nan       |   0.8451    |  0.7566  |         0.9238         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9151   |     0.3336     |   0.8524    |  0.8964  |         0.9224         |
|      mobilenetv3_large_100      | 128 | 0.9772 |   0.84    |     0.3301     |   0.8641    |  0.8948  |         0.916          |
|        adv_inception_v3         | 128 | 0.9824 |  0.8621   |     0.3343     |   0.8538    |  0.8845  |         0.8998         |
|          inception_v3           | 128 | 0.9824 |  0.8621   |     0.3343     |   0.8538    |  0.8845  |         0.8998         |
|       gluon_inception_v3        | 128 | 0.9824 |  0.8621   |     0.3342     |   0.8538    |  0.8845  |         0.8998         |
|          botnet26t_256          | 128 | 0.9849 |   0.864   |     0.3308     |   0.7708    |  0.8503  |         0.898          |
|        gluon_xception65         | 32  | 0.9955 |  0.8859   |     0.3349     |   0.8854    |  0.8924  |         0.8971         |
|            gernet_l             | 128 | 0.9794 |  0.8503   |     0.3444     |   0.8158    |  0.8621  |         0.8897         |
|          spnasnet_100           | 128 | 0.9788 |  0.8801   |     0.3344     |   0.8371    |  0.8602  |         0.8784         |
|            lcnet_050            | 128 | 0.9433 |  0.7566   |     0.3361     |   0.7559    |  0.8309  |         0.8769         |
|          jx_nest_base           | 32  | 0.9983 |  0.8927   |      nan       |    0.86     |  0.6708  |         0.8749         |
|           mnasnet_100           | 128 | 0.9765 |  0.8701   |     0.3348     |   0.8252    |  0.8503  |         0.8698         |
|           regnety_002           | 128 | 0.9504 |  0.7948   |     0.3403     |   0.7515    |  0.8245  |         0.8627         |
|          cspdarknet53           | 64  | 0.9915 |  0.8407   |     0.3241     |   0.7908    |  0.8512  |         0.8583         |
|           fbnetc_100            | 128 |  0.98  |  0.8491   |     0.3307     |   0.7352    |  0.8387  |         0.8542         |
|            repvgg_a2            | 128 | 0.9767 |  0.7822   |     0.3406     |   0.6789    |  0.7905  |         0.8278         |
|            hrnet_w18            |  2  | 0.9971 |  0.8333   |     0.4258     |   0.8355    |  0.8367  |         0.6644         |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Performance graphs

bench_logs/huggingface_amp.png :

bench_logs/timm_models_amp.png :

bench_logs/torchbench_amp.png :

@anijain2305
Contributor Author

Performance Dashboard for float32 precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. The experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 89%, 49/55 | 98%, 42/43  | 100%, 61/61 |
|       aot_eager        | 89%, 49/55 | 98%, 42/43  | 97%, 59/61  |
|     aot_cudagraphs     | 73%, 40/55 | 49%, 21/43  | 38%, 23/61  |
|      aot_nvfuser       | 58%, 32/55 |  2%, 1/43   | 87%, 53/61  |
|        inductor        | 85%, 47/55 | 93%, 40/43  | 97%, 59/61  |
| inductor_no_cudagraphs | 91%, 50/55 | 93%, 40/43  | 95%, 58/61  |
+------------------------+------------+-------------+-------------+
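
Each passrate cell pairs a rounded percentage with the raw pass/total counts. A minimal sketch of that formatting, assuming the percentage is rounded to the nearest integer (the helper name `passrate_cell` is hypothetical and is not taken from the benchmark scripts):

```python
def passrate_cell(passed: int, total: int) -> str:
    """Format a passrate cell as '<percent>%, <passed>/<total>'."""
    if total <= 0:
        raise ValueError("total must be positive")
    percent = 100.0 * passed / total
    # Assumption: the dashboard rounds to a whole percent.
    return f"{percent:.0f}%, {passed}/{total}"


# torchbench counts for the eager row in the table above
print(passrate_cell(49, 55))
```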

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.01x    |    1.00x    |
|       aot_eager        |   1.01x    |    1.00x    |    1.00x    |
|     aot_cudagraphs     |   1.09x    |    1.02x    |    1.00x    |
|      aot_nvfuser       |   1.13x    |    1.12x    |    1.11x    |
|        inductor        |   1.50x    |    1.31x    |    1.26x    |
| inductor_no_cudagraphs |   1.23x    |    1.21x    |    1.25x    |
+------------------------+------------+-------------+-------------+
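
As a rough illustration of how a summary number like this can be derived from per-model speedups, the sketch below takes the geometric mean and skips entries that are 0.0 or nan. This rests on the assumption that those entries mark models that failed to run or failed accuracy checks. The function name is hypothetical and the actual dashboard scripts may differ.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of per-model speedups, ignoring failed entries.

    Assumption: a speedup of 0.0 or nan marks a model that did not
    produce a valid measurement, so it is excluded from the mean.
    """
    valid = [s for s in speedups if not math.isnan(s) and s > 0]
    if not valid:
        return float("nan")
    # exp(mean(log(x))) is the geometric mean of positive values.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# A few inductor speedups from the torchbench table, including one failure.
print(f"{geomean_speedup([5.1622, 4.3858, 3.7282, 0.0]):.2f}x")
```

A geometric mean is a common choice for ratios like speedups because a 2x gain and a 0.5x loss cancel out to 1x, which an arithmetic mean would not give.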

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    1.78    |    2.15     |    1.94     |
|       aot_eager        |    6.47    |    9.29     |    9.22     |
|     aot_cudagraphs     |    6.79    |    12.09    |    16.48    |
|      aot_nvfuser       |   20.61    |    9.84     |    51.45    |
|        inductor        |   62.14    |    53.79    |    73.53    |
| inductor_no_cudagraphs |   61.41    |    48.85    |    72.55    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.96x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.91x    |    0.88x    |
|     aot_cudagraphs     |   0.39x    |    0.35x    |    0.32x    |
|      aot_nvfuser       |   0.83x    |    1.08x    |    0.84x    |
|        inductor        |   0.84x    |    0.79x    |    0.96x    |
| inductor_no_cudagraphs |   0.93x    |    0.96x    |    1.01x    |
+------------------------+------------+-------------+-------------+
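
One way to read these numbers, assuming the ratio is defined as the eager peak memory divided by the backend's peak memory (the summary above does not spell this out): a value above 1.00x means the backend peaked lower than eager, and a value below 1.00x means it peaked higher. A minimal sketch under that assumption, with a hypothetical helper name:

```python
def compression_ratio(eager_peak_bytes: float, backend_peak_bytes: float) -> float:
    """Peak-memory compression ratio; higher is better.

    Assumption: ratio = eager peak / backend peak, so a backend that
    uses less memory than eager scores above 1.0.
    """
    if backend_peak_bytes <= 0:
        raise ValueError("backend_peak_bytes must be positive")
    return eager_peak_bytes / backend_peak_bytes


# Hypothetical peaks: eager uses 10 GiB, the backend uses 8 GiB.
print(f"{compression_ratio(10 * 2**30, 8 * 2**30):.2f}x")
```

Under this assumed definition, the 0.39x reported for aot_cudagraphs on torchbench would correspond to roughly 2.6 times the eager peak memory.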

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|            densenet121            |  4   | 1.002  |  0.9978   |     2.3409     |   1.4467    |  5.1622  |         1.2963         |
|         timm_efficientdet         |  1   | 0.979  |  0.8852   |      0.0       |     0.0     |  4.3858  |         1.5765         |
|       functorch_dp_cifar10        |  64  | 1.0026 |  0.9744   |     1.9559     |   1.1928    |  3.7282  |         1.2617         |
|      timm_vision_transformer      |  8   | 1.0086 |  0.9257   |     1.4644     |   1.3478    |  2.6512  |         1.4226         |
|                drq                |  1   | 1.0126 |  0.8387   |     1.6455     |   1.0639    |  2.4863  |         1.0927         |
|        mobilenet_v3_large         |  32  | 1.0056 |   1.107   |     1.0133     |    1.377    |  2.0995  |         1.3533         |
|          resnext50_32x4d          |  8   | 1.0026 |  1.0844   |     1.1689     |   1.3748    |  2.0143  |         1.2126         |
|           BERT_pytorch            |  16  | 1.0102 |   0.877   |      0.0       |     0.0     |  1.9299  |         1.8929         |
|          pytorch_struct           | 200  | 0.9985 |  0.7879   |     0.8584     |   0.8896    |  1.836   |         1.1984         |
|           lennard_jones           | 1000 | 0.9749 |  0.8189   |     1.075      |   1.0247    |  1.8158  |         0.9447         |
|             resnet18              |  16  | 1.0028 |  1.0909   |     1.2494     |   1.3811    |  1.8013  |         1.2601         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9988 |  0.9368   |     1.2228     |   1.1906    |  1.7624  |         1.3194         |
|           squeezenet1_1           |  32  | 0.9955 |  0.9905   |     1.0378     |   1.1552    |  1.7426  |         1.2754         |
|             hf_Albert             |  8   | 1.0015 |  0.9973   |     0.7452     |     0.0     |  1.6602  |         1.6555         |
|               dcgan               |  32  | 0.9837 |   1.008   |     1.2562     |   1.1377    |  1.6441  |         1.0771         |
|            hf_T5_large            |  2   | 1.0249 |  0.8892   |      0.0       |     0.0     |  1.6282  |         1.5773         |
|        speech_transformer         |  32  | 1.0064 |  0.8979   |      0.0       |     0.0     |  1.5722  |         1.5391         |
|         soft_actor_critic         | 256  | 0.9761 |  0.7772   |     1.0678     |   0.9922    |  1.5653  |         0.9507         |
|        shufflenet_v2_x1_0         | 128  | 1.0006 |  1.0507   |     0.813      |   1.1896    |  1.5405  |         1.3842         |
|           timm_resnest            |  32  | 0.9995 |  1.0025   |     0.8052     |   1.1829    |  1.5228  |         1.4543         |
|            timm_nfnet             | 128  | 0.9994 |  1.0003   |      0.0       |   1.2123    |  1.4689  |         1.4212         |
|            mnasnet1_0             |  32  | 1.0006 |   1.088   |     0.8592     |   1.3001    |  1.4623  |         1.2718         |
|              hf_GPT2              |  4   | 1.0115 |  0.9802   |     0.7295     |     0.0     |  1.4388  |         1.4355         |
|           mobilenet_v2            |  96  | 0.9999 |  0.9955   |     0.7297     |   1.0445    |  1.4283  |         1.408          |
|           fastNLP_Bert            |  6   | 0.999  |  0.9684   |     0.7536     |     0.0     |  1.3722  |         1.3446         |
|         timm_efficientnet         |  32  | 0.9556 |  0.8062   |     0.6823     |   1.0603    |  1.2967  |         1.2046         |
|          LearningToPaint          |  96  | 1.0004 |  1.0523   |     0.8579     |   1.2275    |  1.251   |         1.208          |
|              hf_Bart              |  4   | 1.0139 |  0.9742   |     0.726      |     0.0     |  1.2383  |         1.1758         |
|             resnet50              |  32  | 0.9994 |  0.9942   |     0.7614     |    1.163    |  1.2065  |         1.1694         |
|           pytorch_unet            |  1   | 0.9996 |  0.9976   |     0.8458     |    1.076    |  1.2001  |         1.1857         |
|            Super_SloMo            |  6   | 1.0001 |  0.9973   |     0.8672     |     0.0     |  1.1792  |         1.1644         |
|               vgg16               |  64  | 0.9998 |  0.9989   |     0.8582     |   0.9973    |  1.1729  |         1.1659         |
|              alexnet              | 128  | 0.9993 |   0.998   |     0.8026     |   1.0005    |  1.162   |         1.1631         |
|              hf_Bert              |  4   | 1.0263 |  0.9917   |     0.7132     |     0.0     |  1.1598  |         1.1563         |
|           hf_DistilBert           |  8   | 1.0008 |  0.9562   |     0.6702     |     0.0     |  1.1516  |         1.1614         |
|            timm_regnet            |  32  | 0.9654 |  0.9637   |     0.7808     |    1.095    |  1.129   |         1.0942         |
|          pytorch_stargan          |  16  | 0.9988 |  0.9833   |     0.8651     |   0.9889    |  1.1239  |         1.0911         |
|        Background_Matting         |  4   | 1.0004 |  1.0228   |     0.8681     |   1.0822    |  1.1156  |         1.1072         |
|            hf_Reformer            |  4   | 0.9966 |    0.0    |     0.9273     |     0.0     |  1.1099  |         1.1339         |
|            hf_BigBird             |  2   | 0.9955 |  0.9411   |     0.9488     |     0.0     |  1.1021  |         1.0022         |
|              yolov3               |  16  | 0.9999 |  0.9947   |     0.7916     |   1.1843    |  1.079   |         1.0653         |
|   timm_vision_transformer_large   |  8   |  1.0   |   0.993   |      0.0       |   0.9826    |  1.0544  |         1.0394         |
| attention_is_all_you_need_pytorch | 256  | 1.0001 |  0.9736   |      0.0       |     0.0     |  1.0438  |         1.0293         |
|            timm_vovnet            |  32  | 0.9113 |  0.9041   |     0.715      |   0.9779    |  1.0066  |         1.0299         |
|              demucs               |  4   | 1.0003 |    1.0    |     1.0001     |   0.9995    |  1.0004  |         1.0006         |
|            tts_angular            |  64  | 0.9795 |  0.9612   |     0.9842     |    0.993    |  0.9987  |         1.009          |
|      nvidia_deeprecommender       | 256  | 0.9997 |  0.9632   |     0.5851     |   0.9435    |  0.904   |         0.964          |
|               dlrm                | 2048 |  0.0   |  1.0787   |      0.0       |     0.0     |   0.0    |         1.2115         |
|           hf_GPT2_large           |  4   | 1.0005 |  0.9803   |      0.0       |     0.0     |   0.0    |         1.3857         |
|               hf_T5               |  8   | 1.0017 |  0.9911   |      0.0       |     0.0     |   0.0    |         1.5038         |
|           hf_Longformer           |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
|    mobilenet_v2_quantized_qat     |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
|               moco                |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
|      resnet50_quantized_qat       |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |  aot_cudagraphs  |   aot_nvfuser    |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|         timm_efficientdet         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|              hf_Bert              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|      timm_vision_transformer      |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |          pass          |
|            Super_SloMo            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           fastNLP_Bert            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|             hf_Albert             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              hf_Bart              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|            hf_BigBird             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|        speech_transformer         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|            timm_regnet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|            hf_Reformer            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              yolov3               |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           BERT_pytorch            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|               dlrm                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|               hf_T5               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|            hf_T5_base             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|           hf_DistilBert           |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           timm_resnest            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        Background_Matting         |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|    mobilenet_v2_quantized_qat     |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|               moco                |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|      resnet50_quantized_qat       |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|             tacotron2             |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|          vision_maskrcnn          |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
|         timm_efficientdet         |  1   | 19.5002 |  38.2722  |      nan       |     nan     | 459.5236 |        476.2338        |
|              yolov3               |  16  | 2.8733  |  8.9076   |    12.2564     |   44.9771   | 413.9688 |        410.5355        |
|            hf_T5_large            |  2   | 12.6314 |  41.5885  |      nan       |     nan     | 205.4873 |        201.7729        |
|      timm_vision_transformer      |  8   | 0.7856  |  4.5736   |     5.9599     |   9.5398    | 153.4416 |        160.3076        |
|        speech_transformer         |  32  | 1.6279  |  8.4704   |      nan       |     nan     | 152.9591 |        148.9265        |
|           timm_resnest            |  32  | 0.5492  |  2.6896   |     3.9212     |   35.7153   | 147.2225 |        142.5265        |
| attention_is_all_you_need_pytorch | 256  | 1.1225  |  7.5134   |      nan       |     nan     | 136.3311 |        137.5095        |
|   timm_vision_transformer_large   |  8   | 2.2943  |  14.4897  |      nan       |   25.9253   | 117.3628 |        123.3241        |
|          pytorch_stargan          |  16  | 0.3849  |  2.4361   |     3.2232     |   4.0227    | 92.3776  |        102.2742        |
|          pytorch_struct           | 200  | 0.2404  |  0.8247   |     1.3498     |   4.1599    | 92.3395  |         73.867         |
|           BERT_pytorch            |  16  | 1.4589  |  7.7754   |      nan       |     nan     | 92.0246  |        93.1061         |
|           fastNLP_Bert            |  6   | 1.4803  |  7.0115   |    10.6166     |     nan     | 66.1196  |          64.1          |
|              hf_GPT2              |  4   | 1.2849  |  6.4734   |     9.581      |     nan     | 63.4356  |        62.7736         |
|              hf_Bart              |  4   | 1.4354  |  8.3888   |    12.4238     |     nan     | 50.8848  |        49.4684         |
|        mobilenet_v3_large         |  32  | 0.8558  |  5.0971   |     7.0187     |   53.4283   | 46.4169  |        46.0411         |
|            densenet121            |  4   | 2.0854  |  13.8001  |    21.0823     |   89.7884   |  45.582  |         43.524         |
|             hf_Albert             |  8   | 1.0142  |  5.9834   |     8.9882     |     nan     | 43.2885  |        40.8113         |
|            hf_BigBird             |  2   | 7.3116  |  13.8529  |    30.4412     |     nan     | 41.7364  |        26.9154         |
|              hf_Bert              |  4   | 1.3986  |  6.4943   |     9.2672     |     nan     | 40.2879  |         39.072         |
|            hf_Reformer            |  4   | 2.4077  |    nan    |     9.3638     |     nan     | 36.1775  |        34.6912         |
|            timm_regnet            |  32  | 2.2098  |  8.7135   |    21.5731     |   48.0162   | 35.5577  |        32.6271         |
|         timm_efficientnet         |  32  | 1.7224  |  6.9002   |    16.2455     |   53.1702   | 35.5396  |        33.1986         |
|            timm_nfnet             | 128  | 1.9232  |  7.9641   |      nan       |   30.4729   | 31.4204  |        29.4342         |
|           hf_DistilBert           |  8   | 0.4805  |  3.1646   |     5.9828     |     nan     | 30.6968  |        30.7352         |
|             resnet50              |  32  | 0.8401  |  5.1077   |     7.1067     |   32.6952   |  30.61   |        28.9547         |
|            timm_vovnet            |  32  | 1.4643  |  4.7686   |     10.593     |   23.8195   | 30.2593  |        27.0792         |
|          resnext50_32x4d          |  8   | 0.8621  |  5.0762   |     7.0093     |   28.8407   |  29.692  |        29.1772         |
|            mnasnet1_0             |  32  | 0.7748  |  4.7415   |     6.5712     |   31.1452   | 28.9074  |        28.2481         |
|       functorch_dp_cifar10        |  64  | 0.3534  |  2.0922   |     2.9489     |   5.6444    | 26.0975  |        25.5771         |
|             resnet18              |  16  |  0.399  |  1.9641   |     2.8601     |   17.4993   | 22.8969  |        21.9235         |
|        shufflenet_v2_x1_0         | 128  | 0.9024  |  5.6571   |     8.0459     |   27.0932   | 18.3878  |        17.7721         |
|            Super_SloMo            |  6   |  1.013  |   5.334   |     7.0228     |     nan     | 17.2736  |        16.7825         |
|        Background_Matting         |  4   | 0.7505  |  4.7428   |     6.8589     |   30.3629   | 17.0207  |         15.968         |
|           mobilenet_v2            |  96  | 0.7851  |  4.8185   |     6.9951     |   37.068    | 16.9112  |         16.395         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  0.379  |  2.3162   |     3.0731     |   3.9214    |  8.3507  |         8.0143         |
|           pytorch_unet            |  1   | 0.4315  |  2.2389   |     3.0989     |   19.953    |  8.3006  |         7.9089         |
|          LearningToPaint          |  96  | 0.4277  |  2.0612   |     2.9069     |   24.0299   |  7.2519  |         6.9993         |
|           squeezenet1_1           |  32  | 0.2226  |   1.006   |     1.4333     |    4.577    |  4.0949  |         3.7854         |
|      nvidia_deeprecommender       | 256  | 0.1907  |  0.4349   |     0.6856     |   2.4495    |  4.0058  |          3.71          |
|                drq                |  1   | 0.1368  |  0.4539   |     0.7737     |   3.5104    |  3.8337  |         3.2633         |
|               vgg16               |  64  | 0.1795  |  0.6625   |     1.0525     |    2.493    |  3.5342  |         3.3473         |
|         soft_actor_critic         | 256  | 0.1968  |  0.3518   |     0.5498     |   1.5108    |  3.4266  |         2.7146         |
|              alexnet              | 128  | 0.1445  |  0.4185   |     0.6973     |   2.3862    |  2.9512  |         2.6905         |
|               dcgan               |  32  | 0.1677  |  0.4586   |     0.6723     |    3.767    |  2.6567  |         2.4781         |
|           lennard_jones           | 1000 | 0.1388  |   0.292   |     0.4421     |   1.0721    |  1.9645  |         1.7635         |
|            tts_angular            |  64  | 0.2095  |  0.2684   |     0.4085     |   0.9997    |  1.8827  |         1.6681         |
|              demucs               |  4   | 0.3029  |  0.2973   |     0.3102     |   0.3061    |  0.2077  |         0.2076         |
|           hf_GPT2_large           |  4   | 4.9466  |  19.8968  |      nan       |     nan     |   nan    |        142.6081        |
|               hf_T5               |  8   |  2.055  |  9.4609   |      nan       |     nan     |   nan    |        44.7663         |
|               dlrm                | 2048 |   nan   |  0.8311   |      nan       |     nan     |   nan    |         2.9874         |
|           hf_Longformer           |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|    mobilenet_v2_quantized_qat     |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|               moco                |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|      resnet50_quantized_qat       |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
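
For anyone who wants to slice these numbers locally, here is a minimal sketch of reading a table in this layout and ranking models by compile time. The helper names (`parse_table`, `to_float`, `slowest`) are hypothetical and not part of the benchmark scripts, the sample only copies four rows from the table above, and it assumes `nan` means the backend produced no measurement for that model.

```python
import math

# Four rows copied from the compilation latency table above (seconds).
TABLE = """\
+-------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
|       name        |  bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
| timm_efficientdet |  1   | 19.5002 |  38.2722  |      nan       |     nan     | 459.5236 |        476.2338        |
|     resnet18      |  16  |  0.399  |  1.9641   |     2.8601     |   17.4993   | 22.8969  |        21.9235         |
|      demucs       |  4   | 0.3029  |  0.2973   |     0.3102     |   0.3061    |  0.2077  |         0.2076         |
|       dlrm        | 2048 |   nan   |  0.8311   |      nan       |     nan     |   nan    |         2.9874         |
+-------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
"""


def parse_table(text):
    """Turn the ASCII table into a list of dicts keyed by the header row."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the +----+ border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def to_float(cell):
    """Parse a numeric cell; anything non-numeric is treated as nan."""
    try:
        return float(cell)
    except ValueError:
        return math.nan


def slowest(rows, column, top=3):
    """Return the `top` (name, seconds) pairs with the largest value in `column`."""
    pairs = [(row["name"], to_float(row[column])) for row in rows]
    pairs = [(name, value) for name, value in pairs if not math.isnan(value)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:top]


rows = parse_table(TABLE)
ranking = slowest(rows, "inductor")
print(ranking)
```

Rows with `nan` in the chosen column are dropped before sorting, so `dlrm` does not show up in the `inductor` ranking here.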

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|         timm_efficientnet         |  32  | 0.9937 |  0.7666   |     0.2636     |   0.7837    |  1.3107  |         1.3377         |
|            Super_SloMo            |  6   | 1.0024 |  0.9525   |     0.363      |     nan     |  1.1857  |         1.1913         |
|         timm_efficientdet         |  1   | 1.0111 |   0.823   |      nan       |     nan     |  1.1165  |         1.1428         |
|           mobilenet_v2            |  96  | 0.9928 |  0.7624   |     0.3062     |   0.7638    |  1.1005  |         1.1105         |
|           squeezenet1_1           |  32  | 0.9749 |  0.8159   |     0.3374     |   0.9742    |  1.0823  |         1.1267         |
|            timm_nfnet             | 128  | 0.9358 |  0.8936   |      nan       |   0.9478    |  1.0219  |         1.0495         |
|              demucs               |  4   | 0.9886 |  0.9886   |     0.9886     |   0.9886    |  0.9886  |         0.9886         |
|            tts_angular            |  64  | 0.9884 |  0.9884   |     0.9829     |   0.9884    |  0.983   |         0.9884         |
|        shufflenet_v2_x1_0         | 128  | 0.9739 |  0.8944   |      0.35      |   0.8662    |  0.9791  |         1.0072         |
|              hf_GPT2              |  4   | 0.9548 |   0.887   |     0.353      |     nan     |  0.9505  |         1.0819         |
|            timm_regnet            |  32  | 0.9985 |  0.8614   |     0.3327     |   0.8784    |  0.9284  |         0.9323         |
|        Background_Matting         |  4   | 0.9998 |  0.9492   |     0.3596     |   0.9749    |  0.9212  |         0.9238         |
|              yolov3               |  16  | 0.9957 |   0.844   |     0.334      |   0.8814    |  0.9151  |         0.919          |
|          pytorch_stargan          |  16  | 0.9975 |  1.0179   |     0.4129     |   1.0085    |  0.9023  |         0.9928         |
|           timm_resnest            |  32  | 0.9935 |   0.88    |     0.3236     |   0.8024    |  0.8982  |         0.9697         |
|        speech_transformer         |  32  | 0.9982 |  0.9159   |      nan       |     nan     |  0.8959  |         0.8996         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9986 |  0.9149   |     0.3919     |   0.9141    |  0.8862  |         0.9646         |
|        mobilenet_v3_large         |  32  | 0.9878 |  0.8563   |     0.3278     |   0.8681    |  0.8829  |         0.8964         |
|             hf_Albert             |  8   | 0.9333 |  0.9333   |     0.2822     |     nan     |  0.8804  |         1.1942         |
|            hf_T5_large            |  2   | 0.922  |  0.8722   |      nan       |     nan     |  0.8737  |         0.922          |
|           pytorch_unet            |  1   | 0.9985 |  0.8521   |     0.3441     |   0.8496    |  0.859   |         0.8608         |
|            densenet121            |  4   | 0.9904 |  0.8812   |     0.3437     |   0.8551    |  0.857   |         0.9307         |
|             resnet50              |  32  | 0.9942 |  0.8719   |     0.3368     |    0.797    |  0.8564  |         0.8913         |
|              hf_Bert              |  4   | 0.9683 |  0.8952   |     0.3395     |     nan     |  0.8564  |         0.9017         |
|              hf_Bart              |  4   | 0.9618 |   0.879   |     0.3245     |     nan     |  0.8531  |         1.0964         |
|            mnasnet1_0             |  32  | 0.9869 |  0.8985   |     0.3331     |   0.8263    |  0.8531  |         0.8659         |
|           fastNLP_Bert            |  6   | 1.0011 |  0.9152   |     0.3384     |     nan     |  0.8343  |         1.0755         |
|          resnext50_32x4d          |  8   | 0.9954 |  0.8671   |     0.3595     |   0.8203    |  0.8303  |         0.8352         |
|   timm_vision_transformer_large   |  8   | 0.9997 |  0.8415   |      nan       |    0.801    |  0.8286  |         0.9823         |
|           BERT_pytorch            |  16  |  1.0   |  0.8995   |      nan       |     nan     |  0.825   |         1.0689         |
|            hf_BigBird             |  2   | 0.9604 |  0.9604   |     0.4303     |     nan     |  0.8205  |         1.0404         |
| attention_is_all_you_need_pytorch | 256  | 0.9476 |  0.9243   |      nan       |     nan     |  0.816   |         0.9432         |
|           hf_DistilBert           |  8   | 0.9211 |  0.9047   |     0.2988     |     nan     |  0.7841  |         0.8605         |
|               dcgan               |  32  | 0.9754 |  0.7634   |     0.4581     |   0.7634    |  0.767   |         0.7903         |
|                drq                |  1   | 0.987  |  0.8777   |     0.4252     |   0.8772    |  0.7632  |         0.8778         |
|         soft_actor_critic         | 256  | 0.9997 |  0.9637   |     0.4355     |   0.9555    |   0.75   |         0.9991         |
|              alexnet              | 128  | 0.9542 |   0.745   |     0.4163     |   0.7455    |  0.743   |         0.8332         |
|            timm_vovnet            |  32  | 0.9933 |  0.7603   |     0.3201     |   0.7741    |  0.7286  |         0.7339         |
|          LearningToPaint          |  96  | 0.9442 |   0.716   |     0.3383     |   0.6272    |  0.7133  |         0.7462         |
|      timm_vision_transformer      |  8   | 0.9943 |  0.8835   |     0.3307     |   0.8104    |  0.712   |         0.7779         |
|             resnet18              |  16  | 0.9831 |  0.7792   |     0.3589     |   0.6971    |  0.6902  |         0.7049         |
|               vgg16               |  64  | 0.9944 |  0.6638   |     0.3214     |   0.6639    |  0.6471  |         0.6497         |
|           lennard_jones           | 1000 | 0.9995 |  0.9995   |     0.3711     |   1.0947    |  0.5646  |         0.9989         |
|      nvidia_deeprecommender       | 256  | 0.5598 |  0.5598   |     0.4624     |   0.5598    |  0.5598  |         0.5598         |
|          pytorch_struct           | 200  |  1.0   |  0.5079   |     0.4824     |   0.5079    |  0.4222  |         0.429          |
|       functorch_dp_cifar10        |  64  | 0.9961 |  0.8224   |     0.4456     |   0.8227    |  0.4056  |         0.4212         |
|            hf_Reformer            |  4   | 0.3011 |    nan    |     0.2397     |     nan     |  0.299   |         0.9882         |
|               hf_T5               |  8   | 0.9527 |  0.9445   |      nan       |     nan     |   nan    |         1.1507         |
|           hf_GPT2_large           |  4   | 0.936  |  0.8768   |      nan       |     nan     |   nan    |         1.0941         |
|               dlrm                | 2048 |  nan   |  0.7305   |      nan       |     nan     |   nan    |         0.7306         |
|           hf_Longformer           |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|    mobilenet_v2_quantized_qat     |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|               moco                |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|      resnet50_quantized_qat       |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|            YituTechConvBert             |  1  | 1.0322 |  0.9286   |      0.0       |     0.0     |  3.2857  |         1.4837         |
|       MT5ForConditionalGeneration       |  8  | 1.0241 |   0.911   |      0.0       |     0.0     |  2.6289  |         1.9706         |
|          MobileBertForMaskedLM          | 32  | 1.0238 |  0.8967   |      0.0       |     0.0     |  2.5667  |         1.567          |
|                CamemBert                |  1  | 1.0522 |  0.9506   |     1.3222     |     0.0     |  2.3526  |         1.5538         |
|               GoogleFnet                |  1  | 0.9976 |  0.8053   |     0.9757     |   1.1197    |  2.1236  |         1.1053         |
|               DistillGPT2               |  1  | 1.0358 |  0.9382   |     1.0302     |     0.0     |  2.0234  |         1.8859         |
|      GPT2ForSequenceClassification      |  4  | 1.0005 |   0.971   |      0.0       |     0.0     |  1.6716  |         1.6623         |
|     M2M100ForConditionalGeneration      |  8  | 1.1038 |  1.0651   |     0.9719     |     0.0     |  1.5317  |         1.3122         |
|       T5ForConditionalGeneration        |  4  | 1.0005 |  0.9631   |      0.0       |     0.0     |  1.4354  |         1.427          |
|     MobileBertForQuestionAnswering      | 64  | 1.0235 |  0.8892   |      0.0       |     0.0     |  1.4293  |         1.2814         |
|                 T5Small                 |  1  | 1.0256 |  0.9437   |      0.0       |     0.0     |  1.4051  |         1.1741         |
|           ElectraForCausalLM            | 32  | 1.0009 |  0.9321   |      0.0       |     0.0     |  1.3666  |         1.4021         |
|       ElectraForQuestionAnswering       | 64  | 1.0001 |  0.9859   |      0.0       |     0.0     |  1.3611  |         1.3419         |
|     PLBartForConditionalGeneration      | 16  | 1.016  |  0.8877   |     0.7834     |     0.0     |  1.311   |         1.2039         |
|       AlbertForQuestionAnswering        |  4  | 1.0003 |   1.002   |      0.0       |     0.0     |  1.2669  |         1.2601         |
|            AlbertForMaskedLM            |  4  | 1.0002 |  0.9999   |      0.0       |     0.0     |  1.2626  |         1.256          |
|             XGLMForCausalLM             |  8  | 1.0119 |  0.9414   |      0.0       |     0.0     |  1.2519  |         1.1754         |
|    LayoutLMForSequenceClassification    | 16  | 1.0002 |  0.9897   |     0.7379     |     0.0     |  1.2491  |         1.2388         |
|             OPTForCausalLM              | 32  | 1.0028 |   0.918   |     0.6969     |     0.0     |  1.1786  |          1.2           |
|           LayoutLMForMaskedLM           | 16  | 1.0003 |  0.9642   |      0.0       |     0.0     |  1.1709  |         1.1757         |
|     DistilBertForQuestionAnswering      | 64  |  1.0   |  0.9862   |     0.7137     |     0.0     |  1.1482  |         1.1336         |
|           RobertaForCausalLM            | 64  | 1.0004 |  0.9547   |     0.7336     |     0.0     |  1.1231  |         1.1303         |
|    MegatronBertForQuestionAnswering     | 16  | 1.0402 |  0.9223   |     0.7571     |     0.0     |  1.1148  |         1.0728         |
|         Speech2Text2ForCausalLM         | 128 | 0.9986 |  0.9285   |     0.6608     |     0.0     |  1.1048  |         1.144          |
|      BartForConditionalGeneration       |  2  | 1.0005 |  0.9872   |      0.0       |     0.0     |  1.1031  |         1.0956         |
|             BartForCausalLM             |  4  | 1.0008 |  0.9685   |     0.7402     |     0.0     |   1.1    |         1.1091         |
|      MBartForConditionalGeneration      | 16  | 1.0401 |  0.9843   |     0.7543     |     0.0     |  1.098   |         1.0901         |
|                 BigBird                 |  1  | 0.9925 |  0.9408   |     1.0057     |     0.0     |  1.0969  |         1.0001         |
|         MegatronBertForCausalLM         | 16  | 1.0338 |  0.9888   |     0.7351     |     0.0     |  1.0925  |         1.079          |
|           DebertaForMaskedLM            |  4  | 0.932  |  0.8128   |     0.7366     |     0.0     |  1.0855  |         1.0664         |
|       RobertaForQuestionAnswering       | 128 |  1.0   |  0.9869   |      0.0       |     0.0     |  1.0847  |         1.0719         |
|        BertForQuestionAnswering         | 128 | 1.0001 |  0.9941   |      0.0       |     0.0     |  1.0845  |         1.0723         |
|     PegasusForConditionalGeneration     | 16  | 1.0116 |  0.9852   |     0.7616     |     0.0     |  1.0833  |         1.0789         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0007 |  0.9394   |      0.0       |     0.0     |  1.0681  |         1.0753         |
|       DebertaForQuestionAnswering       |  8  | 0.9974 |  0.9935   |     0.6831     |     0.0     |  1.0623  |         1.2018         |
|          DistilBertForMaskedLM          | 64  | 1.0006 |  0.9517   |     0.6893     |     0.0     |  1.0416  |         1.0592         |
|             BertForMaskedLM             | 64  | 1.0002 |   0.962   |     0.7175     |     0.0     |  1.0372  |         1.0413         |
|            PLBartForCausalLM            | 32  | 1.0048 |  0.9226   |     0.6882     |     0.0     |  1.0224  |         1.0484         |
|       BlenderbotSmallForCausalLM        | 64  | 1.0016 |  0.9115   |     0.653      |     0.0     |  1.0059  |         1.0432         |
|            TrOCRForCausalLM             | 32  | 1.0012 |   0.958   |      0.0       |     0.0     |  1.0014  |         1.0132         |
|            MBartForCausalLM             | 32  | 1.0008 |   0.958   |     0.7186     |     0.0     |  0.9988  |         1.0092         |
|           PegasusForCausalLM            | 32  | 0.9997 |  0.9548   |     0.732      |     0.0     |  0.9911  |         1.0041         |
|          AllenaiLongformerBase          |  0  |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
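
One common way to summarize a speedup column like the ones above is a geometric mean over the runs that succeeded. The sketch below assumes a `0.0` entry marks a run that failed rather than a measured speedup; that matches the `fail_to_run` entries in the accuracy table but is not stated explicitly here. `geomean_speedup` is a hypothetical helper, not a function from the benchmark harness, and nothing in this section says the dashboard itself aggregates this way.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean over successful runs; 0.0 and nan entries are skipped."""
    ok = [s for s in speedups if s > 0 and not math.isnan(s)]
    if not ok:
        return math.nan  # every run failed, so there is nothing to average
    return math.exp(sum(math.log(s) for s in ok) / len(ok))


# First four rows of the table above (YituTechConvBert .. CamemBert).
inductor = [3.2857, 2.6289, 2.5667, 2.3526]
aot_cudagraphs = [0.0, 0.0, 0.0, 1.3222]  # three of the four are 0.0

print(f"inductor geomean:       {geomean_speedup(inductor):.4f}")
print(f"aot_cudagraphs geomean: {geomean_speedup(aot_cudagraphs):.4f}")
```

Skipping the failures keeps a single `0.0` from zeroing out the whole mean, but it also hides how many models failed, so the pass count is worth reporting next to the geomean.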

Accuracy

+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+------------------------+
|                  name                   | bs |    eager    |  aot_eager  | aot_cudagraphs | aot_nvfuser |  inductor   | inductor_no_cudagraphs |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+------------------------+
|               GoogleFnet                | 1  |    pass     |    pass     |      pass      |    pass     |    pass     |          pass          |
|      GPT2ForSequenceClassification      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|           RobertaForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|       RobertaForQuestionAnswering       | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|         Speech2Text2ForCausalLM         | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|            TrOCRForCausalLM             | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|            AlbertForMaskedLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       AlbertForQuestionAnswering        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|      BartForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       DebertaForQuestionAnswering       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       MT5ForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|           PegasusForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|          MobileBertForMaskedLM          | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|     MobileBertForQuestionAnswering      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       T5ForConditionalGeneration        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|                 T5Small                 | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|             XGLMForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            XLNetLMHeadModel             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            YituTechConvBert             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|           DebertaForMaskedLM            | 1  |    pass     |    pass     | fail_accuracy  | fail_to_run |    pass     |          pass          |
|             BartForCausalLM             | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|     PegasusForConditionalGeneration     | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|            PLBartForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|               DistillGPT2               | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|             BertForMaskedLM             | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|        BertForQuestionAnswering         | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|                 BigBird                 | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|                CamemBert                | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|          DistilBertForMaskedLM          | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|             OPTForCausalLM              | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|     DistilBertForQuestionAnswering      | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|           ElectraForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|       ElectraForQuestionAnswering       | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|           LayoutLMForMaskedLM           | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|    LayoutLMForSequenceClassification    | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|     M2M100ForConditionalGeneration      | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|            MBartForCausalLM             | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|         MegatronBertForCausalLM         | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|     PLBartForConditionalGeneration      | 1  |    pass     |    pass     |      pass      | fail_to_run | fail_to_run |      fail_to_run       |
|      MBartForConditionalGeneration      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run | fail_to_run |      fail_to_run       |
|          AllenaiLongformerBase          | 1  | fail_to_run | fail_to_run |  fail_to_run   | fail_to_run | fail_to_run |      fail_to_run       |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|             XGLMForCausalLM             |  8  | 2.2985 |  12.6763  |      nan       |     nan     | 203.6891 |        200.2767        |
|       DebertaForQuestionAnswering       |  8  | 4.5403 |  11.4247  |    46.4595     |     nan     | 171.5092 |        102.4203        |
|           DebertaForMaskedLM            |  4  | 4.6123 |  11.5887  |    46.0078     |     nan     | 171.2419 |        104.2544        |
|     M2M100ForConditionalGeneration      |  8  | 2.6023 |  12.9551  |    22.1161     |     nan     | 126.5533 |        131.4465        |
|            YituTechConvBert             |  1  | 2.1207 |  10.3087  |      nan       |     nan     | 119.0593 |        116.0233        |
|          MobileBertForMaskedLM          | 32  | 7.8978 |  28.8989  |      nan       |     nan     | 90.7721  |        87.9723         |
|       MT5ForConditionalGeneration       |  8  | 3.2861 |  14.5828  |      nan       |     nan     | 89.6576  |        90.9357         |
|     MobileBertForQuestionAnswering      | 64  | 7.9453 |  29.0084  |      nan       |     nan     | 75.2123  |        72.6774         |
|         MegatronBertForCausalLM         | 16  | 3.1817 |  13.6768  |    20.1344     |     nan     | 62.2739  |        60.2614         |
|    MegatronBertForQuestionAnswering     | 16  | 3.1327 |  13.4135  |    19.7731     |     nan     | 60.6824  |        58.3612         |
|    LayoutLMForSequenceClassification    | 16  | 1.5798 |  6.8793   |    10.3152     |     nan     | 60.5796  |        57.4539         |
|       T5ForConditionalGeneration        |  4  | 2.0135 |  9.3825   |      nan       |     nan     | 59.3736  |        58.6913         |
|     PegasusForConditionalGeneration     | 16  | 2.6479 |  15.5826  |    25.1173     |     nan     |  58.188  |        54.1699         |
|      BartForConditionalGeneration       |  2  | 2.8818 |  15.9096  |      nan       |     nan     | 57.0617  |        55.1758         |
|      MBartForConditionalGeneration      | 16  | 2.8784 |  15.9804  |    25.3712     |     nan     | 54.1345  |        52.6786         |
|                 T5Small                 |  1  | 2.015  |  9.3876   |      nan       |     nan     | 54.0177  |        54.2117         |
|     PLBartForConditionalGeneration      | 16  | 1.3858 |  8.4331   |    11.9259     |     nan     | 48.2602  |        46.3276         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.7395 |  10.4133  |      nan       |     nan     | 43.3567  |        42.0299         |
|                 BigBird                 |  1  | 7.2389 |  13.8123  |    30.1154     |     nan     | 41.5957  |        26.8249         |
|           ElectraForCausalLM            | 32  | 1.3496 |  6.5873   |      nan       |     nan     | 40.8107  |        39.8316         |
|               DistillGPT2               |  1  | 0.6566 |  3.2629   |     4.4575     |     nan     | 34.8336  |         33.828         |
|           LayoutLMForMaskedLM           | 16  | 1.4746 |  6.8259   |      nan       |     nan     | 32.6309  |        31.9015         |
|             BertForMaskedLM             | 64  | 1.3679 |  6.9199   |     9.7535     |     nan     | 32.5267  |        32.2817         |
|       ElectraForQuestionAnswering       | 64  | 1.4289 |  6.5969   |      nan       |     nan     | 32.2964  |         31.472         |
|      GPT2ForSequenceClassification      |  4  | 1.3167 |  6.4634   |      nan       |     nan     | 31.0139  |        30.7384         |
|           RobertaForCausalLM            | 64  | 1.367  |  6.6562   |     9.9558     |     nan     | 28.5936  |        27.7608         |
|        BertForQuestionAnswering         | 128 | 1.3766 |  6.5759   |      nan       |     nan     | 27.7677  |         27.211         |
|           PegasusForCausalLM            | 32  | 1.035  |  6.0746   |     9.4917     |     nan     | 27.1192  |        25.3485         |
|            MBartForCausalLM             | 32  | 1.0118 |  5.9265   |     8.9121     |     nan     | 25.3271  |        24.2305         |
|            TrOCRForCausalLM             | 32  | 0.9907 |  5.9939   |      nan       |     nan     | 24.5949  |         23.713         |
|             BartForCausalLM             |  4  | 1.0684 |  5.9475   |     9.1817     |     nan     | 24.4743  |        22.8046         |
|       RobertaForQuestionAnswering       | 128 | 1.4283 |  6.7034   |      nan       |     nan     | 24.4342  |        23.8045         |
|            AlbertForMaskedLM            |  4  | 1.1046 |  6.1904   |      nan       |     nan     | 23.6455  |         22.665         |
|               GoogleFnet                |  1  | 0.802  |  3.5063   |    10.9843     |   9.8353    | 23.5824  |        16.0928         |
|       BlenderbotSmallForCausalLM        | 64  | 0.6434 |  4.0765   |     6.1732     |     nan     |  23.252  |        22.4689         |
|          DistilBertForMaskedLM          | 64  | 0.5302 |  3.1918   |     6.3091     |     nan     | 23.0614  |        22.7622         |
|       AlbertForQuestionAnswering        |  4  | 1.1145 |  6.1583   |      nan       |     nan     | 22.4725  |        21.4997         |
|     DistilBertForQuestionAnswering      | 64  | 0.5133 |  3.1543   |     6.4246     |     nan     | 22.1165  |         21.87          |
|             OPTForCausalLM              | 32  | 1.0542 |  6.3196   |    14.0811     |     nan     | 21.8721  |        21.0082         |
|                CamemBert                |  1  | 1.4438 |  6.5437   |     9.3165     |     nan     | 21.6547  |        21.1919         |
|         Speech2Text2ForCausalLM         | 128 | 0.5788 |  3.1117   |     4.8042     |     nan     | 19.7945  |        17.9931         |
|            PLBartForCausalLM            | 32  | 0.4783 |  3.1633   |     4.5835     |     nan     | 18.8242  |        18.1771         |
|          AllenaiLongformerBase          |  0  |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|      GPT2ForSequenceClassification      |  4  | 0.9343 |  0.9093   |      nan       |     nan     |  1.0318  |         1.0911         |
|        BertForQuestionAnswering         | 128 |  1.0   |   0.968   |      nan       |     nan     |  0.9489  |         1.0035         |
|       RobertaForQuestionAnswering       | 128 |  1.0   |   0.968   |      nan       |     nan     |  0.9489  |         1.0035         |
|       ElectraForQuestionAnswering       | 64  |  1.0   |  0.9524   |      nan       |     nan     |  0.9361  |         1.0025         |
|    LayoutLMForSequenceClassification    | 16  |  1.0   |  0.9348   |     0.3324     |     nan     |  0.9339  |         0.9827         |
|     DistilBertForQuestionAnswering      | 64  |  1.0   |  0.9373   |     0.3178     |     nan     |  0.8896  |         0.9987         |
|           LayoutLMForMaskedLM           | 16  |  1.0   |  0.9409   |      nan       |     nan     |  0.8698  |         0.9409         |
|           PegasusForCausalLM            | 32  | 0.9593 |  0.8885   |     0.3909     |     nan     |  0.8602  |         0.8971         |
|                 T5Small                 |  1  |  1.0   |  0.9325   |      nan       |     nan     |  0.8564  |         1.0758         |
|     PegasusForConditionalGeneration     | 16  | 0.9985 |  0.9643   |     0.3704     |     nan     |  0.8446  |         0.9753         |
|             BartForCausalLM             |  4  |  1.0   |  0.9121   |     0.3405     |     nan     |  0.8438  |         0.9191         |
|    MegatronBertForQuestionAnswering     | 16  |  1.0   |  0.8671   |     0.3483     |     nan     |  0.8428  |         0.9785         |
|         MegatronBertForCausalLM         | 16  | 0.9995 |  0.8734   |     0.3426     |     nan     |  0.8412  |         0.9604         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.9425   |      nan       |     nan     |  0.841   |         1.3637         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.9038   |      nan       |     nan     |   0.84   |         0.9815         |
|             BertForMaskedLM             | 64  |  1.0   |  0.9219   |     0.3433     |     nan     |  0.8321  |         0.922          |
|           RobertaForCausalLM            | 64  | 0.9986 |  0.9206   |     0.3429     |     nan     |  0.8309  |         0.9212         |
|                 BigBird                 |  1  | 0.999  |  0.9542   |     0.4213     |     nan     |  0.822   |         1.0115         |
|       T5ForConditionalGeneration        |  4  |  1.0   |  0.9597   |      nan       |     nan     |  0.8215  |         1.1049         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.9255   |      nan       |     nan     |  0.8201  |         1.3395         |
|               DistillGPT2               |  1  | 0.9984 |  0.7704   |     0.3572     |     nan     |  0.8184  |          0.93          |
|                CamemBert                |  1  | 0.998  |  0.7977   |     0.3507     |     nan     |  0.8088  |         0.8656         |
|            MBartForCausalLM             | 32  | 0.9999 |   0.89    |     0.3428     |     nan     |  0.8083  |         0.8986         |
|            TrOCRForCausalLM             | 32  | 0.9999 |  0.8898   |      nan       |     nan     |  0.8079  |         0.8984         |
|             XGLMForCausalLM             |  8  | 0.9848 |  0.9267   |      nan       |     nan     |  0.8058  |         0.9504         |
|            YituTechConvBert             |  1  | 0.9858 |  0.7923   |      nan       |     nan     |  0.8025  |         0.8667         |
|      MBartForConditionalGeneration      | 16  |  1.0   |  0.8721   |     0.3374     |     nan     |  0.798   |         0.9514         |
|             OPTForCausalLM              | 32  | 0.9982 |  0.8655   |     0.3276     |     nan     |  0.7952  |         0.9067         |
|     PLBartForConditionalGeneration      | 16  |  1.0   |  0.8964   |     0.3314     |     nan     |  0.7861  |         0.9514         |
|           ElectraForCausalLM            | 32  | 0.9994 |   0.883   |      nan       |     nan     |  0.7793  |         0.8833         |
|            PLBartForCausalLM            | 32  | 0.9999 |   0.861   |     0.3557     |     nan     |  0.7739  |         0.8854         |
|         Speech2Text2ForCausalLM         | 128 | 0.9552 |   0.842   |     0.3524     |     nan     |  0.7727  |         0.8857         |
|          DistilBertForMaskedLM          | 64  |  1.0   |  0.8899   |     0.3394     |     nan     |  0.7724  |         0.8899         |
|               GoogleFnet                |  1  | 0.9983 |  0.9453   |     0.3715     |   1.0813    |  0.7687  |         0.9366         |
|       MT5ForConditionalGeneration       |  8  | 1.0034 |  0.8861   |      nan       |     nan     |  0.7623  |         0.9396         |
|     M2M100ForConditionalGeneration      |  8  | 1.0004 |  0.9685   |     0.4048     |     nan     |  0.755   |         0.9848         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8975   |      nan       |     nan     |  0.7528  |         0.9074         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8401   |     0.3578     |     nan     |  0.7277  |         0.8452         |
|          MobileBertForMaskedLM          | 32  | 0.9998 |  0.9103   |      nan       |     nan     |  0.5256  |         0.7111         |
|     MobileBertForQuestionAnswering      | 64  |  1.0   |   0.984   |      nan       |     nan     |  0.4536  |         0.5968         |
|           DebertaForMaskedLM            |  4  |  1.0   |  0.9851   |     0.3554     |     nan     |  0.4265  |         1.0346         |
|       DebertaForQuestionAnswering       |  8  | 0.9816 |   1.063   |     0.3072     |     nan     |  0.3264  |         1.1588         |
|          AllenaiLongformerBase          |  0  |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
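
For reading the ratios above, here is a minimal sketch of how such a value could be derived. It assumes the ratio is eager peak memory divided by the backend's peak memory, so values below 1.0 mean the backend used more memory than eager. The function name and byte inputs are hypothetical and are not taken from the benchmark scripts.

```python
import math


def compression_ratio(eager_peak_bytes: float, backend_peak_bytes: float) -> float:
    """Assumed definition: eager peak memory / backend peak memory.

    Returns nan when the backend produced no usable measurement,
    mirroring the nan cells in the table.
    """
    if math.isnan(backend_peak_bytes) or backend_peak_bytes <= 0:
        return float("nan")
    return eager_peak_bytes / backend_peak_bytes


# Hypothetical numbers: a backend that peaks at 2 GiB against 1 GiB in eager.
ratio = compression_ratio(1 * 2**30, 2 * 2**30)
print(f"{ratio:.4f}")
```

Under this assumed definition, the roughly 0.33 to 0.42 values in the aot_cudagraphs column would correspond to about 2.4x to 3x the eager peak memory.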

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|          ghostnet_100           | 128 | 0.9992 |  0.9952   |     0.8323     |   1.2494    |  1.8151  |         1.7763         |
|            lcnet_050            | 128 | 0.9559 |  0.9497   |     0.7507     |   1.5008    |  1.6591  |         1.6342         |
|         coat_lite_mini          | 128 | 0.9999 |   0.998   |     0.8425     |   1.0567    |  1.6343  |         1.607          |
|        tnt_s_patch16_224        | 128 | 0.9998 |  0.9991   |      0.0       |   1.6291    |  1.5537  |         1.5498         |
|           regnety_002           | 128 | 0.976  |  0.9853   |     0.8505     |   1.3458    |  1.5378  |         1.3541         |
|           dm_nfnet_f0           | 128 | 0.9993 |  1.0002   |      0.0       |   1.2113    |  1.4732  |         1.4221         |
|      xcit_large_24_p8_224       |  5  | 1.0027 |  0.9974   |      0.0       |     0.0     |  1.4616  |         1.4159         |
|            hrnet_w18            | 128 | 0.9999 |  0.9986   |      0.0       |    1.32     |  1.4201  |         1.3797         |
|           volo_d1_224           | 64  | 0.9999 |  0.9961   |      0.0       |   1.1322    |  1.3882  |         1.3669         |
|             dla102              | 128 | 0.9997 |  1.0009   |      0.0       |   1.2858    |  1.3843  |         1.3695         |
|            nfnet_l0             | 128 | 0.9997 |  0.7892   |      0.0       |   1.0551    |  1.3763  |         1.3278         |
|        res2net50_14w_8s         | 128 | 0.9998 |  0.9998   |      0.0       |   1.2272    |  1.3584  |         1.3263         |
|         crossvit_9_240          | 128 | 0.9996 |  0.9994   |      0.0       |   1.0253    |  1.3373  |         1.3113         |
|      mobilenetv3_large_100      | 128 | 0.9655 |  0.9602   |     0.7644     |    1.169    |  1.3354  |         1.3423         |
|         mobilenetv2_100         | 128 | 0.9652 |   0.964   |     0.7064     |   1.0159    |  1.3344  |         1.3503         |
|       gluon_inception_v3        | 128 |  1.0   |  0.9987   |      0.0       |   1.1259    |  1.3282  |         1.3071         |
|        adv_inception_v3         | 128 | 0.9999 |  0.9993   |      0.0       |   1.1264    |  1.3276  |          1.31          |
|          inception_v3           | 128 |  1.0   |  0.9988   |      0.0       |   1.1258    |  1.3259  |         1.3087         |
|           res2next50            | 128 |  1.0   |   1.001   |      0.0       |   1.1668    |  1.3135  |         1.2754         |
|           resnest101e           | 64  | 0.9998 |  1.0032   |      0.0       |   1.1972    |  1.3124  |         1.2723         |
|            fbnetv3_b            | 128 | 0.9649 |   0.959   |     0.7493     |    1.135    |  1.2824  |         1.2923         |
|          gmixer_24_224          | 128 | 0.9998 |  0.8347   |      0.0       |    0.98     |  1.2817  |         1.2725         |
|          jx_nest_base           | 32  | 0.9999 |  0.9953   |      0.0       |   1.2171    |  1.2769  |         1.2512         |
|          botnet26t_256          | 128 | 0.9857 |  0.9855   |     0.7904     |   1.2275    |  1.2686  |         1.2808         |
|           mnasnet_100           | 128 | 0.9662 |  0.9638   |     0.7858     |   1.1594    |  1.268   |         1.2825         |
|           selecsls42b           | 128 |  1.0   |  0.9993   |     0.8142     |   1.2101    |  1.2652  |         1.2528         |
|        sebotnet33ts_256         | 64  | 0.9755 |  0.8077   |      0.0       |   1.0536    |  1.2648  |         1.2709         |
|       eca_botnext26ts_256       | 128 | 0.9868 |  0.7726   |      0.0       |   1.0303    |  1.2634  |         1.2491         |
|        eca_halonext26ts         | 128 | 0.9872 |  0.7789   |      0.0       |    1.03     |  1.2616  |         1.241          |
|       tf_efficientnet_b0        | 128 | 0.9768 |   0.784   |      0.0       |   0.9856    |  1.258   |         1.2647         |
|           convit_base           | 64  | 0.9997 |   0.999   |      0.0       |   1.1951    |  1.2566  |         1.2326         |
|           fbnetc_100            | 128 | 0.9665 |  0.9633   |     0.7791     |   1.1894    |  1.2472  |         1.2663         |
|        ese_vovnet19b_dw         | 128 | 0.9791 |   0.978   |     0.7438     |   1.1463    |  1.2445  |         1.2496         |
|          spnasnet_100           | 128 | 0.9616 |  0.9576   |     0.7728     |   1.1379    |  1.2351  |         1.2553         |
|          cspdarknet53           | 64  | 0.9578 |  0.9533   |     0.7372     |   1.1844    |  1.229   |         1.2348         |
|            pit_b_224            | 64  |  1.0   |  0.9997   |      0.0       |   1.0554    |  1.2276  |         1.2162         |
|        res2net101_26w_4s        | 64  | 1.0003 |   0.998   |     0.7702     |   1.1754    |  1.2263  |         1.1907         |
|          gmlp_s16_224           | 128 | 0.9999 |   0.999   |      0.0       |   0.9988    |  1.2237  |         1.213          |
|           rexnet_100            | 128 | 0.9726 |  0.8172   |      0.0       |   0.9838    |  1.2143  |         1.2197         |
|          pnasnet5large          | 16  | 0.9998 |  0.9985   |      0.0       |    1.084    |  1.2098  |         1.1938         |
|            tinynet_a            | 128 | 0.9665 |  0.7757   |     0.6203     |   0.9715    |  1.1896  |         1.2007         |
|             dpn107              | 32  | 0.9582 |  0.9509   |     0.7794     |    1.029    |  1.1894  |         1.203          |
|          cait_m36_384           |  4  | 0.9998 |   1.026   |      0.0       |    1.01     |  1.1867  |         1.1621         |
|           mobilevit_s           | 64  | 0.9796 |  0.7621   |      0.0       |   0.9503    |  1.173   |          1.17          |
|           tf_mixnet_l           | 128 | 0.9857 |  0.8902   |      0.0       |   1.0181    |  1.1711  |         1.1706         |
|            repvgg_a2            | 128 | 0.9636 |  0.9628   |     0.8262     |   1.1216    |  1.1698  |         1.1673         |
|         poolformer_m36          | 64  | 0.9998 |  0.9997   |      0.0       |     0.0     |  1.1664  |         1.1478         |
|            mixnet_l             | 128 | 0.9849 |   0.886   |      0.0       |   1.0186    |  1.1532  |         1.1532         |
|        twins_pcpvt_base         | 64  | 0.9998 |  0.9995   |     0.7488     |   1.0638    |  1.1525  |         1.1237         |
|          convnext_base          | 64  |  1.0   |  0.9987   |      0.0       |   1.0437    |  1.1466  |         1.1195         |
|  swin_base_patch4_window7_224   | 64  |  1.0   |  0.9791   |      0.0       |   0.9888    |  1.1417  |         1.1351         |
|      beit_base_patch16_224      | 64  | 0.9999 |  0.9813   |      0.0       |   0.9496    |  1.1189  |         1.1087         |
|     swsl_resnext101_32x16d      | 32  |  1.0   |  0.9995   |      0.0       |   1.1083    |  1.1092  |         1.0714         |
| deit_base_distilled_patch16_224 | 64  |  1.0   |  0.9992   |     0.7653     |   1.0117    |  1.1004  |         1.0909         |
|      vit_base_patch16_224       | 64  | 0.9999 |  0.9974   |     0.7675     |   0.9727    |  1.0936  |         1.0813         |
|        gluon_xception65         | 32  | 0.9997 |  0.9971   |      0.0       |   1.0382    |  1.0873  |         1.0759         |
|          mixer_b16_224          | 128 | 1.0002 |  1.0003   |      0.0       |   0.9764    |  1.0833  |         1.0742         |
|        convmixer_768_32         | 32  | 0.9999 |    1.0    |      0.0       |   1.0614    |  1.0775  |         1.0747         |
|            gernet_l             | 128 | 0.9742 |  0.9725   |     0.8233     |    1.098    |  1.0765  |         1.0717         |
|         visformer_small         | 128 | 0.9999 |  1.0029   |     0.7984     |   1.0208    |  1.0507  |         1.0175         |
|          resmlp_12_224          | 128 |  1.0   |  1.0007   |     0.6957     |     0.0     |  0.958   |         0.9617         |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
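
To compare backends across a whole table, the rows can be reduced to one number per column. The sketch below parses a table in the layout used here and computes a geometric mean speedup. It assumes that `0.0` and `nan` cells mark runs without a valid measurement and skips them; whether the dashboard's own summaries treat those cells the same way is not stated in this thread. `parse_table` and `geomean_speedup` are illustrative names, not part of the benchmark scripts.

```python
import math


def parse_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of row dicts."""
    header, rows = None, []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +-----+ rule lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geomean_speedup(rows, backend):
    """Geometric mean of a column, skipping 0.0 and nan cells (assumed failures)."""
    vals = []
    for row in rows:
        v = float(row[backend])
        if v > 0 and not math.isnan(v):
            vals.append(v)
    if not vals:
        return float("nan")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


# Two rows copied from the table above.
sample = """
+-------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|       name        | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|   ghostnet_100    | 128 | 0.9992 |  0.9952   |     0.8323     |   1.2494    |  1.8151  |         1.7763         |
| tnt_s_patch16_224 | 128 | 0.9998 |  0.9991   |      0.0       |   1.6291    |  1.5537  |         1.5498         |
+-------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
"""
rows = parse_table(sample)
for backend in ("aot_cudagraphs", "aot_nvfuser", "inductor"):
    print(backend, round(geomean_speedup(rows, backend), 4))
```

Skipping failed runs makes a backend look better than it is when it fails often, so a summary like this is best read alongside the pass rate from the accuracy table.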

Accuracy

+---------------------------------+----+-------+---------------+----------------+---------------+---------------+------------------------+
|              name               | bs | eager |   aot_eager   | aot_cudagraphs |  aot_nvfuser  |   inductor    | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+------------------------+
|        adv_inception_v3         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          botnet26t_256          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           selecsls42b           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          spnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|     swsl_resnext101_32x16d      | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|       tf_efficientnet_b0        | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           tf_mixnet_l           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            tinynet_a            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        twins_pcpvt_base         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|         visformer_small         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|      vit_base_patch16_224       | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|      beit_base_patch16_224      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|          convnext_base          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|         crossvit_9_240          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|           dm_nfnet_f0           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|          gmixer_24_224          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|          gmlp_s16_224           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|          jx_nest_base           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|  swin_base_patch4_window7_224   | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|        tnt_s_patch16_224        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|           volo_d1_224           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|          resmlp_12_224          | 2  | pass  |     pass      |      pass      |  fail_to_run  |     pass      |          pass          |
|           convit_base           | 2  | pass  |     pass      |  fail_to_run   |  fail_to_run  |     pass      |          pass          |
|      xcit_large_24_p8_224       | 2  | pass  |     pass      |  fail_to_run   |  fail_to_run  |     pass      |          pass          |
|        gluon_xception65         | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |          pass          |
|         poolformer_m36          | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |          pass          |
|          cait_m36_384           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |          pass          |
|         coat_lite_mini          | 2  | pass  | fail_accuracy | fail_accuracy  | fail_accuracy |     pass      |          pass          |
| deit_base_distilled_patch16_224 | 2  | pass  |     pass      |      pass      |     pass      |     pass      |     fail_accuracy      |
|        sebotnet33ts_256         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           rexnet_100            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           res2next50            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        res2net50_14w_8s         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        convmixer_768_32         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          cspdarknet53           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|             dla102              | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|             dpn107              | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|       eca_botnext26ts_256       | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        eca_halonext26ts         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        ese_vovnet19b_dw         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           fbnetc_100            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            gernet_l             | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          ghostnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|       gluon_inception_v3        | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            hrnet_w18            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          inception_v3           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            lcnet_050            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          mixer_b16_224          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            mixnet_l             | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           mnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|         mobilenetv2_100         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|      mobilenetv3_large_100      | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           mobilevit_s           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            nfnet_l0             | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            pit_b_224            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          pnasnet5large          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           regnety_002           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            repvgg_a2            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        res2net101_26w_4s        | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            fbnetv3_b            | 2  | pass  |     pass      |      pass      |     pass      | fail_accuracy |     fail_accuracy      |
|           resnest101e           | 2  | pass  |     pass      |      pass      | fail_accuracy | fail_accuracy |     fail_accuracy      |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|        twins_pcpvt_base         | 64  | 2.1064 |  13.6124  |    22.5332     |   44.5522   | 407.4272 |        421.8365        |
|         coat_lite_mini          | 128 | 1.0545 |  5.4638   |     8.5507     |   14.592    | 363.5371 |        364.8512        |
|           mobilevit_s           | 64  | 1.6462 |  7.5357   |      nan       |   42.5465   | 235.2127 |        233.1142        |
|        eca_halonext26ts         | 128 | 1.5109 |  5.8122   |      nan       |   55.3877   | 209.9798 |        206.4748        |
|        sebotnet33ts_256         | 64  | 1.6237 |  6.6615   |      nan       |   51.2499   | 190.7575 |        190.0027        |
|  swin_base_patch4_window7_224   | 64  | 2.5902 |  13.0683  |      nan       |   59.6144   | 176.5977 |        174.5962        |
|       eca_botnext26ts_256       | 128 | 1.3981 |  5.4191   |      nan       |   53.0178   | 176.4393 |        178.3558        |
|      xcit_large_24_p8_224       |  5  | 2.7334 |  18.1337  |      nan       |     nan     | 171.4747 |        166.5968        |
|          jx_nest_base           | 32  | 1.6866 |  10.041   |      nan       |   60.8033   | 154.3111 |        154.2772        |
|          cait_m36_384           |  4  | 2.7866 |  19.1316  |      nan       |   45.3486   | 133.9738 |        126.9168        |
|          convnext_base          | 64  | 1.2021 |  6.4828   |      nan       |   21.9482   | 131.8442 |        130.3441        |
|          botnet26t_256          | 128 | 1.4442 |  5.0458   |    10.5014     |   40.8301   | 107.3642 |        105.6294        |
|            hrnet_w18            | 128 | 5.9732 |  33.7365  |      nan       |  258.2899   | 107.2875 |        100.8767        |
|         crossvit_9_240          | 128 | 1.3991 |  8.4784   |      nan       |   27.2552   | 97.4491  |        95.8135         |
|           resnest101e           | 64  | 3.1166 |  17.9608  |      nan       |   79.3142   | 91.5494  |        88.0441         |
|          pnasnet5large          | 16  | 4.3749 |  24.205   |      nan       |  126.2209   | 87.8382  |        85.2509         |
|           volo_d1_224           | 64  | 1.2721 |  8.0443   |      nan       |   27.6318   | 84.7613  |        84.3336         |
|          gmlp_s16_224           | 128 | 0.9982 |  6.7098   |      nan       |   13.7666   | 72.2002  |        69.6827         |
|         visformer_small         | 128 | 0.9284 |  4.6798   |     6.5872     |   25.4051   | 71.6266  |        70.2003         |
|            pit_b_224            | 64  | 0.9623 |  5.1661   |      nan       |   12.8012   | 67.0609  |         65.441         |
|        res2net101_26w_4s        | 64  | 2.9596 |  18.2967  |    30.5674     |   82.3674   | 56.5342  |        53.1778         |
|        tnt_s_patch16_224        | 128 | 1.6619 |  11.0684  |      nan       |   23.6515   | 53.1301  |        49.8372         |
|          gmixer_24_224          | 128 | 1.0668 |  7.7269   |      nan       |   16.8853   | 52.4953  |        50.7642         |
|        res2net50_14w_8s         | 128 | 2.6497 |  16.4188  |      nan       |  102.5509   | 51.7915  |        49.2823         |
|           convit_base           | 64  | 1.0183 |  6.2959   |      nan       |   18.2815   | 51.3509  |        49.6135         |
|        gluon_xception65         | 32  | 1.8166 |  11.8212  |      nan       |   43.0492   | 49.0287  |        45.3635         |
|         poolformer_m36          | 64  | 1.9625 |  9.9587   |      nan       |     nan     | 47.2636  |        45.2596         |
|     swsl_resnext101_32x16d      | 32  | 1.656  |  10.4396  |      nan       |   40.025    | 43.1311  |        38.2389         |
|          resmlp_12_224          | 128 | 0.6509 |  2.9707   |     5.6933     |     nan     | 42.5741  |        41.8756         |
|             dpn107              | 32  | 3.9328 |  16.425   |    48.0118     |   76.8514   | 41.0088  |         38.065         |
|          mixer_b16_224          | 128 | 0.7583 |  3.4742   |      nan       |   11.027    | 37.1717  |        35.7533         |
|            fbnetv3_b            | 128 | 3.1021 |  11.6415  |    33.1049     |   76.603    | 36.7662  |        34.5131         |
|        convmixer_768_32         | 32  | 1.2336 |  6.8275   |      nan       |   14.0789   | 36.4106  |        32.8851         |
|      vit_base_patch16_224       | 64  | 0.8981 |  4.6713   |     6.7835     |   9.5716    | 35.9023  |         35.086         |
| deit_base_distilled_patch16_224 | 64  | 0.9625 |  4.5007   |     7.5873     |   11.1046   | 35.7683  |        35.0427         |
|       gluon_inception_v3        | 128 | 1.5559 |  9.4799   |      nan       |   67.7436   |  35.151  |        32.7369         |
|           tf_mixnet_l           | 128 | 5.6826 |  13.4479  |      nan       |   69.3986   | 34.9013  |        31.9321         |
|          inception_v3           | 128 | 1.5586 |   9.388   |      nan       |   67.7377   | 34.8894  |        32.4783         |
|        adv_inception_v3         | 128 | 1.5685 |  9.4095   |      nan       |   67.7037   |  34.327  |        32.4009         |
|            mixnet_l             | 128 | 5.3416 |  13.2173  |      nan       |   69.2362   | 34.0143  |        31.1931         |
|          ghostnet_100           | 128 | 2.7574 |  10.2258  |    15.3268     |   59.4558   | 32.8531  |        30.6823         |
|             dla102              | 128 | 1.7449 |  10.6583  |      nan       |   64.0186   | 32.7614  |        30.6239         |
|      beit_base_patch16_224      | 64  | 1.1397 |  5.7288   |      nan       |   14.5224   | 32.2892  |        30.7223         |
|           dm_nfnet_f0           | 128 | 2.0187 |  8.0742   |      nan       |   30.4353   | 31.5084  |        29.8829         |
|           res2next50            | 128 | 1.6204 |  9.4079   |      nan       |   67.6723   | 29.7379  |        28.0747         |
|           rexnet_100            | 128 | 1.8332 |  7.8668   |      nan       |   103.179   | 27.1038  |        25.8224         |
|            tinynet_a            | 128 | 2.0351 |  8.4806   |    21.0774     |   61.8917   |  26.207  |        24.6746         |
|       tf_efficientnet_b0        | 128 | 1.723  |  7.1564   |      nan       |   61.3195   | 23.8449  |        22.4291         |
|          cspdarknet53           | 64  | 2.2287 |  7.9743   |    21.2872     |   49.5633   | 23.7336  |        22.5082         |
|            nfnet_l0             | 128 | 1.7113 |   7.931   |      nan       |   27.7671   |  23.085  |        21.8651         |
|           fbnetc_100            | 128 | 1.9814 |  7.1516   |     19.703     |   45.4371   | 22.6077  |        20.9507         |
|          spnasnet_100           | 128 | 1.9581 |  7.0371   |    18.6708     |   43.6267   | 22.0811  |        21.3115         |
|      mobilenetv3_large_100      | 128 | 1.5117 |  5.8597   |    13.8582     |   64.5508   | 20.2785  |        19.5219         |
|         mobilenetv2_100         | 128 | 1.5571 |  5.6349   |    14.2346     |   37.8495   | 18.9511  |        18.3495         |
|           regnety_002           | 128 | 1.5431 |  6.1735   |     14.19      |   46.9476   |  18.872  |         17.277         |
|            gernet_l             | 128 | 1.9269 |  6.7529   |    17.0736     |   36.3963   | 18.5788  |        17.4833         |
|           mnasnet_100           | 128 | 1.569  |   5.69    |    14.5924     |   37.8397   | 18.5275  |        17.4759         |
|            repvgg_a2            | 128 | 1.9561 |  6.4737   |    16.3986     |   44.6516   |  18.288  |        17.3587         |
|           selecsls42b           | 128 | 0.8176 |  4.1823   |     6.2503     |   39.0148   | 16.5258  |        15.4033         |
|            lcnet_050            | 128 | 0.9903 |   3.751   |     7.8725     |   31.4621   | 13.3796  |         12.394         |
|        ese_vovnet19b_dw         | 128 | 0.9825 |  3.3443   |     7.1691     |   31.1424   | 13.0004  |        12.3977         |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|          gmixer_24_224          | 128 | 0.9951 |  0.9716   |      nan       |   0.9859    |  1.4283  |         1.494          |
|            tinynet_a            | 128 | 0.9942 |  0.7796   |     0.2617     |   0.7823    |  1.351   |         1.3692         |
|            nfnet_l0             | 128 | 0.993  |  0.8272   |      nan       |   0.8084    |  1.2907  |         1.3392         |
|           rexnet_100            | 128 | 0.9935 |  0.7843   |      nan       |   0.8682    |  1.2619  |         1.2765         |
|       tf_efficientnet_b0        | 128 | 0.9935 |  0.7688   |      nan       |   0.8401    |  1.1889  |         1.199          |
|          pnasnet5large          | 16  | 1.069  |   1.011   |      nan       |   1.2062    |  1.1876  |         1.3282         |
|           mobilevit_s           | 64  | 0.9959 |  0.7668   |      nan       |   0.7405    |  1.1489  |         1.1957         |
|       eca_botnext26ts_256       | 128 | 0.9938 |  0.7675   |      nan       |   0.7612    |  1.1378  |         1.2076         |
|        eca_halonext26ts         | 128 | 0.9937 |  0.7687   |      nan       |   0.7643    |  1.1375  |         1.2068         |
|         mobilenetv2_100         | 128 | 0.9925 |  0.7621   |     0.3063     |   0.7635    |  1.1003  |         1.1104         |
|           convit_base           | 64  | 0.9977 |  0.8838   |      nan       |   0.9506    |  1.0957  |         1.2656         |
|          cait_m36_384           |  4  | 0.9994 |   0.934   |      nan       |   0.9562    |  1.0885  |         1.1416         |
|         poolformer_m36          | 64  | 0.998  |  0.9512   |      nan       |     nan     |  1.0527  |         1.069          |
|           dm_nfnet_f0           | 128 | 0.9358 |  0.8936   |      nan       |   0.9479    |  1.0218  |         1.0495         |
|           resnest101e           | 64  | 0.9971 |  0.9519   |      nan       |    0.95     |  0.9994  |         1.0025         |
|          ghostnet_100           | 128 | 0.9865 |  0.8768   |     0.3273     |   0.9345    |  0.9853  |         1.0102         |
|        convmixer_768_32         | 32  | 0.9986 |  0.9854   |      nan       |   0.9793    |  0.9836  |         0.9853         |
|           tf_mixnet_l           | 128 | 0.9953 |   0.857   |      nan       |   0.8574    |  0.9711  |         1.0812         |
|            fbnetv3_b            | 128 | 0.9932 |  0.7828   |     0.3095     |    0.784    |  0.9696  |         0.977          |
|          mixer_b16_224          | 128 | 0.9952 |  0.9661   |      nan       |   0.8571    |  0.9519  |         0.9937         |
|             dla102              | 128 | 0.9831 |   0.917   |      nan       |   0.9529    |  0.9496  |         0.9538         |
|          gmlp_s16_224           | 128 | 0.9959 |  0.9783   |      nan       |   0.9704    |  0.9385  |         0.944          |
|            hrnet_w18            | 128 | 0.9954 |  0.9252   |      nan       |   0.8649    |  0.9376  |         0.9419         |
|        gluon_xception65         | 32  | 0.9975 |  0.9365   |      nan       |   0.8982    |  0.9351  |         0.9376         |
|        tnt_s_patch16_224        | 128 | 0.996  |  0.9769   |      nan       |   0.8539    |  0.928   |         0.9992         |
|      beit_base_patch16_224      | 64  | 0.9966 |  0.9545   |      nan       |   0.8606    |  0.9272  |         0.982          |
|        res2net101_26w_4s        | 64  | 0.9968 |  0.9278   |     0.3243     |   0.8932    |  0.9269  |         0.9548         |
|      vit_base_patch16_224       | 64  | 0.9963 |  0.9434   |     0.3153     |   0.8229    |  0.915   |         0.9873         |
|           volo_d1_224           | 64  | 0.996  |  0.9213   |      nan       |   0.7472    |  0.9124  |         0.9172         |
|      xcit_large_24_p8_224       |  5  | 0.9981 |  0.9194   |      nan       |     nan     |  0.912   |         1.0039         |
|        ese_vovnet19b_dw         | 128 | 0.9923 |  0.8877   |     0.3261     |   0.9302    |  0.9095  |         0.9161         |
| deit_base_distilled_patch16_224 | 64  | 0.9964 |  0.9442   |     0.3138     |   0.8242    |  0.9095  |         0.9831         |
|             dpn107              | 32  | 0.9985 |  0.9271   |     0.3392     |   0.8941    |  0.9058  |         0.956          |
|           res2next50            | 128 | 0.9951 |  0.9153   |      nan       |   0.8618    |  0.9051  |         0.9312         |
|          spnasnet_100           | 128 | 0.989  |  0.9109   |     0.3309     |   0.8412    |  0.9047  |         0.9157         |
|            mixnet_l             | 128 | 0.9951 |   0.845   |      nan       |   0.7911    |  0.9014  |         1.0067         |
|      mobilenetv3_large_100      | 128 | 0.9876 |  0.8589   |     0.3244     |   0.8745    |  0.9007  |         0.9126         |
|         visformer_small         | 128 | 0.9943 |  0.9381   |     0.3293     |   0.9475    |  0.9006  |         0.951          |
|           selecsls42b           | 128 | 0.9883 |  0.8896   |     0.337      |   0.8954    |  0.899   |         0.9192         |
|          inception_v3           | 128 | 0.9901 |  0.8617   |      nan       |   0.8724    |  0.8983  |         0.9073         |
|       gluon_inception_v3        | 128 | 0.9901 |  0.8617   |      nan       |   0.8724    |  0.8983  |         0.9073         |
|        adv_inception_v3         | 128 | 0.9901 |  0.8617   |      nan       |   0.8724    |  0.8983  |         0.9073         |
|           mnasnet_100           | 128 | 0.9877 |  0.9019   |     0.3306     |   0.8279    |  0.8961  |         0.9077         |
|        twins_pcpvt_base         | 64  | 0.9976 |  0.9195   |     0.3132     |   0.8403    |  0.896   |         0.9842         |
|     swsl_resnext101_32x16d      | 32  | 0.9991 |  0.8972   |      nan       |   0.8675    |  0.8932  |         0.9249         |
|            lcnet_050            | 128 | 0.9672 |  0.7521   |     0.3171     |   0.7524    |  0.8921  |         0.923          |
|          convnext_base          | 64  | 0.9975 |  0.9169   |      nan       |   0.7604    |  0.8902  |         0.9143         |
|          cspdarknet53           | 64  | 0.9954 |  0.8528   |     0.316      |   0.8762    |  0.8835  |         0.8875         |
|        res2net50_14w_8s         | 128 | 0.9952 |  0.9049   |      nan       |   0.8611    |  0.881   |         0.9327         |
|           regnety_002           | 128 | 0.9717 |  0.8104   |     0.3283     |   0.7599    |  0.8617  |         0.8993         |
|          botnet26t_256          | 128 | 0.9915 |  0.8434   |     0.3165     |    0.745    |  0.8605  |         0.8702         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9288   |      nan       |    0.83     |  0.8585  |         0.9871         |
|          jx_nest_base           | 32  | 1.0002 |  0.8966   |      nan       |   0.7112    |  0.8575  |         0.9714         |
|           fbnetc_100            | 128 | 0.9891 |  0.8518   |     0.3236     |   0.7446    |  0.8416  |         0.8498         |
|        sebotnet33ts_256         | 64  | 0.9952 |  0.7084   |      nan       |   0.6831    |  0.841   |         0.9711         |
|         crossvit_9_240          | 128 | 0.9884 |  0.8657   |      nan       |   0.7297    |  0.8274  |         0.9755         |
|          resmlp_12_224          | 128 | 0.9893 |   0.943   |     0.2472     |     nan     |  0.8169  |         0.8253         |
|         coat_lite_mini          | 128 | 1.0049 |  0.8777   |     0.3262     |   0.7873    |  0.7954  |         0.9838         |
|            gernet_l             | 128 | 0.9884 |  0.7892   |      0.32      |   0.7938    |  0.7928  |         0.8234         |
|            pit_b_224            | 64  | 0.9968 |  0.7947   |      nan       |   0.6417    |  0.792   |         0.9866         |
|            repvgg_a2            | 128 | 0.9867 |  0.8054   |     0.3277     |   0.6573    |  0.7684  |         0.8011         |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Performance graphs


bench_logs/timm_models_float32.png :

bench_logs/huggingface_float32.png :

bench_logs/torchbench_float32.png :

@anijain2305
Contributor Author

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of a forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
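
As a rough illustration of the aggregation described above, the sketch below computes a geometric mean speedup while dropping models that did not produce a usable number. It is not the dashboard's actual script: the function name `geomean_speedup` is hypothetical, and it assumes that a per-model speedup of 0.0 marks a model that failed or was skipped, and that an empty result is reported as 0.0.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of per-model speedups, ignoring failed models.

    Assumption (not confirmed by the dashboard): a speedup of 0.0 marks a
    model that failed or was skipped, so it is excluded from the mean.
    """
    valid = [s for s in speedups if s > 0.0]
    if not valid:
        # Assumption: report 0.0 when no model produced a usable number.
        return 0.0
    # Average in log space, then exponentiate back.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# A few inductor speedups from the torchbench amp table, including one 0.0.
sample = [5.7672, 2.7561, 1.0006, 0.0]
print(f"{geomean_speedup(sample):.2f}x")
```

The geometric mean is used rather than the arithmetic mean because speedups are ratios, so a single very large speedup does not dominate the summary.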

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 94%, 50/53 | 98%, 41/42  | 100%, 61/61 |
|       aot_eager        | 94%, 50/53 | 98%, 41/42  | 95%, 58/61  |
|     aot_cudagraphs     | 74%, 39/53 | 60%, 25/42  | 79%, 48/61  |
|      aot_nvfuser       | 60%, 32/53 |  0%, 0/42   | 80%, 49/61  |
|        inductor        | 85%, 45/53 | 93%, 39/42  | 93%, 57/61  |
| inductor_no_cudagraphs | 87%, 46/53 | 93%, 39/42  | 93%, 57/61  |
+------------------------+------------+-------------+-------------+
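
For readers who want to reproduce the "94%, 50/53" style entries from the per-model accuracy tables, here is a minimal sketch. The helper name `passrate` is hypothetical, and it assumes that any status beginning with `pass` (including `pass_due_to_skip`) counts as a pass, which the tables do not state explicitly.

```python
def passrate(statuses):
    """Format a pass rate as '<percent>%, <passed>/<total>'.

    Assumption: every status starting with 'pass' (for example 'pass' or
    'pass_due_to_skip') counts as passing; anything else counts as failing.
    """
    total = len(statuses)
    if total == 0:
        return "0%, 0/0"
    passed = sum(1 for s in statuses if s.startswith("pass"))
    return f"{100 * passed / total:.0f}%, {passed}/{total}"


# 50 passing models and 3 failures, as in the eager/torchbench cell.
example = ["pass"] * 50 + ["fail_to_run"] * 3
print(passrate(example))
```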

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.01x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|     aot_cudagraphs     |   1.21x    |    1.05x    |    1.00x    |
|      aot_nvfuser       |   1.17x    |    0.0x     |    1.19x    |
|        inductor        |   1.84x    |    1.76x    |    1.41x    |
| inductor_no_cudagraphs |   1.38x    |    1.54x    |    1.37x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    1.95    |    2.59     |    2.20     |
|       aot_eager        |    8.21    |    13.14    |    11.44    |
|     aot_cudagraphs     |    8.47    |    16.12    |    21.40    |
|      aot_nvfuser       |   27.30    |     0.0     |    72.96    |
|        inductor        |   59.25    |    62.06    |    90.07    |
| inductor_no_cudagraphs |   60.72    |    56.93    |    87.93    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.96x    |    0.99x    |    0.99x    |
|       aot_eager        |   0.85x    |    0.89x    |    0.87x    |
|     aot_cudagraphs     |   0.42x    |    0.38x    |    0.32x    |
|      aot_nvfuser       |   0.83x    |    0.0x     |    0.85x    |
|        inductor        |   0.83x    |    0.91x    |    0.95x    |
| inductor_no_cudagraphs |   0.93x    |    1.08x    |    1.01x    |
+------------------------+------------+-------------+-------------+
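
The comments here do not define the compression ratio. One reading consistent with "higher is better" is the baseline (native PyTorch) peak memory divided by the compiled peak memory. The sketch below follows that assumption; the function name and the sample byte counts are hypothetical.

```python
def compression_ratio(baseline_peak_bytes, compiled_peak_bytes):
    """Peak memory compression ratio; values above 1.0 mean less memory used.

    Assumption: ratio = baseline peak / compiled peak. The dashboard does not
    spell out the formula, so treat this as one plausible definition.
    """
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return baseline_peak_bytes / compiled_peak_bytes


# Hypothetical numbers: the compiled run peaks at 8 GB versus 10 GB baseline.
print(f"{compression_ratio(10e9, 8e9):.2f}x")
```

Under this reading, a value such as 0.32x for aot_cudagraphs would mean the compiled run peaked at roughly three times the baseline memory.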

torchbench suite with amp precision


Performance speedup

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|            densenet121            |  4   | 1.0007 |  0.9055   |     2.4052     |   1.3965    |  5.7672  |         1.3388         |
|       functorch_dp_cifar10        |  64  | 0.9988 |  0.9074   |     2.3977     |   1.1956    |  4.9229  |         1.3976         |
|         timm_efficientdet         |  1   | 0.9862 |  0.8034   |      0.0       |     0.0     |  4.6793  |         1.5596         |
|          resnext50_32x4d          |  8   | 1.0002 |  0.9499   |     1.8567     |   1.3157    |  3.5688  |         1.278          |
|           BERT_pytorch            |  16  | 1.0136 |  0.8341   |      0.0       |     0.0     |  3.2877  |         2.4015         |
|      timm_vision_transformer      |  8   | 1.0066 |  0.8492   |     1.7666     |   1.3598    |  3.2177  |         1.5522         |
|        mobilenet_v3_large         |  32  | 1.0091 |  1.0011   |     1.6663     |   1.4112    |  3.0377  |         1.4251         |
|                drq                |  1   | 1.0133 |  0.7933   |     1.9813     |   1.0842    |  2.9886  |         1.1682         |
|             resnet18              |  16  | 1.0048 |  0.9886   |     1.6041     |   1.3566    |  2.7561  |         1.2534         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9988 |  0.9084   |     1.8646     |   1.2136    |  2.6804  |         1.4067         |
|               dcgan               |  32  | 0.9832 |  0.9069   |     1.6896     |   0.7378    |  2.5986  |         1.0705         |
|            mnasnet1_0             |  32  | 0.9994 |  1.0127   |     1.3205     |   1.4002    |  2.5883  |         1.3644         |
|            hf_T5_large            |  2   | 1.0188 |  0.8486   |      0.0       |     0.0     |  2.4774  |         2.1438         |
|           squeezenet1_1           |  32  | 0.998  |  0.9503   |     1.5586     |   1.1802    |  2.4541  |         1.3115         |
|             hf_Albert             |  8   | 1.0008 |  0.9523   |     0.7727     |     0.0     |  2.383   |         2.3241         |
|         timm_efficientnet         |  32  | 0.9607 |  0.8108   |     1.0943     |   1.1787    |  2.1213  |         1.3011         |
|           lennard_jones           | 1000 | 0.9747 |  0.7323   |     1.3002     |   1.0711    |  2.1181  |         1.0605         |
|          pytorch_struct           | 200  | 0.9866 |  0.7356   |     1.0124     |   1.2019    |  2.0781  |         1.2897         |
|              hf_Bert              |  4   | 1.0353 |  0.8574   |     0.9467     |     0.0     |  2.0076  |         1.867          |
|           timm_resnest            |  32  | 1.0032 |  1.0176   |     0.8422     |   1.3619    |  1.9125  |         1.6921         |
|              hf_GPT2              |  4   | 1.0212 |  0.9836   |      0.0       |     0.0     |  1.8747  |         1.8434         |
|          LearningToPaint          |  96  | 0.9982 |   0.997   |     1.2268     |   1.3602    |  1.8664  |         1.3087         |
|               hf_T5               |  8   | 0.9991 |  0.9446   |      0.0       |     0.0     |  1.8435  |         1.8464         |
|              hf_Bart              |  4   | 1.013  |  0.8257   |     0.9947     |     0.0     |  1.8283  |         1.7424         |
|             resnet50              |  32  | 1.0022 |  1.0033   |     1.1198     |   1.3653    |  1.7601  |         1.3577         |
|        speech_transformer         |  32  | 1.005  |  0.8352   |      0.0       |     0.0     |  1.7184  |         1.7328         |
|        shufflenet_v2_x1_0         | 128  | 1.0013 |   1.011   |     0.9852     |   1.3461    |  1.7064  |         1.4223         |
|         soft_actor_critic         | 256  | 0.9955 |  0.7373   |     1.3303     |   1.0479    |  1.6878  |         1.0522         |
|           mobilenet_v2            |  96  | 0.9998 |  0.9885   |     0.7599     |   0.9415    |  1.5595  |         1.5162         |
| attention_is_all_you_need_pytorch | 256  | 1.0083 |  0.9031   |      0.0       |     0.0     |  1.5195  |         1.4737         |
|            timm_nfnet             | 128  | 0.999  |  0.9988   |      0.0       |    1.175    |  1.4917  |         1.427          |
|           hf_DistilBert           |  8   | 1.0027 |  0.9717   |     0.7312     |     0.0     |  1.4716  |         1.4478         |
|           fastNLP_Bert            |  6   | 0.9984 |  0.8833   |     0.7673     |     0.0     |  1.4694  |         1.4267         |
|           pytorch_unet            |  1   | 0.9998 |  0.9923   |     0.863      |   1.1549    |  1.3424  |         1.3159         |
|          pytorch_stargan          |  16  | 0.9953 |   1.02    |     0.9831     |   1.1314    |  1.3238  |         1.2624         |
|            timm_regnet            |  32  | 0.9805 |  0.9278   |     0.9078     |   1.1948    |  1.3204  |         1.2422         |
|            timm_vovnet            |  32  | 0.9219 |  0.8846   |     0.8671     |    1.138    |  1.2965  |         1.1443         |
|            Super_SloMo            |  6   | 0.9996 |  0.9963   |     0.8863     |     0.0     |  1.2894  |         1.2559         |
|               vgg16               |  64  | 0.9997 |  0.9975   |     0.8574     |   0.9962    |  1.2726  |         1.2646         |
|        Background_Matting         |  4   | 1.0001 |  1.0185   |     0.8949     |   1.1153    |  1.2236  |         1.2084         |
|              alexnet              | 128  | 0.9989 |  0.9974   |     0.8148     |   1.0036    |  1.2138  |         1.2089         |
|            hf_Reformer            |  4   | 0.9945 |    1.0    |     0.9452     |     0.0     |  1.1578  |         1.1463         |
|            hf_BigBird             |  2   | 0.9895 |  0.9118   |     1.0551     |     0.0     |  1.1533  |          1.02          |
|   timm_vision_transformer_large   |  8   | 0.9999 |  0.9899   |      0.0       |    0.993    |  1.1506  |         1.1324         |
|              yolov3               |  16  | 0.9997 |  0.9912   |     0.8028     |   0.9293    |  1.1029  |         1.0793         |
|            tts_angular            |  64  | 0.9732 |  0.9438   |     0.9915     |   1.0006    |  1.0239  |         1.0298         |
|              demucs               |  4   | 1.0004 |  0.9999   |     1.0019     |   1.0003    |  1.0006  |         0.9994         |
|      nvidia_deeprecommender       | 256  | 0.9989 |  0.9963   |     0.6967     |   0.9789    |  0.9891  |         1.0305         |
|           hf_GPT2_large           |  4   | 1.0004 |  0.9924   |      0.0       |     0.0     |   0.0    |         1.7538         |
|               dlrm                | 2048 | 1.0705 |  1.1661   |      0.0       |     0.0     |   0.0    |          0.0           |
|           hf_Longformer           |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
|               moco                |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |  aot_cudagraphs  |   aot_nvfuser    |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|              hf_Bert              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|      timm_vision_transformer      |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |          pass          |
|            Super_SloMo            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           fastNLP_Bert            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|             hf_Albert             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              hf_Bart              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|            hf_BigBird             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|         timm_efficientnet         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|            hf_Reformer            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              yolov3               |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           BERT_pytorch            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|               hf_T5               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|            hf_T5_base             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|        speech_transformer         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|            timm_regnet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        Background_Matting         |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|         timm_efficientdet         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|           hf_Longformer           |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|               moco                |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|             tacotron2             |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|          vision_maskrcnn          |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |         0.0000         |
|        mobilenet_v3_large         |  2  |       pass       |       pass       |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
|            tts_angular            |  2  |       pass       |       pass       |       pass       |       pass       |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
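For anyone who wants to tally these statuses rather than read them off the table, here is a minimal sketch that parses a table in this ASCII layout and counts outcomes per backend. The helper names `parse_ascii_table` and `status_counts` are hypothetical and not part of the benchmark scripts; the sample rows are copied from the table above.

```python
from collections import Counter


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")  # skip the '+-----+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def status_counts(records, backend):
    """Count how often each status string appears in one backend column."""
    return Counter(rec[backend] for rec in records)


# Two rows copied from the accuracy table above (columns trimmed for brevity).
sample = """
+---------+----+-------+-------------+
|  name   | bs | eager | aot_nvfuser |
+---------+----+-------+-------------+
| hf_Bert | 2  | pass  | fail_to_run |
|  vgg16  | 2  | pass  |    pass     |
+---------+----+-------+-------------+
"""

records = parse_ascii_table(sample)
counts = status_counts(records, "aot_nvfuser")
print(dict(counts))
```

Pasting the full table into `sample` should give per-backend totals for `pass`, `fail_to_run`, `fail_accuracy`, and `pass_due_to_skip`. Cells that hold something other than a status string (for example `0.0000`) are counted as their own bucket, not dropped.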

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
|         timm_efficientdet         |  1   | 20.2568 |  44.9257  |      nan       |     nan     | 506.1619 |        482.7267        |
|              yolov3               |  16  | 3.0753  |  10.8969  |    14.8722     |   40.6957   | 441.9377 |        442.2167        |
|            hf_T5_large            |  2   | 13.6319 |  49.8254  |      nan       |     nan     | 229.3423 |        227.025         |
|        speech_transformer         |  32  | 1.9784  |  11.9084  |      nan       |     nan     | 160.8221 |        163.7258        |
|      timm_vision_transformer      |  8   | 0.9712  |  6.1408   |     8.186      |   14.1079   | 153.2063 |        150.3677        |
| attention_is_all_you_need_pytorch | 256  | 1.3133  |  9.8117   |      nan       |     nan     | 152.8667 |        149.1336        |
|   timm_vision_transformer_large   |  8   | 2.8381  |  20.539   |      nan       |   39.248    | 143.6777 |        145.7532        |
|           timm_resnest            |  32  | 0.6155  |  3.5441   |     4.9432     |   43.0325   | 133.5364 |        132.7415        |
|          pytorch_stargan          |  16  | 0.4005  |  2.8635   |     3.8317     |   7.1985    | 108.2007 |        110.9196        |
|           BERT_pytorch            |  16  |  1.751  |  10.3084  |      nan       |     nan     | 104.5928 |        105.0846        |
|          pytorch_struct           | 200  | 0.2747  |  1.1468   |     1.7965     |   5.4271    | 81.1639  |        97.6714         |
|           fastNLP_Bert            |  6   | 1.8325  |  9.4369   |    14.2174     |     nan     | 73.1299  |        71.7013         |
|              hf_GPT2              |  4   | 1.5632  |  8.3618   |      nan       |     nan     | 68.2801  |        67.1098         |
|              hf_Bart              |  4   | 1.8333  |  11.8565  |    17.1832     |     nan     | 57.9756  |        55.1977         |
|               hf_T5               |  8   | 2.2224  |  11.503   |      nan       |     nan     | 54.0425  |         52.434         |
|            densenet121            |  4   | 2.3207  |  17.2755  |     25.975     |  126.4447   | 53.7683  |        52.2498         |
|            hf_BigBird             |  2   |  8.256  |  17.1759  |    37.3561     |     nan     | 49.9161  |        32.4436         |
|             hf_Albert             |  8   |  1.372  |  8.5864   |    12.6169     |     nan     | 47.6027  |        47.0628         |
|        mobilenet_v3_large         |  32  | 0.9924  |  6.6522   |     9.1224     |   72.4189   | 47.2931  |        47.1211         |
|              hf_Bert              |  4   | 1.6875  |  9.1713   |    12.6826     |     nan     | 47.1679  |         44.934         |
|            timm_regnet            |  32  | 2.4131  |  10.885   |    25.7165     |   61.7176   | 40.6208  |        38.8687         |
|         timm_efficientnet         |  32  | 1.9005  |  8.9344   |    19.4183     |   69.1817   | 38.2166  |        37.4503         |
|            hf_Reformer            |  4   | 2.5095  |  5.4098   |    10.0855     |     nan     |  36.274  |        31.5124         |
|           hf_DistilBert           |  8   | 0.6417  |  4.3991   |     8.6419     |     nan     | 36.0029  |        34.4521         |
|          resnext50_32x4d          |  8   |  1.017  |  6.5326   |     8.8348     |   36.8789   | 33.0991  |        32.6447         |
|            timm_nfnet             | 128  | 2.0405  |  9.7987   |      nan       |   38.7761   | 33.0862  |        32.2698         |
|             resnet50              |  32  | 0.9534  |  6.4667   |     9.1686     |   40.9829   | 32.2705  |        30.9612         |
|            mnasnet1_0             |  32  | 0.9236  |  6.2664   |     8.6174     |   43.8239   | 32.0749  |        31.4707         |
|            timm_vovnet            |  32  | 1.5435  |  5.7532   |    12.3402     |   31.1761   |  31.495  |        31.0663         |
|       functorch_dp_cifar10        |  64  | 0.3955  |  2.6387   |     3.6111     |   6.4988    |  27.269  |        27.0683         |
|        shufflenet_v2_x1_0         | 128  | 1.0466  |  6.9601   |     9.9451     |   37.4913   | 22.0414  |        21.3245         |
|             resnet18              |  16  | 0.4597  |  2.5088   |     3.4368     |   23.4051   | 21.6736  |        22.9361         |
|        Background_Matting         |  4   | 1.0296  |  6.1631   |     9.1283     |   42.2202   | 20.7841  |        19.4964         |
|            Super_SloMo            |  6   | 1.0912  |  6.3965   |     8.6337     |     nan     | 20.3786  |        20.1479         |
|           mobilenet_v2            |  96  | 0.8917  |  6.0674   |     8.7936     |   42.0052   | 20.1534  |        19.6406         |
|           pytorch_unet            |  1   |  0.472  |  2.8631   |     3.9843     |   26.121    |  9.8541  |         9.393          |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.4377  |  2.9424   |     4.0751     |   4.9125    |  9.6837  |         9.512          |
|          LearningToPaint          |  96  |  0.478  |  2.5734   |     3.7702     |   30.3846   |  8.7607  |         8.2774         |
|           squeezenet1_1           |  32  | 0.2678  |  1.4287   |     2.0574     |   6.6429    |  5.2996  |         5.0315         |
|      nvidia_deeprecommender       | 256  | 0.2144  |  0.6516   |     1.0085     |   2.9776    |  4.7662  |         4.6211         |
|               vgg16               |  64  | 0.2004  |  0.9706   |     1.455      |   3.5647    |  4.4288  |         4.1163         |
|                drq                |  1   | 0.1632  |  0.6636   |     1.0549     |    4.402    |  4.2396  |         3.7289         |
|         soft_actor_critic         | 256  | 0.2134  |  0.4437   |     0.6851     |   2.0419    |  3.7184  |         3.0465         |
|              alexnet              | 128  | 0.1743  |  0.6065   |     0.9186     |   3.2161    |  3.4021  |         3.3315         |
|               dcgan               |  32  |  0.177  |  0.5579   |     0.8046     |   4.2462    |  3.0432  |         2.7794         |
|           lennard_jones           | 1000 | 0.1614  |  0.4458   |     0.632      |   1.4727    |  2.2724  |         2.1345         |
|            tts_angular            |  64  | 0.2307  |  0.3099   |     0.4268     |   1.0585    |  1.958   |         1.7436         |
|              demucs               |  4   | 0.3522  |   0.359   |     0.3559     |   0.3792    |  0.267   |         0.2785         |
|           hf_GPT2_large           |  4   | 5.5989  |  27.5163  |      nan       |     nan     |   nan    |        157.5729        |
|               dlrm                | 2048 | 0.4799  |  1.0515   |      nan       |     nan     |   nan    |          nan           |
|           hf_Longformer           |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|               moco                |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|         timm_efficientnet         |  32  | 0.988  |  0.7698   |     0.2719     |   0.7887    |  1.2042  |         1.2318         |
|             hf_Albert             |  8   | 0.9814 |  0.9359   |     0.3273     |     nan     |  1.1576  |         1.4693         |
|            Super_SloMo            |  6   | 1.0024 |  0.9645   |     0.3842     |     nan     |  1.0536  |         1.1475         |
|            timm_nfnet             | 128  | 0.9693 |  0.8982   |      nan       |   0.9445    |  1.0337  |         1.1245         |
|         timm_efficientdet         |  1   | 1.028  |  0.8404   |      nan       |     nan     |  1.0226  |         1.0403         |
|           mobilenet_v2            |  96  | 0.9857 |  0.7639   |     0.3119     |   0.9117    |  1.0074  |         1.0232         |
|            tts_angular            |  64  | 1.0002 |  1.0002   |     0.9853     |   1.0002    |  0.9895  |         1.0002         |
|              demucs               |  4   | 0.9872 |  0.9872   |     0.9872     |   0.9872    |  0.9872  |         0.9872         |
| attention_is_all_you_need_pytorch | 256  | 0.9979 |   0.94    |      nan       |     nan     |  0.9829  |         1.1269         |
|           BERT_pytorch            |  16  |  1.0   |  0.8822   |      nan       |     nan     |  0.9728  |         1.1011         |
|              hf_GPT2              |  4   | 0.9706 |  0.8625   |      nan       |     nan     |  0.9648  |         1.1252         |
|        Background_Matting         |  4   | 1.0138 |  0.9624   |     0.3723     |   0.9813    |  0.9316  |         0.9364         |
|               hf_T5               |  8   | 0.9678 |  0.9371   |      nan       |     nan     |  0.9309  |         1.2521         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.999  |  0.8609   |     0.4238     |   0.8441    |  0.9289  |         0.982          |
|            timm_regnet            |  32  | 0.9953 |  0.8446   |     0.3493     |    0.85     |  0.9249  |         0.9292         |
|        speech_transformer         |  32  | 1.0017 |  0.9174   |      nan       |     nan     |  0.9066  |         0.9109         |
|              hf_Bert              |  4   | 0.9844 |  0.8677   |     0.3806     |     nan     |  0.9017  |         0.9414         |
|              yolov3               |  16  | 0.9908 |  0.8381   |     0.3537     |   0.8244    |  0.8991  |         0.9038         |
|   timm_vision_transformer_large   |  8   | 0.9974 |  0.8357   |      nan       |   0.8494    |  0.879   |         0.9542         |
|           timm_resnest            |  32  | 0.9868 |  0.8711   |     0.3483     |   0.8623    |  0.8759  |         0.9953         |
|            densenet121            |  4   | 0.9857 |  0.8678   |     0.3673     |   0.8376    |  0.8753  |         0.9535         |
|           pytorch_unet            |  1   | 0.9968 |  0.8653   |     0.3571     |   0.8496    |  0.8678  |         0.8715         |
|           fastNLP_Bert            |  6   | 1.0012 |  0.8966   |     0.3702     |     nan     |  0.8661  |         1.0348         |
|             resnet50              |  32  | 0.9907 |  0.8629   |     0.3559     |   0.7995    |  0.8659  |         0.885          |
|           squeezenet1_1           |  32  | 0.9604 |  0.7958   |     0.3456     |   0.7589    |  0.8611  |         0.8951         |
|        shufflenet_v2_x1_0         | 128  | 0.956  |  0.8401   |     0.3573     |   0.8503    |  0.856   |         0.8927         |
|            hf_T5_large            |  2   | 0.8541 |  0.8541   |      nan       |     nan     |  0.8541  |         0.8541         |
|           hf_DistilBert           |  8   | 0.9505 |  0.8806   |     0.3229     |     nan     |  0.8387  |         0.9058         |
|               dcgan               |  32  | 0.9698 |  0.7838   |     0.4994     |   0.7073    |  0.8283  |         0.8738         |
|              hf_Bart              |  4   | 0.9102 |  0.8321   |     0.3491     |     nan     |  0.8137  |         0.9762         |
|            hf_BigBird             |  2   | 0.9837 |  0.9784   |     0.4543     |     nan     |  0.8098  |         1.096          |
|              alexnet              | 128  | 0.951  |  0.7753   |     0.4793     |   0.7753    |  0.7974  |         0.9099         |
|        mobilenet_v3_large         |  32  | 0.9776 |  0.8499   |     0.3444     |    0.866    |  0.7918  |         0.8145         |
|          pytorch_stargan          |  16  | 0.9929 |  0.9742   |     0.4253     |   0.8882    |  0.7783  |         0.8847         |
|          resnext50_32x4d          |  8   | 0.9932 |  0.8549   |     0.3882     |   0.8176    |  0.7644  |         0.7753         |
|            mnasnet1_0             |  32  | 0.9785 |  0.8621   |     0.3409     |   0.8207    |  0.7541  |         0.7741         |
|                drq                |  1   | 0.9877 |  0.8312   |     0.4769     |   0.8308    |  0.752   |         0.9256         |
|            timm_vovnet            |  32  | 0.9903 |  0.7678   |     0.3408     |   0.7742    |  0.7513  |         0.761          |
|               vgg16               |  64  | 0.9924 |  0.7339   |     0.3775     |   0.7172    |  0.7491  |         0.7534         |
|          LearningToPaint          |  96  | 0.9252 |  0.7196   |     0.3827     |   0.6722    |  0.7295  |         0.8017         |
|         soft_actor_critic         | 256  | 0.9998 |  0.9149   |     0.4736     |   0.9149    |  0.7295  |         1.0367         |
|      timm_vision_transformer      |  8   | 0.9952 |  0.8826   |     0.3917     |   0.8871    |  0.7151  |         0.7249         |
|             resnet18              |  16  | 0.9779 |  0.7727   |     0.3941     |   0.7276    |  0.6102  |         0.6257         |
|           lennard_jones           | 1000 | 0.9995 |  0.9997   |     0.3734     |   1.0967    |  0.564   |         0.9991         |
|      nvidia_deeprecommender       | 256  | 0.5596 |  0.5596   |     0.5125     |   0.5596    |  0.5596  |         0.5596         |
|       functorch_dp_cifar10        |  64  | 0.9964 |  0.8107   |     0.4465     |   0.8452    |  0.4478  |         0.4806         |
|          pytorch_struct           | 200  |  1.0   |  0.5081   |     0.4858     |   0.5082    |  0.4235  |         0.4307         |
|            hf_Reformer            |  4   | 0.3764 |  0.9847   |     0.3481     |     nan     |  0.3629  |         0.9878         |
|           hf_GPT2_large           |  4   | 0.9582 |  0.8645   |      nan       |     nan     |   nan    |         1.1351         |
|               dlrm                | 2048 | 0.7301 |  0.7306   |      nan       |     nan     |   nan    |          nan           |
|           hf_Longformer           |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|               moco                |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
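The table does not say how the ratio is defined. The sketch below assumes it is eager peak memory divided by the backend's peak memory, so a value above 1 means the backend used less memory than eager and a value below 1 means it used more. The function name and the byte counts are hypothetical, for illustration only.

```python
import math


def compression_ratio(eager_peak_bytes, backend_peak_bytes):
    """Assumed definition: eager peak memory / backend peak memory.

    Returns nan when the backend measurement is missing or zero,
    mirroring the 'nan' cells used for failed runs in the table.
    """
    if not backend_peak_bytes or math.isnan(backend_peak_bytes):
        return float("nan")
    return eager_peak_bytes / backend_peak_bytes


# Hypothetical numbers: eager peaks at 1000 MB, the compiled run at 800 MB.
ratio = compression_ratio(1000 * 2**20, 800 * 2**20)
print(f"{ratio:.4f}")  # a ratio above 1 means the backend used less memory
```

Under this assumption, the `aot_cudagraphs` values of roughly 0.3 to 0.5 would mean that backend peaks at about two to three times the eager footprint.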

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|            YituTechConvBert             |  1  | 1.0261 |  0.8388   |      0.0       |     0.0     |  4.8985  |         1.7002         |
|          MobileBertForMaskedLM          | 32  | 1.0182 |  0.8212   |      0.0       |     0.0     |  4.4522  |         1.8054         |
|     MobileBertForQuestionAnswering      | 64  | 1.0173 |   0.828   |      0.0       |     0.0     |  3.7208  |         1.8261         |
|       MT5ForConditionalGeneration       |  8  | 1.0207 |   0.862   |      0.0       |     0.0     |  3.6457  |         2.5163         |
|                CamemBert                |  1  | 1.0365 |  0.8361   |     1.763      |     0.0     |  3.6419  |         1.8243         |
|               DistillGPT2               |  1  | 1.0307 |  0.8606   |     1.2652     |     0.0     |  2.7645  |         2.0213         |
|     M2M100ForConditionalGeneration      |  8  | 1.0377 |  0.8116   |     1.2351     |     0.0     |  2.4115  |         1.7558         |
|      GPT2ForSequenceClassification      |  4  | 1.0012 |  0.9681   |      0.0       |     0.0     |  2.148   |         2.1145         |
|     PLBartForConditionalGeneration      | 16  | 1.011  |  0.8331   |     1.0404     |     0.0     |  2.0049  |         1.774          |
|    MegatronBertForQuestionAnswering     | 16  | 1.0299 |  0.8553   |     1.2392     |     0.0     |  1.9843  |         1.8075         |
|       ElectraForQuestionAnswering       | 64  | 1.0001 |  0.9802   |     0.7682     |     0.0     |  1.9558  |         1.9073         |
|             XGLMForCausalLM             |  8  | 1.0108 |  0.8287   |      0.0       |     0.0     |  1.8404  |         1.6227         |
|         MegatronBertForCausalLM         | 16  | 1.0332 |  0.8552   |     0.989      |     0.0     |  1.8374  |         1.7637         |
|           ElectraForCausalLM            | 32  | 1.0002 |  0.9403   |     0.7075     |     0.0     |  1.8001  |         1.8004         |
|    LayoutLMForSequenceClassification    | 16  | 1.0001 |  0.9808   |     0.7748     |     0.0     |  1.7382  |         1.6968         |
|      MBartForConditionalGeneration      | 16  | 1.0136 |  0.8352   |      0.0       |     0.0     |  1.7367  |         1.6287         |
|     PegasusForConditionalGeneration     | 16  | 1.0104 |  0.8297   |     0.9354     |     0.0     |  1.7092  |         1.5782         |
|                 T5Small                 |  1  | 1.0251 |  0.8758   |      0.0       |     0.0     |  1.6655  |         1.4565         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.8859   |      0.0       |     0.0     |  1.6534  |         1.6504         |
|            AlbertForMaskedLM            |  4  | 1.0001 |  0.8856   |      0.0       |     0.0     |  1.6385  |         1.6467         |
|         Speech2Text2ForCausalLM         | 128 | 1.0033 |  0.9338   |     0.7421     |     0.0     |  1.6375  |         1.6423         |
|             OPTForCausalLM              | 32  | 1.0142 |  0.9311   |     0.7691     |     0.0     |  1.6088  |         1.5958         |
|       T5ForConditionalGeneration        |  4  | 0.9996 |  0.9392   |      0.0       |     0.0     |  1.6069  |         1.5888         |
|           LayoutLMForMaskedLM           | 16  | 1.0003 |  0.9714   |     0.753      |     0.0     |  1.586   |         1.5642         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.007  |  0.9236   |      0.0       |     0.0     |  1.4742  |         1.4408         |
|      BartForConditionalGeneration       |  2  | 1.0052 |  0.9074   |      0.0       |     0.0     |  1.4709  |         1.4385         |
|             BartForCausalLM             |  4  | 1.0013 |  0.9616   |     0.7533     |     0.0     |  1.4604  |         1.4593         |
|     DistilBertForQuestionAnswering      | 64  | 1.0003 |  0.9524   |     0.7402     |     0.0     |  1.4461  |         1.4039         |
|       RobertaForQuestionAnswering       | 128 | 1.0002 |  0.9836   |     0.7753     |     0.0     |  1.4344  |         1.3939         |
|        BertForQuestionAnswering         | 128 |  1.0   |  0.9757   |     0.7782     |     0.0     |  1.421   |         1.3988         |
|           RobertaForCausalLM            | 64  | 1.0004 |  0.9601   |     0.7491     |     0.0     |  1.4094  |         1.4059         |
|            PLBartForCausalLM            | 32  | 1.0073 |   0.941   |     0.7748     |     0.0     |  1.3293  |         1.3277         |
|             BertForMaskedLM             | 64  | 1.0003 |  0.9579   |     0.7352     |     0.0     |  1.3213  |         1.3142         |
|       BlenderbotSmallForCausalLM        | 64  | 1.0016 |  0.9177   |     0.6941     |     0.0     |  1.3089  |         1.3234         |
|           DebertaForMaskedLM            |  4  | 0.9308 |  0.7323   |     0.804      |     0.0     |  1.2934  |         1.1876         |
|          DistilBertForMaskedLM          | 64  | 1.0004 |  0.9399   |     0.6937     |     0.0     |  1.2735  |         1.2788         |
|            MBartForCausalLM             | 32  | 1.0044 |  0.9531   |     0.7503     |     0.0     |  1.2208  |         1.2239         |
|            TrOCRForCausalLM             | 32  | 1.0018 |   0.954   |      0.0       |     0.0     |  1.2156  |         1.2173         |
|           PegasusForCausalLM            | 32  | 1.0024 |  0.9517   |     0.7498     |     0.0     |  1.1992  |         1.2018         |
|                 BigBird                 |  1  | 0.9917 |  0.9182   |     1.0457     |     0.0     |  1.1507  |         1.0261         |
|       DebertaForQuestionAnswering       |  8  | 0.9958 |  0.7841   |     0.7225     |     0.0     |  1.1399  |         1.1711         |
|          AllenaiLongformerBase          |  0  |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
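To reduce a speedup column to a single number, a geometric mean is a common choice. The sketch below assumes that a `0.0` entry marks a run that failed, not a real measurement, and excludes it from the mean. That reading is an assumption; the function name `geomean_speedup` is hypothetical.

```python
import math


def geomean_speedup(values):
    """Geometric mean of the positive speedups in a column.

    Entries that are 0.0 or nan are assumed to be failed runs and are skipped.
    Returns nan if no usable entry remains.
    """
    ok = [v for v in values if v > 0]  # nan > 0 is False, so nan is skipped too
    if not ok:
        return float("nan")
    return math.exp(sum(math.log(v) for v in ok) / len(ok))


# First three inductor entries from the table above, plus one failed run.
inductor = [4.8985, 4.4522, 3.7208, 0.0]
print(f"geomean over passing models: {geomean_speedup(inductor):.4f}")
```

Skipping failures makes a backend with many `0.0` entries look better than it is, so the number of models that contributed is worth reporting next to the mean.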

Accuracy

+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+------------------------+
|                  name                   | bs |    eager    |  aot_eager  | aot_cudagraphs | aot_nvfuser |  inductor   | inductor_no_cudagraphs |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+------------------------+
|            AlbertForMaskedLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|       AlbertForQuestionAnswering        | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|     PegasusForConditionalGeneration     | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|           RobertaForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|       RobertaForQuestionAnswering       | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|         Speech2Text2ForCausalLM         | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|            TrOCRForCausalLM             | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|      BartForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|           DebertaForMaskedLM            | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|      GPT2ForSequenceClassification      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       MT5ForConditionalGeneration       | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|          MobileBertForMaskedLM          | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|     MobileBertForQuestionAnswering      | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       T5ForConditionalGeneration        | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|                 T5Small                 | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|             XGLMForCausalLM             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            XLNetLMHeadModel             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            YituTechConvBert             | 1  |    pass     |    pass     |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       DebertaForQuestionAnswering       | 1  |    pass     |    pass     | fail_accuracy  | fail_to_run |    pass     |          pass          |
|           PegasusForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|            PLBartForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|             OPTForCausalLM              | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|     DistilBertForQuestionAnswering      | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|             BartForCausalLM             | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|             BertForMaskedLM             | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|        BertForQuestionAnswering         | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|                 BigBird                 | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|                CamemBert                | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|          DistilBertForMaskedLM          | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|               DistillGPT2               | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|           ElectraForCausalLM            | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|       ElectraForQuestionAnswering       | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|           LayoutLMForMaskedLM           | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|    LayoutLMForSequenceClassification    | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|     M2M100ForConditionalGeneration      | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|            MBartForCausalLM             | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|         MegatronBertForCausalLM         | 1  |    pass     |    pass     |      pass      | fail_to_run |    pass     |          pass          |
|      MBartForConditionalGeneration      | 1  |    pass     |    pass     |      pass      | fail_to_run | fail_to_run |      fail_to_run       |
|     PLBartForConditionalGeneration      | 1  |    pass     |    pass     |      pass      | fail_to_run | fail_to_run |      fail_to_run       |
|          AllenaiLongformerBase          | 1  | fail_to_run | fail_to_run |  fail_to_run   | fail_to_run | fail_to_run |      fail_to_run       |
+-----------------------------------------+----+-------------+-------------+----------------+-------------+-------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|             XGLMForCausalLM             |  8  | 2.8838 |  17.8117  |      nan       |     nan     | 194.858  |        192.779         |
|           DebertaForMaskedLM            |  4  | 5.1245 |  13.1366  |    48.1582     |     nan     | 180.9537 |        111.1218        |
|       DebertaForQuestionAnswering       |  8  | 4.919  |  13.4612  |    49.4172     |     nan     | 170.8762 |        106.3058        |
|            YituTechConvBert             |  1  | 2.5334 |  14.306   |      nan       |     nan     | 140.1718 |        138.9927        |
|     M2M100ForConditionalGeneration      |  8  | 3.5198 |  23.236   |    35.7484     |     nan     | 139.9786 |        144.4965        |
|          MobileBertForMaskedLM          | 32  | 9.1132 |  41.3418  |      nan       |     nan     | 118.6315 |        113.5782        |
|     MobileBertForQuestionAnswering      | 64  | 9.0474 |  41.8023  |      nan       |     nan     | 101.2692 |        98.7701         |
|       MT5ForConditionalGeneration       |  8  | 3.487  |  17.0395  |      nan       |     nan     | 101.2338 |        92.3566         |
|         MegatronBertForCausalLM         | 16  | 3.6761 |  18.9528  |    27.4653     |     nan     | 75.4738  |        72.4095         |
|     PegasusForConditionalGeneration     | 16  | 3.4229 |  22.2052  |     34.924     |     nan     | 73.0347  |        69.7138         |
|    MegatronBertForQuestionAnswering     | 16  | 3.7886 |  18.9105  |    27.2742     |     nan     | 72.2236  |        70.2981         |
|      BartForConditionalGeneration       |  2  | 3.5687 |  22.6388  |      nan       |     nan     | 68.9738  |        67.2779         |
|      MBartForConditionalGeneration      | 16  | 3.6286 |  23.2015  |      nan       |     nan     | 66.7567  |        65.2675         |
|       T5ForConditionalGeneration        |  4  | 2.1607 |  11.4585  |      nan       |     nan     | 63.4196  |        64.7334         |
|    LayoutLMForSequenceClassification    | 16  | 1.8721 |  9.8564   |     13.681     |     nan     | 62.7043  |        61.6224         |
|                 T5Small                 |  1  | 2.1807 |  11.4387  |      nan       |     nan     | 60.7468  |        59.8897         |
|     PLBartForConditionalGeneration      | 16  | 1.867  |  11.6157  |    17.1196     |     nan     |  53.589  |        52.3927         |
| BlenderbotSmallForConditionalGeneration | 64  | 2.2026 |  15.2011  |      nan       |     nan     | 51.6907  |        50.4226         |
|                 BigBird                 |  1  | 8.104  |  17.4303  |    37.7102     |     nan     |  49.115  |        32.2434         |
|           ElectraForCausalLM            | 32  | 1.7431 |  9.4797   |    13.4844     |     nan     | 48.0664  |        46.9807         |
|           LayoutLMForMaskedLM           | 16  | 1.854  |   9.957   |    14.0422     |     nan     | 40.4533  |         38.52          |
|             BertForMaskedLM             | 64  | 1.6434 |  9.1451   |    13.0884     |     nan     | 40.4402  |        39.1127         |
|       ElectraForQuestionAnswering       | 64  | 1.7135 |  9.4156   |     12.908     |     nan     | 37.6691  |        37.1638         |
|           RobertaForCausalLM            | 64  | 1.6218 |  9.5038   |    13.2553     |     nan     | 35.1188  |        35.2797         |
|      GPT2ForSequenceClassification      |  4  | 1.6102 |  8.4573   |      nan       |     nan     | 34.7604  |        33.9281         |
|           PegasusForCausalLM            | 32  | 1.2933 |  8.4453   |    12.7766     |     nan     | 33.7181  |        32.0536         |
|        BertForQuestionAnswering         | 128 | 1.6308 |  9.5382   |    12.9968     |     nan     | 32.5755  |        31.8769         |
|            MBartForCausalLM             | 32  | 1.2688 |  8.6057   |    12.3507     |     nan     | 30.8586  |         30.114         |
|            TrOCRForCausalLM             | 32  | 1.2698 |  8.5239   |      nan       |     nan     | 30.6852  |        29.5408         |
|               DistillGPT2               |  1  | 0.777  |  4.2461   |     6.035      |     nan     | 30.1705  |        27.0712         |
|             BartForCausalLM             |  4  | 1.2858 |  8.7823   |    12.4256     |     nan     | 30.0352  |        28.6798         |
|            AlbertForMaskedLM            |  4  | 1.4898 |  8.9688   |      nan       |     nan     |  29.804  |        28.8283         |
|       RobertaForQuestionAnswering       | 128 | 1.5995 |  9.5258   |    13.6496     |     nan     | 29.1998  |        28.6459         |
|       AlbertForQuestionAnswering        |  4  | 1.4738 |  8.9295   |      nan       |     nan     |  28.742  |        27.1453         |
|          DistilBertForMaskedLM          | 64  | 0.5761 |  4.5213   |     8.775      |     nan     | 28.6222  |        27.9459         |
|       BlenderbotSmallForCausalLM        | 64  | 0.8712 |  5.8154   |     8.3073     |     nan     | 27.9038  |        27.3704         |
|     DistilBertForQuestionAnswering      | 64  | 0.6752 |  4.5682   |     8.8076     |     nan     | 27.8616  |        26.8576         |
|             OPTForCausalLM              | 32  | 1.329  |  8.7064   |    19.8936     |     nan     | 26.9542  |        27.1551         |
|                CamemBert                |  1  | 1.7806 |  9.6387   |    13.0603     |     nan     | 26.2214  |         25.541         |
|         Speech2Text2ForCausalLM         | 128 | 0.6961 |  4.3732   |     7.0361     |     nan     | 22.7181  |        21.5376         |
|            PLBartForCausalLM            | 32  | 0.7107 |  4.4018   |     6.2174     |     nan     | 22.5965  |        21.7864         |
|          AllenaiLongformerBase          |  0  |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|       AlbertForQuestionAnswering        |  4  |  1.0   |   0.754   |      nan       |     nan     |  1.1305  |         1.5536         |
|            AlbertForMaskedLM            |  4  | 0.9998 |  0.7431   |      nan       |     nan     |  1.1078  |         1.5319         |
|             BartForCausalLM             |  4  |  1.0   |  0.8997   |     0.3619     |     nan     |  1.0943  |         1.1562         |
|      GPT2ForSequenceClassification      |  4  | 0.9675 |  0.9164   |      nan       |     nan     |  1.0779  |         1.1637         |
|           PegasusForCausalLM            | 32  | 0.9749 |  0.8906   |     0.4175     |     nan     |  1.0189  |         1.0913         |
|       RobertaForQuestionAnswering       | 128 | 1.0008 |   0.952   |     0.3554     |     nan     |  1.0005  |         1.0676         |
|        BertForQuestionAnswering         | 128 | 1.0008 |   0.952   |     0.3554     |     nan     |  1.0005  |         1.0676         |
|       T5ForConditionalGeneration        |  4  | 0.9996 |  0.9594   |      nan       |     nan     |  0.995   |         1.2292         |
|    LayoutLMForSequenceClassification    | 16  | 1.004  |  0.9325   |     0.3632     |     nan     |  0.9943  |         1.0278         |
|       ElectraForQuestionAnswering       | 64  | 1.0016 |  0.9538   |     0.3384     |     nan     |  0.9938  |         1.0704         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.9133   |      nan       |     nan     |  0.9913  |         1.1976         |
|                 T5Small                 |  1  |  1.0   |  0.9124   |      nan       |     nan     |  0.9874  |          1.15          |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9238   |     0.3549     |     nan     |  0.9871  |         1.0263         |
|            MBartForCausalLM             | 32  |  1.0   |  0.8924   |     0.3782     |     nan     |  0.9868  |         1.0636         |
|             BertForMaskedLM             | 64  | 0.9996 |   0.899   |     0.3628     |     nan     |  0.9811  |         1.0366         |
|           RobertaForCausalLM            | 64  | 0.9991 |  0.8994   |     0.3626     |     nan     |  0.9801  |         1.0358         |
|             OPTForCausalLM              | 32  | 0.9996 |  0.8679   |     0.3481     |     nan     |  0.9718  |         1.0617         |
|            TrOCRForCausalLM             | 32  |  1.0   |  0.8921   |      nan       |     nan     |  0.9642  |         1.0376         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9999 |  0.8918   |      nan       |     nan     |  0.9593  |         1.1105         |
|          DistilBertForMaskedLM          | 64  | 0.9999 |  0.8599   |     0.3477     |     nan     |  0.948   |         1.0272         |
|         Speech2Text2ForCausalLM         | 128 | 0.9676 |  0.8196   |     0.3532     |     nan     |  0.946   |         1.0737         |
|      MBartForConditionalGeneration      | 16  |  1.0   |  0.8695   |      nan       |     nan     |  0.939   |         1.0986         |
|           ElectraForCausalLM            | 32  | 0.9996 |   0.848   |     0.3553     |     nan     |  0.9319  |         1.0177         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9996 |  0.8172   |     0.3597     |     nan     |  0.9269  |         1.0441         |
|            PLBartForCausalLM            | 32  | 1.0003 |  0.8444   |     0.3722     |     nan     |  0.9214  |         1.0168         |
|         MegatronBertForCausalLM         | 16  | 0.9998 |  0.8499   |     0.3975     |     nan     |  0.921   |         1.0277         |
|       MT5ForConditionalGeneration       |  8  | 0.919  |   0.83    |      nan       |     nan     |  0.919   |         0.919          |
|     PegasusForConditionalGeneration     | 16  | 0.9985 |  0.9628   |     0.4377     |     nan     |  0.9159  |         1.0984         |
|     DistilBertForQuestionAnswering      | 64  | 1.0004 |  0.9216   |     0.3465     |     nan     |  0.9129  |         1.0128         |
|    MegatronBertForQuestionAnswering     | 16  |  1.0   |  0.8529   |     0.411      |     nan     |  0.893   |         1.0093         |
|     PLBartForConditionalGeneration      | 16  | 0.9983 |  0.9007   |     0.3949     |     nan     |  0.8775  |         1.0294         |
|                CamemBert                |  1  | 0.9989 |  0.7872   |     0.4083     |     nan     |  0.8654  |         0.9312         |
|            YituTechConvBert             |  1  | 0.9718 |  0.7819   |      nan       |     nan     |  0.8618  |         0.9318         |
|                 BigBird                 |  1  | 1.0008 |  0.9547   |     0.4478     |     nan     |  0.8348  |         1.1036         |
|             XGLMForCausalLM             |  8  | 0.9918 |  0.9234   |      nan       |     nan     |  0.8333  |         1.0324         |
|               DistillGPT2               |  1  | 0.9963 |  0.7527   |     0.3883     |     nan     |  0.8288  |         1.0239         |
|     M2M100ForConditionalGeneration      |  8  | 0.9967 |  0.9427   |     0.4275     |     nan     |  0.7774  |         1.0309         |
|          MobileBertForMaskedLM          | 32  | 0.9998 |  0.8864   |      nan       |     nan     |  0.6997  |         0.9454         |
|     MobileBertForQuestionAnswering      | 64  | 1.0153 |  0.9965   |      nan       |     nan     |  0.6085  |         0.8221         |
|           DebertaForMaskedLM            |  4  | 0.9982 |  0.9824   |     0.3623     |     nan     |  0.4498  |         1.1123         |
|       DebertaForQuestionAnswering       |  8  | 0.9754 |  1.0737   |     0.3252     |     nan     |  0.3361  |         1.1932         |
|          AllenaiLongformerBase          |  0  |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
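
A note on reading the table above, as a minimal sketch. It assumes the ratio is defined as eager peak memory divided by the backend's peak memory; that definition is not stated in this comment, although the eager column sitting near 1.0 is consistent with it. Under that assumption a value above 1 means the backend peaks lower than eager, and a value below 1 means it peaks higher. The function names are hypothetical.

```python
import math


def compression_ratio(eager_peak_bytes: float, backend_peak_bytes: float) -> float:
    """Assumed definition: eager peak memory / backend peak memory."""
    if backend_peak_bytes <= 0:
        # A failed run has no measurement; the tables report this as nan.
        return float("nan")
    return eager_peak_bytes / backend_peak_bytes


def relative_peak_memory(ratio: float) -> float:
    """Backend peak memory as a multiple of eager, given a ratio from the table."""
    if math.isnan(ratio) or ratio <= 0:
        return float("nan")
    return 1.0 / ratio


# DebertaForQuestionAnswering with inductor is listed at 0.3361, which under
# the assumed definition corresponds to roughly 3x the eager peak memory.
deberta_multiple = relative_peak_memory(0.3361)
```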

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|      xcit_large_24_p8_224       |  5  | 1.0024 |    0.0    |      0.0       |     0.0     |  2.1824  |         1.8642         |
|        tnt_s_patch16_224        | 128 | 1.0001 |  0.9971   |      0.0       |    1.948    |  2.1266  |         2.0956         |
|           regnety_002           | 128 | 0.9778 |  0.9331   |     1.1329     |   1.3739    |  2.1185  |         1.4615         |
|            lcnet_050            | 128 | 0.9687 |   0.945   |     0.8523     |   1.5942    |  2.0966  |         1.6246         |
|          ghostnet_100           | 128 | 1.0047 |  0.9954   |     0.9024     |   1.5531    |  2.0772  |         1.7315         |
|        twins_pcpvt_base         | 64  | 1.007  |  0.9131   |     0.9227     |   1.3661    |  1.7575  |         1.6661         |
|        res2net101_26w_4s        | 64  | 1.0023 |  0.9884   |     0.9714     |   1.4016    |  1.606   |         1.3414         |
|           volo_d1_224           | 64  | 0.9998 |  0.9948   |      0.0       |   1.1408    |  1.6008  |         1.5636         |
|            hrnet_w18            | 128 | 1.0034 |  1.0224   |     0.8621     |   1.4764    |  1.5939  |         1.4836         |
|             dla102              | 128 | 1.0001 |  0.9957   |     0.8371     |   1.4147    |  1.5778  |         1.5432         |
|          gmixer_24_224          | 128 | 0.9999 |   0.88    |      0.0       |   0.9998    |  1.5532  |         1.5487         |
|            nfnet_l0             | 128 | 0.9999 |  0.8113   |     0.7132     |   1.0386    |  1.5398  |         1.4649         |
|           resnest101e           | 64  | 0.9998 |  0.9918   |     0.813      |    1.25     |  1.5192  |         1.5067         |
|          gmlp_s16_224           | 128 | 0.9999 |  0.9964   |      0.0       |   1.0513    |  1.5183  |         1.524          |
|       gluon_inception_v3        | 128 |  1.0   |  0.9963   |     0.8531     |   1.1944    |  1.5053  |         1.4695         |
|        adv_inception_v3         | 128 |  1.0   |  0.9965   |     0.8532     |   1.1949    |  1.5044  |         1.4699         |
|          inception_v3           | 128 | 0.9999 |  0.9963   |     0.8532     |   1.1947    |  1.503   |         1.4668         |
|           dm_nfnet_f0           | 128 | 0.9988 |    1.0    |      0.0       |   1.1763    |  1.4955  |         1.4275         |
|  swin_base_patch4_window7_224   | 64  | 0.9996 |  0.9572   |      0.0       |    1.04     |  1.4854  |         1.4803         |
|        res2net50_14w_8s         | 128 | 0.9999 |  0.9936   |     0.8086     |   1.2808    |  1.4708  |         1.4281         |
|          cait_m36_384           |  4  | 1.0003 |  1.0099   |      0.0       |   1.0348    |  1.4624  |         1.4148         |
|      mobilenetv3_large_100      | 128 | 0.9727 |  0.9456   |     0.7823     |   1.3423    |  1.4581  |         1.4361         |
|         crossvit_9_240          | 128 |  1.0   |  0.9951   |     0.8375     |   1.0597    |  1.4522  |         1.4158         |
|           selecsls42b           | 128 | 0.9998 |  0.9958   |     0.8397     |    1.358    |  1.4435  |         1.411          |
|         coat_lite_mini          | 128 | 1.0001 |  0.9895   |     0.8423     |   1.2195    |  1.4234  |         1.4001         |
|            fbnetv3_b            | 128 | 0.9542 |  0.9408   |     0.7728     |   1.2662    |  1.4175  |         1.4082         |
|           res2next50            | 128 | 0.9996 |  0.9957   |     0.8309     |   1.2102    |  1.4136  |          1.35          |
|          resmlp_12_224          | 128 | 1.0001 |  0.9978   |     0.7824     |     0.0     |  1.4002  |          1.35          |
|         mobilenetv2_100         | 128 | 0.9513 |  0.9412   |     0.7198     |   0.8654    |  1.4001  |         1.4317         |
|          jx_nest_base           | 32  | 0.9997 |  0.9927   |      0.0       |   1.2258    |   1.4    |         1.3681         |
|           mnasnet_100           | 128 | 0.9545 |  0.9444   |     0.7881     |    1.359    |  1.3888  |         1.4578         |
|           mobilevit_s           | 64  | 0.9734 |  0.8144   |     0.6515     |   1.1125    |  1.3767  |         1.3634         |
|        ese_vovnet19b_dw         | 128 | 0.9691 |  0.9649   |     0.7682     |   1.2449    |  1.3751  |         1.3768         |
|          spnasnet_100           | 128 | 0.9456 |  0.9381   |     0.7758     |   1.3141    |  1.3666  |         1.392          |
|            pit_b_224            | 64  | 0.9999 |  0.9957   |     0.822      |   1.0626    |  1.3587  |         1.3526         |
|           fbnetc_100            | 128 | 0.9519 |  0.9449   |     0.792      |   1.3759    |  1.3517  |         1.3733         |
|       tf_efficientnet_b0        | 128 | 0.965  |   0.808   |     0.6661     |   1.0954    |  1.3471  |         1.3552         |
|           convit_base           | 64  | 1.0001 |  0.9974   |      0.0       |     0.0     |  1.3465  |         1.3636         |
|          cspdarknet53           | 64  | 0.9428 |  0.9336   |     0.7555     |   0.9018    |  1.3293  |         1.3464         |
|         poolformer_m36          | 64  | 0.9997 |   0.998   |     0.8072     |     0.0     |  1.3267  |         1.2959         |
|          botnet26t_256          | 128 | 0.9796 |   0.974   |     0.8095     |   1.3452    |  1.3261  |         1.331          |
|          pnasnet5large          | 16  | 1.0054 |  1.0281   |     0.853      |   1.1408    |  1.3191  |         1.2954         |
|       eca_botnext26ts_256       | 128 | 0.9809 |  0.8117   |     0.6713     |   1.1566    |  1.2931  |         1.2825         |
|      beit_base_patch16_224      | 64  |  1.0   |  0.9784   |      0.0       |   1.0451    |  1.2869  |         1.2659         |
|          mixer_b16_224          | 128 | 1.0002 |  0.9976   |     0.805      |   0.9603    |  1.285   |         1.2738         |
|           rexnet_100            | 128 | 0.9647 |  0.8505   |     0.6903     |    1.038    |  1.2843  |         1.2728         |
| deit_base_distilled_patch16_224 | 64  | 0.9999 |  0.9918   |     0.7974     |   1.0603    |  1.2826  |         1.2616         |
|            tinynet_a            | 128 | 0.9692 |  0.8005   |     0.6569     |   1.0896    |  1.2591  |         1.2658         |
|         visformer_small         | 128 | 0.9995 |  1.0024   |     0.8402     |   1.0846    |  1.2371  |         1.1814         |
|        eca_halonext26ts         | 128 | 0.9809 |  0.8168   |     0.6792     |   1.1486    |  1.2156  |          0.0           |
|        sebotnet33ts_256         | 64  | 0.9661 |  0.8367   |     0.6798     |   1.1159    |  1.2009  |         1.2101         |
|      vit_base_patch16_224       | 64  | 1.0001 |  0.9941   |     0.8348     |   0.9937    |  1.1955  |         1.183          |
|           tf_mixnet_l           | 128 | 0.9806 |  0.9091   |     0.7898     |   1.0562    |  1.1937  |         1.191          |
|            mixnet_l             | 128 | 0.9795 |  0.9053   |     0.7943     |   1.0634    |  1.183   |         1.1779         |
|             dpn107              | 32  | 0.9425 |  0.9346   |     0.7546     |   0.9966    |  1.1638  |         1.1762         |
|        gluon_xception65         | 32  | 1.001  |  0.9834   |     0.7547     |   1.0649    |  1.1602  |         1.1251         |
|            repvgg_a2            | 128 | 0.9434 |  0.9339   |     0.798      |   1.1317    |  1.1399  |         1.1566         |
|     swsl_resnext101_32x16d      | 32  | 1.0002 |  0.9812   |      0.81      |   1.0749    |  1.1328  |         1.0569         |
|            gernet_l             | 128 | 0.9466 |  0.9383   |     0.7683     |   1.1433    |  1.0676  |         1.0767         |
|        convmixer_768_32         | 32  | 0.9999 |  0.9982   |     0.9231     |   1.0533    |  1.0557  |         1.0507         |
|          convnext_base          | 64  | 0.9994 |  0.9949   |      0.0       |   1.2018    |  0.6659  |         0.6626         |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
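
To summarize a speedup table like the one above, a small sketch that parses the ASCII rows and takes a geometric mean per backend. It assumes that a `0.0` entry marks a run that did not produce a number (these entries line up with `fail_to_run` in the accuracy table) and so should be excluded from the mean. The helper names and the three-row, four-column sample are illustrative only, not the dashboard's own tooling.

```python
import math


def parse_table(text):
    """Parse a '+---+' / '| a | b |' style table into (header, list of row dicts)."""
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blanks
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return header, rows


def geomean_speedup(rows, backend):
    """Geometric mean of one column, ignoring 0.0 and nan (assumed failed runs)."""
    vals = [float(r[backend]) for r in rows]
    vals = [v for v in vals if v > 0 and not math.isnan(v)]
    if not vals:
        return float("nan")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


# Three rows copied from the table above, restricted to two backend columns.
sample = """
+----------------------+-----+-----------+----------+
|         name         | bs  | aot_eager | inductor |
+----------------------+-----+-----------+----------+
| xcit_large_24_p8_224 |  5  |    0.0    |  2.1824  |
|  tnt_s_patch16_224   | 128 |  0.9971   |  2.1266  |
|    convnext_base     | 64  |  0.9949   |  0.6659  |
+----------------------+-----+-----------+----------+
"""

header, rows = parse_table(sample)
aot_eager_gm = geomean_speedup(rows, "aot_eager")
inductor_gm = geomean_speedup(rows, "inductor")
```

Excluding failed runs makes the mean describe only the models a backend can run, so it is best read next to the pass/fail counts from the accuracy table.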

Accuracy

+---------------------------------+----+-------+---------------+----------------+---------------+---------------+------------------------+
|              name               | bs | eager |   aot_eager   | aot_cudagraphs |  aot_nvfuser  |   inductor    | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+------------------------+
|        adv_inception_v3         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          jx_nest_base           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|        sebotnet33ts_256         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           selecsls42b           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|     swsl_resnext101_32x16d      | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|       tf_efficientnet_b0        | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           tf_mixnet_l           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            tinynet_a            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        twins_pcpvt_base         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|         visformer_small         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|      vit_base_patch16_224       | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|      beit_base_patch16_224      | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|          convnext_base          | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|           dm_nfnet_f0           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|  swin_base_patch4_window7_224   | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|           res2next50            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        tnt_s_patch16_224        | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|           volo_d1_224           | 2  | pass  |     pass      |  fail_to_run   |     pass      |     pass      |          pass          |
|          resmlp_12_224          | 2  | pass  |     pass      |      pass      |  fail_to_run  |     pass      |          pass          |
|           convit_base           | 2  | pass  |     pass      |  fail_to_run   |  fail_to_run  |     pass      |          pass          |
|      xcit_large_24_p8_224       | 2  | pass  |  fail_to_run  |  fail_to_run   |  fail_to_run  |     pass      |          pass          |
|          mixer_b16_224          | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |          pass          |
|         poolformer_m36          | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |          pass          |
|           resnest101e           | 2  | pass  |     pass      |      pass      | fail_accuracy |     pass      |          pass          |
|          gmixer_24_224          | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |          pass          |
|          gmlp_s16_224           | 2  | pass  |     pass      |  fail_to_run   | fail_accuracy |     pass      |          pass          |
|          cait_m36_384           | 2  | pass  | fail_accuracy |  fail_to_run   | fail_accuracy |     pass      |          pass          |
|         coat_lite_mini          | 2  | pass  | fail_accuracy | fail_accuracy  | fail_accuracy |     pass      |          pass          |
|          botnet26t_256          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           rexnet_100            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        res2net50_14w_8s         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            hrnet_w18            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        convmixer_768_32         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|         crossvit_9_240          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          cspdarknet53           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
| deit_base_distilled_patch16_224 | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|             dla102              | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|             dpn107              | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|       eca_botnext26ts_256       | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        ese_vovnet19b_dw         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           fbnetc_100            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            gernet_l             | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          ghostnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        res2net101_26w_4s        | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|       gluon_inception_v3        | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          inception_v3           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            lcnet_050            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            mixnet_l             | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           mnasnet_100           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|         mobilenetv2_100         | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|      mobilenetv3_large_100      | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           mobilevit_s           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            nfnet_l0             | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            pit_b_224            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|          pnasnet5large          | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|           regnety_002           | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|            repvgg_a2            | 2  | pass  |     pass      |      pass      |     pass      |     pass      |          pass          |
|        eca_halonext26ts         | 2  | pass  |     pass      |      pass      |     pass      |  fail_to_run  |     fail_accuracy      |
|        gluon_xception65         | 2  | pass  |     pass      |      pass      |     pass      | fail_accuracy |     fail_accuracy      |
|            fbnetv3_b            | 2  | pass  |     pass      |      pass      | fail_accuracy | fail_accuracy |     fail_accuracy      |
|          spnasnet_100           | 2  | pass  |     pass      |      pass      | fail_accuracy | fail_accuracy |     fail_accuracy      |
+---------------------------------+----+-------+---------------+----------------+---------------+---------------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|        twins_pcpvt_base         | 64  | 2.8295 |  19.9737  |    31.9569     |   73.4381   | 437.2727 |        453.1626        |
|         coat_lite_mini          | 128 | 1.2392 |  7.1588   |    10.6151     |   32.947    | 418.2742 |        410.7454        |
|           mobilevit_s           | 64  | 1.8987 |  9.8143   |    19.6339     |   64.0554   | 338.6094 |        330.3378        |
|        sebotnet33ts_256         | 64  | 1.7391 |  7.9521   |    17.5542     |   68.8366   | 326.7606 |        326.3663        |
|        eca_halonext26ts         | 128 | 1.5305 |  6.6245   |    13.4204     |   66.3073   | 303.6995 |          nan           |
|       eca_botnext26ts_256       | 128 | 1.472  |   6.254   |    13.0608     |   62.6602   | 264.9251 |        255.2382        |
|      xcit_large_24_p8_224       |  5  | 3.3668 |    nan    |      nan       |     nan     | 199.5789 |        197.7456        |
|          botnet26t_256          | 128 | 1.4261 |  5.6805   |    12.7039     |   50.5016   | 192.9984 |        192.1813        |
|  swin_base_patch4_window7_224   | 64  | 3.0112 |  16.9517  |      nan       |   72.5605   | 185.2649 |        180.2708        |
|          jx_nest_base           | 32  | 1.8521 |  12.2298  |      nan       |   52.0403   | 173.585  |        173.3859        |
|          convnext_base          | 64  | 1.6052 |  9.4704   |      nan       |   36.1856   | 160.721  |        155.0981        |
|          cait_m36_384           |  4  | 3.5529 |  25.7848  |      nan       |   62.8823   | 149.5356 |        148.1344        |
|            hrnet_w18            | 128 | 6.912  |  42.085   |    73.7728     |   458.635   | 130.7424 |        123.9546        |
|           resnest101e           | 64  | 3.5049 |  22.7989  |    35.9505     |  108.9619   | 128.6373 |        119.2999        |
|         crossvit_9_240          | 128 | 1.7143 |  11.4764  |    17.0618     |   37.2511   | 125.1712 |        122.8947        |
|           volo_d1_224           | 64  | 1.3509 |  10.3183  |      nan       |   39.4754   | 106.0502 |        104.6853        |
|          pnasnet5large          | 16  | 5.0431 |  30.6895  |    52.5155     |  192.2498   | 104.7165 |        100.8943        |
|         visformer_small         | 128 | 1.012  |  5.2996   |     7.8662     |   30.9644   | 95.0057  |        93.4595         |
|            pit_b_224            | 64  | 1.1784 |  7.1844   |    10.8127     |   25.351    | 89.8326  |        89.2225         |
|          gmlp_s16_224           | 128 | 1.3138 |  9.9217   |      nan       |   22.6324   | 78.2899  |        76.1123         |
|        res2net101_26w_4s        | 64  | 3.3603 |  23.0953  |    36.8235     |  120.3506   | 68.8668  |        64.0399         |
|        tnt_s_patch16_224        | 128 | 1.8667 |  14.5415  |      nan       |   38.6474   | 65.3885  |        60.6268         |
|        res2net50_14w_8s         | 128 | 2.9853 |  20.3292  |    31.8737     |  139.1154   | 62.7041  |        58.2307         |
|          gmixer_24_224          | 128 | 1.5031 |  11.0912  |      nan       |   28.4307   | 60.5123  |        58.8985         |
|           convit_base           | 64  | 1.2706 |  8.2016   |      nan       |     nan     | 59.2515  |         57.867         |
|        gluon_xception65         | 32  | 2.3465 |  15.3302  |    23.0085     |   64.4876   | 54.7008  |        51.1978         |
|         poolformer_m36          | 64  | 1.9528 |  11.6525  |    17.9511     |     nan     | 51.8802  |        48.9452         |
|     swsl_resnext101_32x16d      | 32  | 1.9419 |  13.0524  |     19.494     |   53.4265   | 48.2901  |        45.2918         |
|             dpn107              | 32  |  4.17  |  18.515   |    53.4895     |  102.1895   | 48.0665  |        45.8321         |
|            fbnetv3_b            | 128 | 3.5692 |  14.6862  |    37.7741     |  101.1168   | 44.3318  |        41.8336         |
|      vit_base_patch16_224       | 64  | 1.0368 |  6.6764   |     8.9831     |   14.3711   | 44.3093  |        42.9548         |
| deit_base_distilled_patch16_224 | 64  | 1.068  |  6.3507   |     9.2566     |   15.1017   | 44.0566  |        43.5425         |
|          resmlp_12_224          | 128 | 0.7512 |  4.3433   |     7.979      |     nan     |  42.344  |        45.1063         |
|          mixer_b16_224          | 128 | 0.9015 |  5.0322   |     8.4324     |   16.6798   | 41.9174  |        40.6346         |
|           tf_mixnet_l           | 128 | 5.8789 |  15.9108  |    33.3269     |   87.9298   | 41.8465  |        38.5803         |
|          inception_v3           | 128 | 1.7317 |  11.9334  |    17.8615     |   99.7468   | 40.8376  |        39.0428         |
|       gluon_inception_v3        | 128 | 1.7562 |  12.1734  |    17.6723     |   99.8994   | 40.6634  |        38.1773         |
|        adv_inception_v3         | 128 | 1.7536 |  11.9695  |    17.8052     |   99.6723   | 40.4868  |        38.1699         |
|          ghostnet_100           | 128 | 3.1544 |  12.7088  |    18.0779     |   91.6621   | 39.6488  |         36.863         |
|      beit_base_patch16_224      | 64  | 1.3538 |  7.5436   |      nan       |   18.3732   | 39.4215  |        38.2418         |
|            mixnet_l             | 128 | 5.5244 |  15.4903  |    32.9421     |   87.3631   | 39.3911  |         37.456         |
|             dla102              | 128 | 1.9687 |  13.3249  |    20.2285     |   87.1116   | 39.2475  |        37.7842         |
|        convmixer_768_32         | 32  | 1.5356 |  8.8642   |    13.0816     |   18.3079   | 38.6408  |        36.5099         |
|           res2next50            | 128 | 1.8331 |  11.5784  |    17.2442     |   86.7869   |  36.482  |         33.143         |
|           dm_nfnet_f0           | 128 | 2.1941 |  9.5482   |      nan       |   39.4523   | 34.5198  |        32.6655         |
|           rexnet_100            | 128 | 2.0888 |  9.6586   |    21.6599     |  117.7439   |  32.163  |        31.0996         |
|            tinynet_a            | 128 | 2.2105 |  10.5963  |    24.7563     |   79.6911   | 31.7316  |        30.1822         |
|          cspdarknet53           | 64  | 2.3986 |  9.7328   |    23.8357     |   41.1186   | 28.5675  |        26.8411         |
|       tf_efficientnet_b0        | 128 | 1.9705 |  8.9001   |    20.3522     |   78.4259   | 27.4314  |        26.2041         |
|            nfnet_l0             | 128 | 1.9057 |  9.4882   |    13.6435     |   35.5309   | 26.9764  |        25.2746         |
|           fbnetc_100            | 128 | 2.1718 |  8.7703   |    21.6369     |   60.2288   | 26.2368  |        25.3805         |
|          spnasnet_100           | 128 | 2.1772 |  8.5425   |    21.2303     |   57.7374   | 26.0878  |        24.6668         |
|      mobilenetv3_large_100      | 128 | 1.7449 |  7.4376   |    16.7432     |   82.6857   | 24.4636  |        23.2897         |
|         mobilenetv2_100         | 128 | 1.7028 |  7.0597   |    16.6933     |   41.1163   |  22.69   |         21.276         |
|           mnasnet_100           | 128 | 1.7191 |  7.0945   |    16.3612     |   51.0751   | 22.6162  |        20.8213         |
|            gernet_l             | 128 | 2.0842 |  8.2421   |    19.7152     |   44.6438   | 22.1882  |        21.2112         |
|           regnety_002           | 128 | 1.7361 |   7.658   |    17.1152     |   56.7209   | 22.0035  |        20.9952         |
|            repvgg_a2            | 128 | 2.0742 |  7.9069   |    18.9615     |   62.6982   | 21.7179  |         20.452         |
|           selecsls42b           | 128 | 0.9022 |  5.3253   |     7.8206     |   50.9076   | 19.4225  |        18.3737         |
|            lcnet_050            | 128 | 1.1115 |  4.4334   |     8.8827     |   38.6244   | 15.6594  |        14.8776         |
|        ese_vovnet19b_dw         | 128 | 1.1125 |   4.219   |     8.3426     |   39.3669   | 15.3676  |        14.1028         |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|            tinynet_a            | 128 | 0.9889 |  0.7884   |     0.2764     |   0.7887    |  1.3707  |         1.4015         |
|          gmixer_24_224          | 128 | 0.9926 |  0.9699   |      nan       |   0.9029    |  1.3139  |         1.3772         |
|          gmlp_s16_224           | 128 | 0.9937 |  0.9715   |      nan       |   0.9188    |  1.2841  |         1.2998         |
|       tf_efficientnet_b0        | 128 | 0.9882 |  0.7693   |     0.2666     |   0.8392    |  1.173   |         1.1918         |
|          pnasnet5large          | 16  | 1.0575 |  0.9913   |     0.3633     |   1.1722    |  1.1608  |         1.2789         |
|           mobilevit_s           | 64  | 0.9931 |  0.7669   |     0.2734     |   0.7848    |  1.1578  |         1.2186         |
|           rexnet_100            | 128 | 0.9885 |   0.785   |     0.285      |   0.8648    |  1.1475  |         1.1687         |
|       eca_botnext26ts_256       | 128 | 0.9886 |   0.77    |     0.2669     |    0.776    |  1.1068  |         1.2102         |
|        eca_halonext26ts         | 128 | 0.9886 |  0.7747   |     0.267      |   0.7762    |  1.1053  |          nan           |
|         poolformer_m36          | 64  | 0.9979 |  0.9432   |     0.3413     |     nan     |  1.1022  |         1.1162         |
|        tnt_s_patch16_224        | 128 | 0.9945 |  0.9729   |      nan       |   0.9418    |  1.0703  |         1.1492         |
|           resnest101e           | 64  | 0.995  |  0.9889   |     0.3473     |   0.9685    |  1.0556  |         1.0626         |
|           convit_base           | 64  | 0.9966 |  0.8516   |      nan       |     nan     |  1.0529  |         1.1534         |
|           dm_nfnet_f0           | 128 | 0.969  |   0.898   |      nan       |   0.9443    |  1.0336  |         1.124          |
|            nfnet_l0             | 128 | 0.9884 |  0.8173   |     0.2681     |   0.8142    |  1.0332  |         1.0762         |
|           volo_d1_224           | 64  | 0.9965 |  0.9475   |      nan       |   0.8587    |  1.0138  |         1.0718         |
|         mobilenetv2_100         | 128 | 0.9863 |  0.7642   |     0.3109     |   0.9129    |  1.0048  |         1.021          |
|      beit_base_patch16_224      | 64  | 0.9952 |  0.9327   |      nan       |   0.9298    |  1.0004  |         1.0447         |
|        convmixer_768_32         | 32  | 0.9972 |  0.9788   |     0.3455     |   0.9714    |  0.9746  |         0.9788         |
|            pit_b_224            | 64  | 0.999  |  0.8053   |     0.326      |   0.8179    |  0.9746  |         1.2067         |
|        twins_pcpvt_base         | 64  | 0.9945 |  0.9232   |     0.3403     |    0.802    |  0.9699  |         1.0818         |
|            fbnetv3_b            | 128 | 0.9872 |  0.7836   |     0.3151     |    0.79     |  0.9645  |         0.9776         |
|          ghostnet_100           | 128 | 0.9756 |   0.87    |     0.337      |   0.9026    |  0.9489  |         0.9832         |
|             dla102              | 128 | 0.9694 |   0.912   |     0.3363     |   0.9381    |  0.9431  |         0.9502         |
|         visformer_small         | 128 | 0.9899 |  0.9259   |     0.3469     |   0.8884    |  0.9382  |         1.0521         |
|      xcit_large_24_p8_224       |  5  | 0.9975 |    nan    |      nan       |     nan     |  0.9319  |         0.9931         |
|           tf_mixnet_l           | 128 | 0.991  |  0.8555   |     0.2875     |   0.8365    |  0.9314  |         1.0486         |
|          cait_m36_384           |  4  | 0.9998 |  0.9141   |      nan       |   0.9442    |  0.929   |         0.9775         |
|     swsl_resnext101_32x16d      | 32  | 0.9989 |   0.879   |     0.3676     |   0.8487    |  0.9113  |         0.9354         |
|          mixer_b16_224          | 128 | 0.992  |  0.9574   |     0.3472     |   0.7555    |  0.9089  |         0.9818         |
|             dpn107              | 32  | 0.997  |  0.9097   |     0.3531     |   0.8814    |  0.9069  |         0.9596         |
|            hrnet_w18            | 128 | 0.9914 |  0.9176   |     0.3348     |   0.8581    |  0.8969  |         0.938          |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9151   |     0.3336     |   0.8524    |  0.8964  |         0.9224         |
|      mobilenetv3_large_100      | 128 | 0.9772 |   0.84    |     0.3302     |   0.8641    |  0.8948  |         0.916          |
|           selecsls42b           | 128 | 0.9789 |   0.876   |     0.3528     |   0.8772    |  0.8927  |         0.9188         |
|        gluon_xception65         | 32  | 0.9955 |  0.8859   |     0.3349     |   0.8854    |  0.8924  |         0.8971         |
|      vit_base_patch16_224       | 64  | 0.9955 |  0.9342   |     0.3594     |   0.8801    |  0.8916  |         0.8968         |
| deit_base_distilled_patch16_224 | 64  | 0.9944 |  0.9332   |     0.359      |   0.8794    |  0.8911  |         0.8966         |
|        ese_vovnet19b_dw         | 128 | 0.9858 |  0.8566   |     0.3273     |   0.9146    |  0.8905  |         0.9028         |
|        adv_inception_v3         | 128 | 0.9824 |  0.8621   |     0.3343     |   0.8538    |  0.8845  |         0.8998         |
|       gluon_inception_v3        | 128 | 0.9824 |  0.8621   |     0.3343     |   0.8538    |  0.8845  |         0.8998         |
|          inception_v3           | 128 | 0.9824 |  0.8621   |     0.3343     |   0.8538    |  0.8845  |         0.8998         |
|        res2net50_14w_8s         | 128 | 0.9908 |  0.9072   |     0.3232     |   0.8299    |  0.876   |         0.9007         |
|           res2next50            | 128 | 0.9913 |   0.91    |     0.3202     |   0.8285    |  0.8697  |         0.8972         |
|            mixnet_l             | 128 | 0.9902 |  0.8441   |     0.2716     |   0.7737    |  0.8653  |         0.9722         |
|            gernet_l             | 128 | 0.9794 |  0.8503   |     0.3444     |   0.8158    |  0.8621  |         0.8897         |
|          spnasnet_100           | 128 | 0.9788 |  0.8801   |     0.3343     |   0.8371    |  0.8602  |         0.8784         |
|          cspdarknet53           | 64  | 0.9915 |  0.8405   |     0.3241     |   0.7908    |  0.8512  |         0.8583         |
|          botnet26t_256          | 128 | 0.9849 |   0.864   |     0.3308     |   0.7708    |  0.8503  |         0.898          |
|           mnasnet_100           | 128 | 0.9765 |  0.8701   |     0.3348     |   0.8252    |  0.8503  |         0.8698         |
|           fbnetc_100            | 128 |  0.98  |  0.8491   |     0.3306     |   0.7352    |  0.8387  |         0.8542         |
|            lcnet_050            | 128 | 0.9433 |  0.7566   |     0.3359     |   0.7559    |  0.8309  |         0.8769         |
|           regnety_002           | 128 | 0.9504 |  0.7948   |     0.3403     |   0.7515    |  0.8245  |         0.8627         |
|         crossvit_9_240          | 128 | 0.9854 |  0.8707   |     0.3347     |   0.8842    |  0.8174  |         1.0986         |
|          convnext_base          | 64  | 1.003  |  0.9263   |      nan       |   0.7349    |  0.8166  |         0.9866         |
|          resmlp_12_224          | 128 | 0.9827 |  0.9508   |     0.2624     |     nan     |  0.8092  |         0.8236         |
|         coat_lite_mini          | 128 | 1.0338 |  0.9202   |     0.3514     |   0.6593    |  0.8006  |         1.035          |
|            repvgg_a2            | 128 | 0.9767 |  0.7822   |     0.3407     |   0.6789    |  0.7905  |         0.8278         |
|  swin_base_patch4_window7_224   | 64  | 0.9966 |  0.9203   |      nan       |   0.8451    |  0.7566  |         0.9252         |
|        sebotnet33ts_256         | 64  | 0.9928 |  0.7073   |     0.3212     |   0.7354    |  0.7449  |         0.8293         |
|          jx_nest_base           | 32  | 0.9983 |  0.8927   |      nan       |    0.86     |  0.6708  |         0.8619         |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Performance graphs

[Figures not reproduced: bench_logs/timm_models_amp.png, bench_logs/huggingface_amp.png, bench_logs/torchbench_amp.png]

@anijain2305
Contributor Author

Performance Dashboard for float32 precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on A100 GPUs, and each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and the gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
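
The filtering step can be sketched as follows. This is a minimal illustration, not the dashboard's actual script: the function name `drop_failing_models`, the dictionary layout, and the rule that any status beginning with `pass` counts as passing are all assumptions.

```python
def drop_failing_models(metrics, accuracy):
    """Keep only the models whose accuracy status is a pass.

    metrics:  {model_name: value} for one backend (speedup, latency, ...)
    accuracy: {model_name: status} for the same backend, using status
              strings like those in the Accuracy tables.
    Assumption: statuses starting with "pass" (including
    "pass_due_to_skip") count as passing; a model with no recorded
    status is dropped.
    """
    return {
        name: value
        for name, value in metrics.items()
        if accuracy.get(name, "").startswith("pass")
    }


# Illustrative values only, not taken from the dashboard.
speedups = {"model_a": 1.25, "model_b": 1.10, "model_c": 0.0}
statuses = {
    "model_a": "pass",
    "model_b": "fail_accuracy",
    "model_c": "fail_to_run",
}
kept = drop_failing_models(speedups, statuses)
```

Only `model_a` survives here, so any aggregate computed from `kept` ignores the two failing models.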

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 98%, 54/55 | 100%, 43/43 | 100%, 61/61 |
|       aot_eager        | 95%, 52/55 | 100%, 43/43 | 98%, 60/61  |
|     aot_cudagraphs     | 73%, 40/55 | 47%, 20/43  | 39%, 24/61  |
|      aot_nvfuser       | 58%, 32/55 |  2%, 1/43   | 89%, 54/61  |
|        inductor        | 87%, 48/55 | 93%, 40/43  | 95%, 58/61  |
| inductor_no_cudagraphs | 91%, 50/55 | 93%, 40/43  | 95%, 58/61  |
+------------------------+------------+-------------+-------------+
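
For reference, a cell such as `98%, 54/55` could be produced from a list of per-model status strings roughly as below. The helper name `pass_rate` is hypothetical, and treating every status that starts with `pass` (including `pass_due_to_skip`) as a pass is an assumption, not something the dashboard states.

```python
def pass_rate(statuses):
    """Format a pass-rate cell like '98%, 54/55'.

    statuses: list of per-model accuracy status strings for one
              backend and one suite.
    Assumption: any status starting with "pass" counts as a pass.
    """
    total = len(statuses)
    if total == 0:
        raise ValueError("no models to summarize")
    passing = sum(1 for status in statuses if status.startswith("pass"))
    # The percentage is rounded to the nearest whole percent.
    return f"{passing / total:.0%}, {passing}/{total}"


# 54 passing models and 1 failure out of 55.
example = pass_rate(["pass"] * 54 + ["fail_to_run"])
```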

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.01x    |    1.00x    |
|       aot_eager        |   1.01x    |    1.00x    |    1.00x    |
|     aot_cudagraphs     |   1.09x    |    1.02x    |    1.00x    |
|      aot_nvfuser       |   1.13x    |    1.12x    |    1.11x    |
|        inductor        |   1.48x    |    1.28x    |    1.25x    |
| inductor_no_cudagraphs |   1.22x    |    1.21x    |    1.24x    |
+------------------------+------------+-------------+-------------+
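
A geometric mean over per-model speedups can be computed as in the sketch below. The function name `geomean_speedup` is hypothetical, and skipping non-positive entries (the per-model tables show `0.0` where a run did not produce a number) is an assumption about how failures are excluded.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of per-model speedups.

    Assumption: entries <= 0.0 mark failed runs and are excluded,
    since the logarithm is undefined for them.
    Returns NaN when no valid entry remains.
    """
    valid = [s for s in speedups if s > 0.0]
    if not valid:
        return float("nan")
    # exp(mean(log(x))) is the geometric mean and avoids overflow
    # from multiplying many values together.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# Illustrative values only: 2.0x and 8.0x average to 4.0x; 0.0 is skipped.
example = geomean_speedup([2.0, 8.0, 0.0])
```

The geometric mean is the usual choice for ratios because a 2x speedup and a 0.5x slowdown cancel to 1.0x, which an arithmetic mean would not give.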

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    2.08    |    2.22     |    1.88     |
|       aot_eager        |    6.92    |    9.05     |    8.70     |
|     aot_cudagraphs     |    8.23    |    18.64    |    15.25    |
|      aot_nvfuser       |   20.32    |    9.60     |    50.01    |
|        inductor        |   62.17    |    52.98    |    73.89    |
| inductor_no_cudagraphs |   64.61    |    49.17    |    72.74    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.96x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.91x    |    0.88x    |
|     aot_cudagraphs     |   0.39x    |    0.36x    |    0.32x    |
|      aot_nvfuser       |   0.83x    |    1.08x    |    0.84x    |
|        inductor        |   0.82x    |    0.72x    |    0.97x    |
| inductor_no_cudagraphs |   0.94x    |    0.96x    |    1.02x    |
+------------------------+------------+-------------+-------------+
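
The post does not spell out how the ratio is defined. One reading that matches "higher is better" is native PyTorch peak memory divided by the backend's peak memory, sketched here with a hypothetical helper `compression_ratio`; treat the definition as an assumption.

```python
def compression_ratio(baseline_peak_bytes, backend_peak_bytes):
    """Peak-memory compression ratio under the assumed definition
    baseline / backend.

    A value above 1.0 means the backend used less peak memory than
    the baseline; below 1.0 means it used more.
    """
    if backend_peak_bytes <= 0:
        raise ValueError("backend peak memory must be positive")
    return baseline_peak_bytes / backend_peak_bytes


# Illustrative values only: 10 GiB baseline vs. 8 GiB with a backend.
example = compression_ratio(10 * 2**30, 8 * 2**30)
```

Under this reading, the `0.39x` entry for aot_cudagraphs on torchbench would mean roughly 2.5 times the baseline peak memory.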

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|            densenet121            |  4   | 1.0028 |  0.9993   |     2.3219     |    1.443    |  5.4438  |         1.3058         |
|         timm_efficientdet         |  1   | 0.9824 |  0.8845   |      0.0       |     0.0     |  4.2758  |         1.526          |
|       functorch_dp_cifar10        |  64  | 1.0024 |  0.9777   |     2.1532     |   1.1969    |  3.6923  |         1.2407         |
|      timm_vision_transformer      |  8   | 1.0068 |  0.9447   |     1.5339     |   1.3578    |  2.5716  |         1.4121         |
|                drq                |  1   | 1.0315 |  0.8503   |     1.3708     |   1.0638    |  2.4195  |         1.0737         |
|          resnext50_32x4d          |  8   | 1.0007 |   1.079   |     1.2092     |   1.3669    |  2.0959  |         1.2162         |
|        mobilenet_v3_large         |  32  | 1.0078 |  1.1087   |     1.0365     |   1.3781    |  1.9864  |         1.3795         |
|           BERT_pytorch            |  16  | 1.0104 |  0.8854   |      0.0       |     0.0     |  1.9168  |         1.9012         |
|             resnet18              |  16  | 1.006  |  1.1021   |     1.168      |   1.3958    |  1.8428  |         1.2045         |
|          pytorch_struct           | 200  | 0.9977 |  0.7381   |     0.8734     |   0.8906    |  1.827   |         1.1633         |
|           lennard_jones           | 1000 | 0.976  |  0.8293   |     1.0524     |   1.0142    |  1.818   |         0.9452         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9968 |  0.9377   |     1.2471     |   1.1785    |  1.7636  |         1.3013         |
|           squeezenet1_1           |  32  | 0.9979 |  0.9923   |     1.0527     |   1.1557    |  1.7406  |         1.2709         |
|             hf_Albert             |  8   | 1.0015 |  0.9976   |     0.752      |     0.0     |  1.6466  |         1.6414         |
|               dcgan               |  32  | 0.9829 |  1.0102   |     1.2585     |   1.1788    |  1.6306  |         1.0725         |
|            hf_T5_large            |  2   | 1.0248 |  0.9068   |      0.0       |     0.0     |  1.5833  |         1.5731         |
|        speech_transformer         |  32  | 1.0038 |  0.9068   |      0.0       |     0.0     |  1.5684  |         1.544          |
|        shufflenet_v2_x1_0         | 128  | 1.0005 |  1.0532   |     0.8062     |   1.1931    |   1.53   |         1.3689         |
|           timm_resnest            |  32  | 0.9996 |  1.0027   |     0.8044     |   1.1815    |  1.5191  |         1.4517         |
|            timm_nfnet             | 128  | 0.9993 |  0.9999   |      0.0       |   1.2122    |  1.4726  |         1.4222         |
|            mnasnet1_0             |  32  | 0.9993 |  1.0945   |     0.8568     |   1.2932    |  1.4577  |         1.2734         |
|    mobilenet_v2_quantized_qat     |  96  | 1.0016 |   0.978   |      0.0       |     0.0     |  1.4527  |         1.4479         |
|           mobilenet_v2            |  96  | 0.9998 |  1.0003   |     0.7313     |   1.0443    |  1.4287  |         1.4088         |
|              hf_GPT2              |  4   | 1.0046 |  0.9827   |     0.738      |     0.0     |  1.4239  |         1.4306         |
|         soft_actor_critic         | 256  | 0.9921 |  0.7715   |     1.1241     |   0.9985    |  1.4185  |         0.9565         |
|      resnet50_quantized_qat       |  32  | 1.0019 |  0.9619   |      0.0       |     0.0     |  1.401   |         1.3947         |
|           fastNLP_Bert            |  6   | 0.9997 |  0.9761   |     0.7528     |     0.0     |  1.3686  |         1.3445         |
|         timm_efficientnet         |  32  | 0.9551 |  0.8076   |     0.7031     |   1.0629    |  1.3353  |         1.2011         |
|          LearningToPaint          |  96  | 1.0048 |  1.0586   |     0.8687     |   1.2057    |  1.2627  |         1.2074         |
|           pytorch_unet            |  1   | 1.0001 |  0.9982   |     0.8464     |   1.0765    |  1.2042  |         1.1861         |
|             resnet50              |  32  | 0.9994 |  0.9937   |     0.7608     |   1.1612    |  1.204   |         1.1695         |
|            Super_SloMo            |  6   | 1.0003 |  0.9974   |     0.8669     |     0.0     |   1.18   |         1.1645         |
|              hf_Bart              |  4   | 1.0127 |  0.9757   |      0.0       |     0.0     |  1.1721  |         1.1653         |
|               vgg16               |  64  |  1.0   |   0.999   |     0.859      |   0.9973    |  1.1707  |         1.1652         |
|              alexnet              | 128  | 0.9991 |   0.998   |     0.8031     |   1.0004    |  1.163   |         1.1651         |
|              hf_Bert              |  4   | 1.0214 |   0.944   |     0.7306     |     0.0     |  1.1575  |         1.1396         |
|           hf_DistilBert           |  8   | 0.9999 |  0.9569   |     0.6872     |     0.0     |  1.1481  |         1.1546         |
|            timm_regnet            |  32  | 0.9653 |  0.9617   |     0.7795     |    1.096    |  1.1283  |         1.0941         |
|          pytorch_stargan          |  16  | 0.9997 |   0.983   |     0.866      |   0.9896    |  1.1189  |         1.0913         |
|        Background_Matting         |  4   | 1.0006 |  1.0218   |     0.866      |   1.0816    |  1.1153  |         1.1069         |
|            hf_Reformer            |  4   | 0.9961 |    0.0    |     0.9267     |     0.0     |  1.1095  |         1.1343         |
|            hf_BigBird             |  2   | 0.9915 |   0.939   |     0.9612     |     0.0     |  1.0921  |         1.0042         |
|              yolov3               |  16  |  1.0   |  0.9954   |     0.7893     |   1.1839    |  1.0795  |         1.0647         |
| attention_is_all_you_need_pytorch | 256  | 0.9999 |  0.9726   |      0.0       |     0.0     |  1.047   |         1.033          |
|   timm_vision_transformer_large   |  8   | 0.9982 |  0.9912   |      0.0       |   0.9805    |  1.044   |         1.0331         |
|            tts_angular            |  64  | 0.9937 |   0.964   |     0.9933     |   1.0231    |  1.0136  |         1.0218         |
|            timm_vovnet            |  32  | 0.9102 |  0.9045   |     0.7132     |   0.9774    |  1.0069  |         1.0176         |
|               dlrm                | 2048 | 1.0064 |  1.0734   |      0.0       |     0.0     |  1.0006  |          0.0           |
|              demucs               |  4   | 0.9997 |  0.9998   |     0.999      |   0.9999    |   1.0    |         1.0007         |
|      nvidia_deeprecommender       | 256  | 0.9994 |  0.9628   |     0.585      |    0.942    |  0.904   |         0.9643         |
|           hf_GPT2_large           |  4   | 1.0004 |  0.9805   |      0.0       |     0.0     |   0.0    |         1.3706         |
|               hf_T5               |  8   | 1.0002 |  0.9932   |      0.0       |     0.0     |   0.0    |         1.5515         |
|             tacotron2             |  64  | 0.981  |  0.8581   |      0.0       |     0.0     |   0.0    |         0.9362         |
|           hf_Longformer           |  2   | 0.9701 |  0.9013   |     0.8196     |     0.0     |   0.0    |          0.0           |
|               moco                |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |  aot_cudagraphs  |   aot_nvfuser    |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|         timm_efficientdet         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|            hf_BigBird             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|      timm_vision_transformer      |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |          pass          |
|            Super_SloMo            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           fastNLP_Bert            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|             hf_Albert             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              hf_Bert              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           hf_DistilBert           |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|        speech_transformer         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|            timm_regnet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              yolov3               |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           BERT_pytorch            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|               dlrm                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|              hf_Bart              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|               hf_T5               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|            hf_T5_base             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           timm_resnest            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        Background_Matting         |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             tacotron2             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |   fail_to_run    |          pass          |
|           hf_Longformer           |  2  |       pass       |       pass       |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|               moco                |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|          vision_maskrcnn          |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |         0.0000         |
|      resnet50_quantized_qat       |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |  fail_accuracy   |     fail_accuracy      |
|    mobilenet_v2_quantized_qat     |  2  |       pass       |  fail_accuracy   |   fail_to_run    |   fail_to_run    |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
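
To compare backends at a glance, the status strings in an accuracy table like the one above can be tallied per column. The following is a minimal sketch, not part of the dashboard tooling: the helper names `parse_ascii_table` and `status_counts` are hypothetical, and the sample reuses three rows from the table with the column padding trimmed.

```python
from collections import Counter


def parse_ascii_table(text):
    """Parse a '+---+' bordered table into a list of dicts keyed by header cell."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+---+' borders and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first '|' row is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def status_counts(rows, skip=("name", "bs")):
    """Count how often each status string appears in every backend column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if column in skip:
                continue
            counts.setdefault(column, Counter())[value] += 1
    return counts


# Three rows copied from the accuracy table above (padding trimmed).
sample = """
+------+----+-------+-----------+----------------+-------------+----------+------------------------+
| name | bs | eager | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+------+----+-------+-----------+----------------+-------------+----------+------------------------+
| timm_efficientdet | 2 | pass | pass | fail_to_run | fail_to_run | pass | pass |
| hf_BigBird | 2 | pass | pass | pass | fail_to_run | pass | pass |
| tacotron2 | 2 | pass | pass | pass | fail_to_run | fail_to_run | pass |
+------+----+-------+-----------+----------------+-------------+----------+------------------------+
"""

rows = parse_ascii_table(sample)
counts = status_counts(rows)
for backend, tally in counts.items():
    print(backend, dict(tally))
```

Pasting the full table text into `sample` should give per-backend totals for `pass`, `fail_to_run`, `fail_accuracy`, and `pass_due_to_skip`; any cell that is not a status string (such as a stray numeric value) is counted under its own key rather than dropped.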

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
|         timm_efficientdet         |  1   | 19.5344 |  38.4011  |      nan       |     nan     | 484.0577 |        488.767         |
|              yolov3               |  16  | 2.7711  |  8.6894   |    11.9084     |   43.4046   | 419.4861 |        419.8955        |
|            hf_T5_large            |  2   | 13.2998 |   41.15   |      nan       |     nan     | 205.3317 |        202.2279        |
|      timm_vision_transformer      |  8   | 0.7808  |  4.1474   |     5.8215     |   9.3655    |  153.43  |        160.5928        |
|        speech_transformer         |  32  | 1.5424  |  8.2938   |      nan       |     nan     | 152.3735 |        147.9389        |
|           timm_resnest            |  32  | 0.5383  |  2.6812   |     3.7424     |   35.1306   | 150.1654 |        145.0659        |
| attention_is_all_you_need_pytorch | 256  | 1.0734  |  7.1292   |      nan       |     nan     | 137.7387 |        139.7203        |
|   timm_vision_transformer_large   |  8   |  2.223  |  13.8751  |      nan       |   24.351    | 126.2802 |        123.9619        |
|          pytorch_stargan          |  16  | 0.3789  |  2.3643   |     3.1326     |   3.9188    | 107.0355 |        104.0851        |
|          pytorch_struct           | 200  | 0.2366  |  0.7827   |     1.3456     |   4.0715    |  99.505  |        98.1575         |
|           BERT_pytorch            |  16  | 1.4194  |   7.614   |      nan       |     nan     | 92.0393  |        92.0811         |
|           fastNLP_Bert            |  6   | 1.4306  |  6.6169   |    10.0451     |     nan     |  65.652  |         63.418         |
|              hf_GPT2              |  4   | 1.2488  |  6.1179   |     8.8738     |     nan     | 63.5447  |         63.521         |
|              hf_Bart              |  4   | 1.3924  |   8.089   |      nan       |     nan     | 49.9676  |        49.9717         |
|            densenet121            |  4   | 1.9897  |  13.3477  |    20.1678     |   88.3763   | 45.0957  |        43.7205         |
|        mobilenet_v3_large         |  32  | 0.8275  |  4.8204   |     6.7604     |   53.5764   | 44.9158  |        46.9735         |
|             hf_Albert             |  8   | 1.0066  |  5.8746   |     8.5532     |     nan     |  41.987  |         41.132         |
|            hf_BigBird             |  2   | 7.3861  |  13.5387  |     29.953     |     nan     | 41.2734  |        26.6352         |
|      resnet50_quantized_qat       |  32  |  1.061  |  9.0448   |      nan       |     nan     | 39.8902  |        40.3176         |
|              hf_Bert              |  4   |  1.312  |  6.2693   |     8.8293     |     nan     | 39.8395  |        38.7377         |
|            timm_regnet            |  32  |  2.173  |  8.4238   |    20.7651     |   47.6157   | 37.2439  |         35.16          |
|            hf_Reformer            |  4   | 2.3483  |    nan    |     9.1124     |     nan     |  36.065  |        30.7238         |
|         timm_efficientnet         |  32  | 1.6787  |   6.665   |    16.1146     |   52.4346   | 34.2419  |        34.4653         |
|            mnasnet1_0             |  32  | 0.7461  |  4.4921   |     6.4014     |   30.714    | 31.0909  |        30.7546         |
|             resnet50              |  32  | 0.7937  |  4.9477   |     6.925      |   32.2699   | 31.0875  |         29.832         |
|           hf_DistilBert           |  8   | 0.4278  |  3.0834   |     6.0696     |     nan     | 30.4362  |        29.5285         |
|          resnext50_32x4d          |  8   | 0.8239  |  4.9203   |     6.8365     |   28.5464   | 30.2931  |        30.0266         |
|            timm_vovnet            |  32  | 1.4222  |  4.5909   |     10.441     |   23.5649   | 30.0127  |        29.7463         |
|            timm_nfnet             | 128  | 1.8844  |  7.7171   |      nan       |   29.8502   | 29.8712  |        28.8763         |
|    mobilenet_v2_quantized_qat     |  96  | 1.1759  |  8.8754   |      nan       |     nan     | 27.0997  |        27.2946         |
|       functorch_dp_cifar10        |  64  | 0.3232  |  1.9699   |     2.8309     |   5.5366    | 26.1947  |        24.9937         |
|             resnet18              |  16  | 0.3858  |  1.8912   |     2.6752     |   17.5591   | 23.2902  |        20.4971         |
|        shufflenet_v2_x1_0         | 128  | 0.8656  |  5.4261   |     7.6883     |   26.8524   | 18.5748  |        17.9867         |
|            Super_SloMo            |  6   | 0.9695  |  5.0542   |     6.7627     |     nan     | 17.3419  |        16.4668         |
|        Background_Matting         |  4   | 0.6979  |  4.5367   |     6.7144     |   29.2894   | 16.7635  |        16.0163         |
|           mobilenet_v2            |  96  | 0.7343  |  4.4782   |     6.6781     |   37.1045   |  16.669  |        16.3002         |
|           pytorch_unet            |  1   | 0.4223  |  2.1063   |     2.9975     |   19.6418   |  8.2272  |         7.7305         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.3535  |   2.202   |     3.0539     |   3.8439    |  8.1719  |         8.0926         |
|          LearningToPaint          |  96  | 0.4124  |  1.9651   |     2.8324     |   23.8303   |  7.2019  |         6.8944         |
|           squeezenet1_1           |  32  | 0.2563  |  0.9557   |     1.3863     |   4.5328    |  4.0598  |         3.8616         |
|      nvidia_deeprecommender       | 256  | 0.1895  |  0.4298   |     0.6854     |   2.4393    |  4.0142  |         3.7143         |
|                drq                |  1   | 0.1402  |  0.4424   |     0.8198     |   3.4662    |  3.7694  |         3.1945         |
|               vgg16               |  64  | 0.1869  |  0.6441   |     1.0464     |   2.4609    |  3.6811  |         3.2422         |
|               dlrm                | 2048 | 0.4444  |  0.8198   |      nan       |     nan     |  3.4517  |          nan           |
|         soft_actor_critic         | 256  | 0.2031  |  0.3372   |     0.4948     |   1.5206    |  3.0611  |         2.6231         |
|              alexnet              | 128  | 0.1421  |  0.4161   |     0.6606     |   2.3558    |  2.9654  |         2.6911         |
|               dcgan               |  32  | 0.1641  |  0.4494   |     0.6683     |   3.7309    |  2.678   |         2.4053         |
|           lennard_jones           | 1000 | 0.1381  |   0.289   |     0.4429     |   1.0648    |  1.9631  |         1.736          |
|            tts_angular            |  64  | 0.2061  |  0.2786   |     0.3976     |   1.0162    |  1.8605  |         1.6749         |
|              demucs               |  4   | 0.2929  |  0.2934   |     0.2977     |   0.2969    |  0.2011  |         0.1967         |
|           hf_GPT2_large           |  4   | 4.9818  |  19.3363  |      nan       |     nan     |   nan    |        143.1625        |
|             tacotron2             |  64  | 16.7009 |  28.6252  |      nan       |     nan     |   nan    |        106.378         |
|               hf_T5               |  8   | 2.1787  |  9.4406   |      nan       |     nan     |   nan    |         44.804         |
|           hf_Longformer           |  2   | 5.7342  |  13.862   |    78.3703     |     nan     |   nan    |          nan           |
|               moco                |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|      resnet50_quantized_qat       |  32  | 0.9967 |  0.9152   |      nan       |     nan     |  1.4314  |         1.4314         |
|    mobilenet_v2_quantized_qat     |  96  | 0.9957 |  0.8276   |      nan       |     nan     |  1.4036  |         1.4036         |
|         timm_efficientnet         |  32  | 0.9937 |  0.7666   |     0.2637     |   0.7837    |  1.3107  |         1.3377         |
|            Super_SloMo            |  6   | 1.0024 |  0.9527   |     0.363      |     nan     |  1.1858  |         1.1912         |
|         timm_efficientdet         |  1   | 1.0111 |   0.823   |      nan       |     nan     |  1.1165  |         1.1428         |
|           mobilenet_v2            |  96  | 0.9928 |  0.7624   |     0.3062     |   0.7638    |  1.1005  |         1.1105         |
|           squeezenet1_1           |  32  | 0.9749 |  0.8159   |     0.3374     |   0.9742    |  1.0823  |         1.1267         |
|            timm_nfnet             | 128  | 0.9358 |  0.8936   |      nan       |   0.9478    |  1.0219  |         1.0495         |
|              demucs               |  4   | 0.9886 |  0.9886   |     0.9886     |   0.9886    |  0.9886  |         0.9886         |
|            tts_angular            |  64  | 0.9884 |  0.9884   |     0.9829     |   0.9884    |  0.983   |         0.9884         |
|        shufflenet_v2_x1_0         | 128  | 0.9739 |  0.8944   |      0.35      |   0.8662    |  0.9791  |         1.0072         |
|              hf_GPT2              |  4   | 0.9548 |   0.906   |     0.3701     |     nan     |  0.9703  |         1.1094         |
|            timm_regnet            |  32  | 0.9985 |  0.8614   |     0.3327     |   0.8784    |  0.9284  |         0.9323         |
|        Background_Matting         |  4   | 0.9998 |  0.9492   |     0.3596     |   0.9749    |  0.9212  |         0.9238         |
|              yolov3               |  16  | 0.9957 |   0.844   |     0.334      |   0.8814    |  0.9151  |         0.919          |
|          pytorch_stargan          |  16  | 0.9975 |  1.0179   |     0.4129     |   1.0085    |  0.9023  |         0.9928         |
|           timm_resnest            |  32  | 0.9935 |  0.8793   |     0.3235     |   0.8021    |  0.8982  |         0.9697         |
|        speech_transformer         |  32  | 0.9982 |  0.9159   |      nan       |     nan     |  0.896   |         0.8996         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9986 |  0.9173   |     0.3919     |   0.9169    |  0.8848  |         0.9654         |
|             hf_Albert             |  8   | 0.9333 |  0.9333   |     0.2846     |     nan     |  0.8836  |         1.2215         |
|        mobilenet_v3_large         |  32  | 0.9878 |  0.8563   |     0.3277     |   0.8681    |  0.8829  |         0.8964         |
|            hf_T5_large            |  2   | 0.922  |  0.8673   |      nan       |     nan     |  0.8737  |         0.922          |
|   timm_vision_transformer_large   |  8   | 0.9997 |  0.8415   |      nan       |    0.801    |  0.8616  |         1.0285         |
|           pytorch_unet            |  1   | 0.9985 |  0.8521   |     0.3441     |   0.8496    |  0.859   |         0.8608         |
|             resnet50              |  32  | 0.9942 |  0.8719   |     0.3368     |    0.797    |  0.8564  |         0.8913         |
|            densenet121            |  4   | 0.9904 |  0.8812   |     0.3435     |   0.8551    |  0.8562  |         0.9307         |
|            mnasnet1_0             |  32  | 0.9869 |  0.8985   |     0.3331     |   0.8263    |  0.8531  |         0.8659         |
|              hf_Bart              |  4   | 0.9617 |  0.8598   |      nan       |     nan     |  0.8503  |         1.1284         |
|           fastNLP_Bert            |  6   | 1.0011 |  0.9152   |     0.3385     |     nan     |  0.8354  |         1.0952         |
|          resnext50_32x4d          |  8   | 0.9954 |  0.8671   |     0.3596     |   0.8203    |  0.8303  |         0.8352         |
|           BERT_pytorch            |  16  |  1.0   |  0.8995   |      nan       |     nan     |  0.825   |         1.0689         |
|            hf_BigBird             |  2   | 0.9604 |  0.9604   |     0.4301     |     nan     |  0.8211  |         1.0393         |
|               dcgan               |  32  | 0.9754 |  0.7634   |     0.4581     |   0.7634    |  0.767   |         0.7903         |
|                drq                |  1   | 0.987  |  0.8777   |     0.4252     |   0.8772    |  0.7632  |         0.8778         |
|         soft_actor_critic         | 256  | 0.9997 |  0.9637   |     0.4355     |   0.9555    |   0.75   |         0.9991         |
|      timm_vision_transformer      |  8   | 0.9943 |  0.8835   |     0.3305     |   0.8104    |  0.7478  |         0.8187         |
|              alexnet              | 128  | 0.9542 |   0.745   |     0.4163     |   0.7455    |  0.743   |         0.8332         |
|            timm_vovnet            |  32  | 0.9933 |  0.7603   |     0.3201     |   0.7741    |  0.7286  |         0.7339         |
|          LearningToPaint          |  96  | 0.9442 |  0.6896   |     0.3385     |   0.6503    |  0.7133  |         0.7462         |
|              hf_Bert              |  4   | 0.9683 |  0.9011   |     0.3525     |     nan     |  0.7048  |         0.985          |
|               dlrm                | 2048 | 0.7302 |  0.7305   |      nan       |     nan     |  0.7035  |          nan           |
|             resnet18              |  16  | 0.9831 |  0.7792   |     0.3593     |   0.6971    |  0.6902  |         0.7049         |
|           hf_DistilBert           |  8   | 0.9211 |  0.9047   |     0.3212     |     nan     |  0.6596  |         0.9466         |
|               vgg16               |  64  | 0.9944 |  0.6638   |     0.3214     |   0.6639    |  0.6471  |         0.6497         |
|           lennard_jones           | 1000 | 0.9995 |  0.9995   |     0.3711     |   1.0947    |  0.5646  |         0.9989         |
|      nvidia_deeprecommender       | 256  | 0.5598 |  0.5598   |     0.4624     |   0.5598    |  0.5598  |         0.5598         |
| attention_is_all_you_need_pytorch | 256  | 0.9476 |  0.9243   |      nan       |     nan     |  0.4682  |         0.6183         |
|          pytorch_struct           | 200  |  1.0   |  0.5079   |     0.4824     |   0.5079    |  0.4222  |         0.429          |
|       functorch_dp_cifar10        |  64  | 0.9961 |  0.8224   |     0.4456     |   0.8227    |  0.4056  |         0.4212         |
|            hf_Reformer            |  4   | 0.3011 |    nan    |     0.2397     |     nan     |  0.299   |         0.9882         |
|               hf_T5               |  8   | 0.9527 |  0.9415   |      nan       |     nan     |   nan    |         1.1507         |
|             tacotron2             |  64  | 0.9906 |   1.093   |      nan       |     nan     |   nan    |         1.1496         |
|           hf_GPT2_large           |  4   | 0.936  |  0.8833   |      nan       |     nan     |   nan    |         1.1258         |
|           hf_Longformer           |  2   | 0.9603 |  0.9603   |     0.2945     |     nan     |   nan    |          nan           |
|               moco                |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|            YituTechConvBert             |  1  | 1.0285 |  0.9414   |      0.0       |     0.0     |  3.7345  |         1.5254         |
|                CamemBert                |  1  | 1.0493 |  0.9732   |     1.3251     |     0.0     |  2.3889  |         1.5405         |
|       MT5ForConditionalGeneration       |  8  | 1.0272 |  0.9263   |      0.0       |     0.0     |  2.2531  |         1.9848         |
|               DistillGPT2               |  1  | 1.0322 |  0.9458   |     1.0657     |     0.0     |  2.099   |         1.9009         |
|          MobileBertForMaskedLM          | 32  | 1.023  |  0.9232   |      0.0       |     0.0     |  1.9829  |         1.574          |
|               GoogleFnet                |  1  | 0.9985 |  0.8173   |     0.9815     |   1.1247    |  1.9188  |         1.1214         |
|      GPT2ForSequenceClassification      |  4  | 1.0002 |  0.9779   |      0.0       |     0.0     |  1.6662  |         1.6568         |
|       T5ForConditionalGeneration        |  4  | 1.0029 |  0.9667   |      0.0       |     0.0     |  1.4388  |         1.4275         |
|     M2M100ForConditionalGeneration      |  8  | 1.0412 |  0.8942   |     1.0013     |     0.0     |  1.4178  |         1.4085         |
|     MobileBertForQuestionAnswering      | 64  | 1.024  |  0.9187   |      0.0       |     0.0     |  1.4036  |         1.2789         |
|           ElectraForCausalLM            | 32  | 1.0004 |  0.9312   |      0.0       |     0.0     |  1.3702  |         1.4028         |
|       ElectraForQuestionAnswering       | 64  | 1.0005 |  0.9844   |      0.0       |     0.0     |  1.3541  |         1.3368         |
|       AlbertForQuestionAnswering        |  4  | 1.0002 |  1.0018   |      0.0       |     0.0     |  1.2567  |         1.2522         |
|            AlbertForMaskedLM            |  4  | 0.9993 |  0.9996   |      0.0       |     0.0     |   1.25   |         1.2519         |
|    LayoutLMForSequenceClassification    | 16  | 1.0001 |  0.9892   |     0.7379     |     0.0     |  1.2473  |         1.2318         |
|                 T5Small                 |  1  | 1.0191 |  0.9543   |      0.0       |     0.0     |  1.2442  |         1.2308         |
|     PLBartForConditionalGeneration      | 16  | 1.0124 |  0.9613   |      0.0       |     0.0     |  1.1874  |         1.188          |
|             OPTForCausalLM              | 32  | 1.0037 |   0.932   |      0.0       |     0.0     |  1.1825  |         1.1983         |
|             XGLMForCausalLM             |  8  | 1.0128 |  0.9394   |      0.0       |     0.0     |  1.1706  |         1.1753         |
|           LayoutLMForMaskedLM           | 16  | 1.0002 |   0.971   |      0.0       |     0.0     |  1.1633  |         1.1716         |
|     DistilBertForQuestionAnswering      | 64  | 0.9997 |   0.985   |     0.7131     |     0.0     |  1.1444  |         1.1262         |
|           RobertaForCausalLM            | 64  | 1.0004 |  0.9637   |     0.7465     |     0.0     |  1.1133  |         1.1212         |
|         Speech2Text2ForCausalLM         | 128 | 0.9989 |  0.9259   |     0.6593     |     0.0     |   1.11   |         1.1484         |
|                 BigBird                 |  1  | 0.9894 |   0.937   |     0.991      |     0.0     |  1.1023  |         1.0034         |
|             BartForCausalLM             |  4  | 1.0007 |  0.9668   |      0.0       |     0.0     |  1.0962  |         1.1067         |
|      BartForConditionalGeneration       |  2  | 1.0009 |  0.9887   |      0.0       |     0.0     |  1.0962  |         1.0896         |
|    MegatronBertForQuestionAnswering     | 16  | 1.038  |  1.0104   |     0.7572     |     0.0     |  1.0947  |         1.0716         |
|      MBartForConditionalGeneration      | 16  | 1.0102 |  0.9766   |      0.0       |     0.0     |  1.0887  |         1.0775         |
|           DebertaForMaskedLM            |  4  | 0.9321 |  0.8111   |     0.7317     |     0.0     |  1.0885  |         1.0732         |
|         MegatronBertForCausalLM         | 16  | 1.0332 |  1.0027   |     0.7578     |     0.0     |  1.087   |         1.0785         |
|     PegasusForConditionalGeneration     | 16  | 1.0101 |  0.9819   |     0.7569     |     0.0     |  1.0857  |         1.0825         |
|        BertForQuestionAnswering         | 128 | 0.9997 |  0.9882   |      0.0       |     0.0     |  1.0722  |         1.0661         |
|       RobertaForQuestionAnswering       | 128 | 1.0002 |  0.9942   |      0.0       |     0.0     |  1.0696  |         1.0709         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0005 |  0.9265   |      0.0       |     0.0     |  1.0628  |         1.0696         |
|       DebertaForQuestionAnswering       |  8  | 0.9976 |  0.9917   |     0.6821     |     0.0     |  1.0623  |         1.2025         |
|          DistilBertForMaskedLM          | 64  |  1.0   |  0.9519   |     0.7122     |     0.0     |  1.0362  |         1.0546         |
|             BertForMaskedLM             | 64  | 1.0003 |  0.9524   |     0.7302     |     0.0     |  1.0338  |         1.0381         |
|            PLBartForCausalLM            | 32  | 1.0055 |  0.9348   |     0.7321     |     0.0     |  1.0224  |         1.0494         |
|       BlenderbotSmallForCausalLM        | 64  | 1.0022 |  0.9105   |     0.6827     |     0.0     |  1.0131  |         1.0345         |
|            TrOCRForCausalLM             | 32  | 1.0017 |  0.9556   |      0.0       |     0.0     |  0.9981  |         1.0096         |
|            MBartForCausalLM             | 32  | 1.0013 |  0.9555   |      0.0       |     0.0     |  0.9967  |         1.0069         |
|           PegasusForCausalLM            | 32  | 0.9998 |   0.953   |     0.7325     |     0.0     |  0.9888  |         1.0008         |
|          AllenaiLongformerBase          |  1  | 0.953  |  0.7915   |     0.7884     |     0.0     |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Accuracy

+-----------------------------------------+----+-------+-----------+----------------+-------------+-------------+------------------------+
|                  name                   | bs | eager | aot_eager | aot_cudagraphs | aot_nvfuser |  inductor   | inductor_no_cudagraphs |
+-----------------------------------------+----+-------+-----------+----------------+-------------+-------------+------------------------+
|               GoogleFnet                | 1  | pass  |   pass    |      pass      |    pass     |    pass     |          pass          |
|       MT5ForConditionalGeneration       | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|         Speech2Text2ForCausalLM         | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|            AlbertForMaskedLM            | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       AlbertForQuestionAnswering        | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|             BartForCausalLM             | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|      BartForConditionalGeneration       | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|      GPT2ForSequenceClassification      | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            MBartForCausalLM             | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|          MobileBertForMaskedLM          | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|           RobertaForCausalLM            | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|     MobileBertForQuestionAnswering      | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|             OPTForCausalLM              | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       T5ForConditionalGeneration        | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|                 T5Small                 | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            TrOCRForCausalLM             | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|             XGLMForCausalLM             | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            XLNetLMHeadModel             | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            YituTechConvBert             | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|             BertForMaskedLM             | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|       RobertaForQuestionAnswering       | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|     PegasusForConditionalGeneration     | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|               DistillGPT2               | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|        BertForQuestionAnswering         | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|                 BigBird                 | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|       BlenderbotSmallForCausalLM        | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|                CamemBert                | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|           DebertaForMaskedLM            | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|       DebertaForQuestionAnswering       | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|          DistilBertForMaskedLM          | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|           PegasusForCausalLM            | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|     DistilBertForQuestionAnswering      | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|           ElectraForCausalLM            | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|       ElectraForQuestionAnswering       | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|           LayoutLMForMaskedLM           | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|    LayoutLMForSequenceClassification    | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|     M2M100ForConditionalGeneration      | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|         MegatronBertForCausalLM         | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|    MegatronBertForQuestionAnswering     | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|            PLBartForCausalLM            | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|          AllenaiLongformerBase          | 1  | pass  |   pass    |      pass      | fail_to_run | fail_to_run |      fail_to_run       |
|      MBartForConditionalGeneration      | 1  | pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |      fail_to_run       |
|     PLBartForConditionalGeneration      | 1  | pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |      fail_to_run       |
+-----------------------------------------+----+-------+-----------+----------------+-------------+-------------+------------------------+
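
A minimal sketch for tallying the statuses in an accuracy table like the one above, per backend. The function name `status_counts` and the three-row `SAMPLE` excerpt are illustrative and not part of the benchmark scripts; the parsing assumes the `+---+` / `|` layout shown here, with the first two columns being `name` and `bs`.

```python
from collections import Counter

# Three rows copied from the accuracy table above (whitespace condensed).
SAMPLE = """
+------------+----+-------+-----------+----------------+-------------+-------------+------------------------+
| name | bs | eager | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+------------+----+-------+-----------+----------------+-------------+-------------+------------------------+
| GoogleFnet | 1 | pass | pass | pass | pass | pass | pass |
| MT5ForConditionalGeneration | 1 | pass | pass | fail_to_run | fail_to_run | pass | pass |
| AllenaiLongformerBase | 1 | pass | pass | pass | fail_to_run | fail_to_run | fail_to_run |
+------------+----+-------+-----------+----------------+-------------+-------------+------------------------+
"""


def status_counts(table_text):
    """Return {backend: Counter(status -> number of models)} for an ASCII table."""
    # Keep only data/header lines; border lines start with '+'.
    lines = [l for l in table_text.strip().splitlines() if l.lstrip().startswith("|")]
    rows = [[c.strip() for c in l.strip().strip("|").split("|")] for l in lines]
    header, body = rows[0], rows[1:]
    backends = header[2:]  # assumption: first two columns are name and bs
    counts = {b: Counter() for b in backends}
    for row in body:
        for backend, status in zip(backends, row[2:]):
            counts[backend][status] += 1
    return counts


if __name__ == "__main__":
    for backend, counter in status_counts(SAMPLE).items():
        print(backend, dict(counter))
```

Pasting the full table in place of `SAMPLE` gives the per-backend pass / fail_to_run totals without counting rows by hand.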

Compilation latency (sec)

+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|             XGLMForCausalLM             |  8  | 2.2364 |  12.2125  |      nan       |     nan     | 203.4086 |        201.0863        |
|           DebertaForMaskedLM            |  4  | 4.684  |  11.0814  |    44.7781     |     nan     | 163.7151 |        106.9608        |
|       DebertaForQuestionAnswering       |  8  | 4.5483 |  11.6349  |     43.993     |     nan     | 152.0741 |        118.2059        |
|     M2M100ForConditionalGeneration      |  8  | 2.7543 |  15.4794  |     23.643     |     nan     | 128.0751 |        124.2115        |
|            YituTechConvBert             |  1  | 2.0946 |  9.5284   |      nan       |     nan     | 115.4649 |        119.3641        |
|       MT5ForConditionalGeneration       |  8  | 3.4744 |  13.6659  |      nan       |     nan     | 90.4534  |        91.1223         |
|          MobileBertForMaskedLM          | 32  | 7.7855 |  27.1609  |      nan       |     nan     | 88.9601  |        85.7795         |
|     MobileBertForQuestionAnswering      | 64  | 7.9327 |  27.5186  |      nan       |     nan     | 74.7874  |         71.876         |
|         MegatronBertForCausalLM         | 16  | 3.0219 |  12.5327  |    19.6699     |     nan     | 61.5191  |        59.8845         |
|    MegatronBertForQuestionAnswering     | 16  | 3.0691 |  13.2977  |    19.1034     |     nan     | 60.2609  |        58.2808         |
|    LayoutLMForSequenceClassification    | 16  | 1.6734 |  6.6917   |    10.1343     |     nan     | 59.7267  |         60.187         |
|       T5ForConditionalGeneration        |  4  | 2.1399 |  8.8895   |      nan       |     nan     | 58.3394  |        57.0848         |
|     PegasusForConditionalGeneration     | 16  | 2.6227 |  14.7158  |    24.2283     |     nan     | 58.1897  |        54.3056         |
|      BartForConditionalGeneration       |  2  | 2.8248 |  15.0065  |      nan       |     nan     | 57.0652  |        54.7753         |
|                 T5Small                 |  1  | 2.1902 |  8.9903   |      nan       |     nan     | 55.4364  |        53.2137         |
|      MBartForConditionalGeneration      | 16  | 2.7868 |  15.512   |      nan       |     nan     | 54.3119  |        53.1455         |
|     PLBartForConditionalGeneration      | 16  | 1.3887 |   8.298   |      nan       |     nan     | 47.5246  |        46.3964         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.7139 |  10.0168  |      nan       |     nan     | 43.6075  |        41.5748         |
|                 BigBird                 |  1  | 7.296  |  13.5333  |    29.6711     |     nan     | 40.7238  |        26.8699         |
|           ElectraForCausalLM            | 32  | 1.2891 |  6.2441   |      nan       |     nan     | 40.6712  |         39.969         |
|               DistillGPT2               |  1  | 0.6422 |  3.1221   |     4.4918     |     nan     | 33.8479  |        32.6814         |
|           LayoutLMForMaskedLM           | 16  | 1.6131 |  6.6316   |      nan       |     nan     | 32.8126  |        32.5964         |
|             BertForMaskedLM             | 64  | 1.2973 |  6.3901   |     9.4361     |     nan     |  32.777  |        31.6779         |
|       ElectraForQuestionAnswering       | 64  | 1.3222 |  6.4111   |      nan       |     nan     | 32.5117  |        31.4854         |
|      GPT2ForSequenceClassification      |  4  | 1.2751 |  6.1953   |      nan       |     nan     | 32.0765  |        31.1399         |
|           RobertaForCausalLM            | 64  | 1.3104 |  6.1902   |     9.2915     |     nan     | 28.0396  |        27.4422         |
|        BertForQuestionAnswering         | 128 | 1.3166 |  6.2802   |      nan       |     nan     | 27.7294  |        27.1936         |
|           PegasusForCausalLM            | 32  | 1.0161 |   5.707   |     8.775      |     nan     | 27.1087  |        25.1376         |
|            MBartForCausalLM             | 32  | 0.9522 |  5.5767   |      nan       |     nan     | 25.4243  |        24.6154         |
|       RobertaForQuestionAnswering       | 128 | 1.3205 |   6.387   |      nan       |     nan     | 24.5494  |        23.8515         |
|            TrOCRForCausalLM             | 32  | 0.9241 |  5.5701   |      nan       |     nan     | 24.4333  |        24.1797         |
|             BartForCausalLM             |  4  | 1.0079 |  5.6176   |      nan       |     nan     | 24.3593  |        23.6588         |
|            AlbertForMaskedLM            |  4  | 1.1157 |  5.8703   |      nan       |     nan     | 23.8611  |        23.0601         |
|               GoogleFnet                |  1  | 0.7904 |  3.3495   |    10.4595     |   9.6049    | 23.8114  |        16.1369         |
|       BlenderbotSmallForCausalLM        | 64  | 0.6439 |  3.7467   |     5.6889     |     nan     |  23.625  |        22.6972         |
|          DistilBertForMaskedLM          | 64  | 0.4729 |  2.9552   |     5.8879     |     nan     | 23.0127  |         22.634         |
|       AlbertForQuestionAnswering        |  4  | 1.1461 |  5.9483   |      nan       |     nan     | 22.7287  |        21.5179         |
|             OPTForCausalLM              | 32  | 1.0353 |   5.881   |      nan       |     nan     | 21.8562  |        20.7457         |
|     DistilBertForQuestionAnswering      | 64  | 0.4816 |  3.0171   |     5.9235     |     nan     | 21.8186  |        22.1039         |
|                CamemBert                |  1  |  1.38  |  6.1479   |     8.5874     |     nan     | 21.7413  |        21.2151         |
|         Speech2Text2ForCausalLM         | 128 | 0.577  |  2.9045   |     4.6098     |     nan     | 19.6271  |         18.24          |
|            PLBartForCausalLM            | 32  | 0.4938 |  2.9552   |     4.3734     |     nan     | 18.8954  |        18.2071         |
|          AllenaiLongformerBase          |  1  | 5.9078 |  14.4262  |    80.0409     |     nan     |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|      GPT2ForSequenceClassification      |  4  | 0.9343 |  0.9093   |      nan       |     nan     |  1.0596  |         1.1223         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.9425   |      nan       |     nan     |  0.8646  |         1.4039         |
|                 T5Small                 |  1  |  1.0   |  0.9155   |      nan       |     nan     |  0.8564  |         1.0758         |
|     PegasusForConditionalGeneration     | 16  | 0.9985 |  0.9629   |     0.3704     |     nan     |  0.8436  |         1.0204         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.9255   |      nan       |     nan     |  0.842   |         1.3737         |
|                 BigBird                 |  1  | 0.999  |  0.9542   |     0.4215     |     nan     |  0.8224  |         1.0108         |
|       T5ForConditionalGeneration        |  4  |  1.0   |  0.9597   |      nan       |     nan     |  0.8215  |         1.1049         |
|               DistillGPT2               |  1  | 0.9984 |  0.8218   |     0.3795     |     nan     |  0.8173  |         0.9383         |
|             XGLMForCausalLM             |  8  | 0.9848 |  0.9137   |      nan       |     nan     |  0.8157  |         0.9642         |
|            YituTechConvBert             |  1  | 0.9858 |  0.8198   |      nan       |     nan     |  0.808   |         0.8738         |
|      BartForConditionalGeneration       |  2  |  1.0   |   0.893   |      nan       |     nan     |  0.7817  |         0.9515         |
|           PegasusForCausalLM            | 32  | 0.9593 |  0.9232   |     0.3909     |     nan     |  0.7774  |         0.9692         |
|     M2M100ForConditionalGeneration      |  8  | 1.007  |  0.9507   |     0.3799     |     nan     |  0.7712  |         1.016          |
|               GoogleFnet                |  1  | 0.9983 |  0.9453   |     0.3715     |   1.0813    |  0.7698  |         0.9373         |
|       MT5ForConditionalGeneration       |  8  | 1.0034 |  0.8861   |      nan       |     nan     |  0.7623  |         0.9396         |
|    MegatronBertForQuestionAnswering     | 16  |  1.0   |  0.8671   |     0.3483     |     nan     |  0.7528  |         0.9646         |
|                CamemBert                |  1  | 0.998  |  0.8252   |     0.3614     |     nan     |  0.7492  |         0.9186         |
|     PLBartForConditionalGeneration      | 16  |  1.0   |  0.8743   |      nan       |     nan     |  0.7397  |         0.9638         |
|            PLBartForCausalLM            | 32  | 0.9999 |   0.861   |     0.3948     |     nan     |  0.7381  |         0.9055         |
|      MBartForConditionalGeneration      | 16  |  1.0   |  0.8583   |      nan       |     nan     |  0.7209  |         0.9059         |
|    LayoutLMForSequenceClassification    | 16  |  1.0   |  0.9348   |     0.3324     |     nan     |  0.7189  |         1.0246         |
|         MegatronBertForCausalLM         | 16  | 0.9995 |  0.8826   |     0.352      |     nan     |  0.7161  |         0.9248         |
|             BartForCausalLM             |  4  |  1.0   |  0.9121   |      nan       |     nan     |  0.7149  |         0.9466         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8401   |     0.3879     |     nan     |  0.7147  |         0.8647         |
|       ElectraForQuestionAnswering       | 64  |  1.0   |  0.9524   |      nan       |     nan     |  0.7054  |         1.0298         |
|     DistilBertForQuestionAnswering      | 64  |  1.0   |  0.9373   |     0.3178     |     nan     |  0.6981  |         0.9303         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8975   |      nan       |     nan     |  0.6977  |         0.946          |
|           LayoutLMForMaskedLM           | 16  |  1.0   |  0.9409   |      nan       |     nan     |  0.695   |         0.9772         |
|            MBartForCausalLM             | 32  | 0.9999 |   0.89    |      nan       |     nan     |  0.6836  |         0.8978         |
|            TrOCRForCausalLM             | 32  | 0.9999 |  0.8898   |      nan       |     nan     |  0.6827  |         0.8876         |
|         Speech2Text2ForCausalLM         | 128 | 0.9552 |  0.8765   |     0.3524     |     nan     |  0.6775  |         0.8801         |
|             OPTForCausalLM              | 32  | 0.9982 |  0.8655   |      nan       |     nan     |  0.6761  |         0.8847         |
|           ElectraForCausalLM            | 32  | 0.9994 |   0.883   |      nan       |     nan     |  0.6731  |         0.905          |
|          DistilBertForMaskedLM          | 64  |  1.0   |  0.8899   |     0.3665     |     nan     |  0.6531  |         0.9124         |
|             BertForMaskedLM             | 64  |  1.0   |  0.9219   |     0.3646     |     nan     |  0.6385  |         0.8993         |
|           RobertaForCausalLM            | 64  | 0.9986 |  0.9206   |     0.3641     |     nan     |  0.6375  |         0.8975         |
|       RobertaForQuestionAnswering       | 128 |  1.0   |   0.968   |      nan       |     nan     |  0.6329  |         0.8939         |
|        BertForQuestionAnswering         | 128 |  1.0   |   0.968   |      nan       |     nan     |  0.6329  |         0.8939         |
|          MobileBertForMaskedLM          | 32  | 0.9998 |  0.9103   |      nan       |     nan     |  0.5256  |         0.7111         |
|     MobileBertForQuestionAnswering      | 64  |  1.0   |   0.984   |      nan       |     nan     |  0.4536  |         0.5968         |
|           DebertaForMaskedLM            |  4  |  1.0   |  0.9851   |     0.3553     |     nan     |  0.4267  |         1.0347         |
|       DebertaForQuestionAnswering       |  8  | 0.9816 |   1.063   |     0.3072     |     nan     |  0.3264  |         1.1588         |
|          AllenaiLongformerBase          |  1  | 0.9981 |  0.9515   |     0.3209     |     nan     |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|          ghostnet_100           | 128 | 0.9992 |  0.9956   |     0.8421     |   1.2485    |  1.8144  |         1.7733         |
|            lcnet_050            | 128 | 0.9568 |  0.9489   |     0.7675     |   1.4962    |  1.6425  |         1.6316         |
|         coat_lite_mini          | 128 |  1.0   |    1.0    |     0.8447     |   1.0566    |  1.6056  |         1.5895         |
|           regnety_002           | 128 | 0.9778 |  0.9844   |     0.8615     |   1.3561    |  1.4813  |         1.3447         |
|           dm_nfnet_f0           | 128 |  1.0   |  1.0003   |      0.0       |   1.2124    |  1.4725  |         1.422          |
|      xcit_large_24_p8_224       |  5  | 1.003  |  1.0032   |      0.0       |     0.0     |  1.4529  |         1.4094         |
|            hrnet_w18            | 128 | 0.9999 |  0.9985   |      0.0       |   1.3201    |  1.418   |         1.3775         |
|           volo_d1_224           | 64  | 0.9999 |  0.9959   |      0.0       |   1.1295    |  1.3859  |         1.3634         |
|             dla102              | 128 | 1.0002 |  1.0008   |      0.0       |   1.2853    |  1.3821  |         1.3693         |
|            nfnet_l0             | 128 | 0.9997 |  0.7891   |      0.0       |   1.0518    |  1.3733  |         1.3288         |
|        res2net50_14w_8s         | 128 | 0.9999 |    1.0    |      0.0       |   1.2307    |  1.3564  |         1.3208         |
|         mobilenetv2_100         | 128 | 0.9662 |  0.9648   |     0.7065     |   1.0145    |  1.3373  |         1.3526         |
|      mobilenetv3_large_100      | 128 | 0.9664 |  0.9632   |     0.7654     |   1.1624    |  1.3356  |         1.3413         |
|         crossvit_9_240          | 128 | 0.9999 |  0.9988   |      0.0       |   1.0243    |  1.3305  |         1.3051         |
|        adv_inception_v3         | 128 |  1.0   |   0.999   |      0.0       |   1.1253    |  1.328   |         1.3083         |
|       gluon_inception_v3        | 128 |  1.0   |  0.9988   |      0.0       |   1.1224    |  1.3249  |         1.3075         |
|          inception_v3           | 128 |  1.0   |   0.999   |      0.0       |   1.1257    |  1.3244  |         1.3076         |
|           res2next50            | 128 |  1.0   |  1.0009   |      0.0       |    1.166    |  1.3121  |         1.2748         |
|           resnest101e           | 64  | 1.0001 |  1.0035   |      0.0       |   1.1963    |  1.3115  |         1.2714         |
|          gmixer_24_224          | 128 | 0.9999 |  0.8348   |      0.0       |    0.98     |  1.2974  |         1.2696         |
|            fbnetv3_b            | 128 | 0.9642 |  0.9614   |     0.7623     |   1.1326    |  1.283   |         1.2951         |
|          botnet26t_256          | 128 | 0.9851 |  0.9857   |     0.7892     |   1.2271    |  1.2742  |         1.2801         |
|          jx_nest_base           | 32  | 0.9998 |  0.9926   |      0.0       |    1.217    |  1.2725  |         1.2481         |
|        sebotnet33ts_256         | 64  | 0.9753 |  0.8072   |      0.0       |   1.0528    |  1.2706  |         1.2762         |
|       eca_botnext26ts_256       | 128 | 0.9867 |  0.7721   |      0.0       |   1.0301    |  1.2706  |         1.2477         |
|           selecsls42b           | 128 | 0.9998 |  0.9991   |     0.8157     |   1.2083    |  1.2671  |         1.2514         |
|       tf_efficientnet_b0        | 128 | 0.9776 |  0.7843   |      0.0       |   0.9848    |  1.2613  |         1.2686         |
|           mnasnet_100           | 128 | 0.9663 |  0.9639   |     0.7855     |   1.1575    |  1.2598  |         1.2787         |
|        eca_halonext26ts         | 128 | 0.9877 |  0.7787   |      0.0       |   1.0289    |  1.2502  |         1.2494         |
|           fbnetc_100            | 128 | 0.967  |  0.9622   |     0.7908     |   1.1879    |  1.2497  |         1.2635         |
|        ese_vovnet19b_dw         | 128 | 0.9795 |  0.9777   |     0.7445     |   1.1452    |  1.2404  |         1.2461         |
|          spnasnet_100           | 128 | 0.9605 |  0.9573   |     0.7734     |   1.1366    |  1.2375  |         1.2543         |
|          cspdarknet53           | 64  | 0.9581 |  0.9526   |     0.7322     |   1.1835    |  1.2287  |         1.2391         |
|        res2net101_26w_4s        | 64  | 0.9997 |  0.9972   |     0.7705     |   1.1739    |  1.2283  |         1.1885         |
|           convit_base           | 64  | 0.9998 |  0.9992   |      0.0       |    1.195    |  1.2216  |         1.2164         |
|            pit_b_224            | 64  | 1.0001 |  0.9996   |      0.0       |    1.055    |  1.221   |         1.211          |
|          gmlp_s16_224           | 128 |  1.0   |  0.9994   |      0.0       |   0.9989    |  1.2164  |         1.2053         |
|           rexnet_100            | 128 | 0.9723 |  0.8169   |      0.0       |   0.9835    |  1.2142  |         1.2193         |
|          pnasnet5large          | 16  | 0.9998 |  0.9985   |      0.0       |   1.0838    |  1.2112  |         1.1932         |
|            tinynet_a            | 128 | 0.9659 |  0.7757   |     0.6205     |   0.9713    |  1.1925  |         1.1949         |
|          cait_m36_384           |  4  | 0.9998 |    0.0    |      0.0       |     0.0     |  1.1826  |         1.158          |
|           tf_mixnet_l           | 128 | 0.9853 |  0.8897   |      0.0       |   1.0177    |  1.173   |         1.1697         |
|             dpn107              | 32  | 0.958  |  0.9367   |     0.7817     |   1.0288    |  1.1726  |         1.202          |
|           mobilevit_s           | 64  | 0.9792 |   0.762   |      0.0       |   0.9468    |  1.1702  |         1.1666         |
|            repvgg_a2            | 128 | 0.9641 |  0.9623   |     0.8288     |   1.1224    |  1.1692  |         1.1652         |
|         poolformer_m36          | 64  | 0.9998 |  0.9993   |      0.0       |     0.0     |  1.1661  |         1.1475         |
|            mixnet_l             | 128 | 0.9849 |  0.8858   |      0.0       |   1.0185    |  1.1534  |         1.1505         |
|        twins_pcpvt_base         | 64  | 1.0001 |  0.9974   |      0.75      |   1.0624    |  1.148   |         1.1172         |
|  swin_base_patch4_window7_224   | 64  | 0.9999 |  0.9785   |      0.0       |   0.9932    |  1.1469  |         1.1322         |
|          convnext_base          | 64  | 0.9999 |  0.9988   |      0.0       |   1.0441    |  1.1157  |         1.1262         |
|      beit_base_patch16_224      | 64  | 0.9998 |  0.9801   |      0.0       |   0.9504    |  1.1141  |         1.1053         |
|     swsl_resnext101_32x16d      | 32  | 1.0001 |  0.9988   |      0.0       |   1.1071    |  1.1068  |         1.0712         |
| deit_base_distilled_patch16_224 | 64  |  1.0   |  0.9995   |     0.7673     |   1.0156    |  1.0955  |         1.0834         |
|        gluon_xception65         | 32  | 0.9998 |  0.9975   |      0.0       |   1.0403    |  1.0871  |         1.0759         |
|      vit_base_patch16_224       | 64  | 1.0002 |   0.999   |     0.7662     |   0.9763    |  1.0855  |         1.0734         |
|          mixer_b16_224          | 128 | 1.0006 |  1.0001   |      0.0       |   0.9771    |  1.0808  |         1.0736         |
|        convmixer_768_32         | 32  | 0.9999 |  1.0002   |      0.0       |   1.0615    |  1.0783  |         1.0744         |
|            gernet_l             | 128 | 0.9744 |  0.9723   |     0.8239     |   1.0992    |  1.075   |         1.0704         |
|         visformer_small         | 128 | 1.0001 |  1.0022   |     0.797      |   1.0217    |  1.0495  |         1.0162         |
|          resmlp_12_224          | 128 | 0.9999 |   1.001   |     0.6956     |     0.0     |  0.9499  |         0.9719         |
|        tnt_s_patch16_224        | 128 |  1.0   |  0.9992   |      0.0       |   1.6263    |   0.0    |         1.5436         |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
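
A minimal sketch for summarizing a speedup table like the one above with a geometric mean per backend. It assumes that a `0.0` entry is a placeholder for a run that produced no measurement, so those entries are skipped rather than averaged in; this is an interpretation of the table, not something the dashboard states. The helper names and the three-row `SAMPLE` excerpt are illustrative only.

```python
import math

# Three rows copied from the timm_models speedup table above (whitespace condensed).
SAMPLE = """
+-------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
| name | bs | eager | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
| ghostnet_100 | 128 | 0.9992 | 0.9956 | 0.8421 | 1.2485 | 1.8144 | 1.7733 |
| lcnet_050 | 128 | 0.9568 | 0.9489 | 0.7675 | 1.4962 | 1.6425 | 1.6316 |
| tnt_s_patch16_224 | 128 | 1.0 | 0.9992 | 0.0 | 1.6263 | 0.0 | 1.5436 |
+-------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
"""


def parse_ascii_table(text):
    """Parse a '+---+'-bordered table into a list of dicts keyed by column name."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")  # skip '+---+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean_speedup(records, backend):
    """Geometric mean of the positive, non-nan speedups reported for `backend`."""
    values = [float(r[backend]) for r in records]
    # Assumption: 0.0 and nan mean "no measurement", so they are excluded.
    values = [v for v in values if v > 0.0 and not math.isnan(v)]
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    records = parse_ascii_table(SAMPLE)
    for backend in ("aot_nvfuser", "inductor", "inductor_no_cudagraphs"):
        print(backend, round(geomean_speedup(records, backend), 4))
```

A geometric mean is used because the entries are ratios; an arithmetic mean would overweight the models with the largest speedups.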

Accuracy

+---------------------------------+----+-------+-------------+----------------+---------------+---------------+------------------------+
|              name               | bs | eager |  aot_eager  | aot_cudagraphs |  aot_nvfuser  |   inductor    | inductor_no_cudagraphs |
+---------------------------------+----+-------+-------------+----------------+---------------+---------------+------------------------+
|        adv_inception_v3         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|          botnet26t_256          | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|        sebotnet33ts_256         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|           selecsls42b           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|          spnasnet_100           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|     swsl_resnext101_32x16d      | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|       tf_efficientnet_b0        | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|           tf_mixnet_l           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            tinynet_a            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|        twins_pcpvt_base         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|         visformer_small         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|      vit_base_patch16_224       | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|      beit_base_patch16_224      | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|          convnext_base          | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|         crossvit_9_240          | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|           dm_nfnet_f0           | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|          gmixer_24_224          | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|          gmlp_s16_224           | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|          jx_nest_base           | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|  swin_base_patch4_window7_224   | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|        tnt_s_patch16_224        | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|           volo_d1_224           | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|          resmlp_12_224          | 2  | pass  |    pass     |      pass      |  fail_to_run  |     pass      |          pass          |
|           convit_base           | 2  | pass  |    pass     |  fail_to_run   |  fail_to_run  |     pass      |          pass          |
|      xcit_large_24_p8_224       | 2  | pass  |    pass     |  fail_to_run   |  fail_to_run  |     pass      |          pass          |
|          cait_m36_384           | 2  | pass  | fail_to_run |  fail_to_run   |  fail_to_run  |     pass      |          pass          |
|        gluon_xception65         | 2  | pass  |    pass     |      pass      | fail_accuracy |     pass      |          pass          |
|         poolformer_m36          | 2  | pass  |    pass     |      pass      | fail_accuracy |     pass      |          pass          |
| deit_base_distilled_patch16_224 | 2  | pass  |    pass     |      pass      |     pass      |     pass      |     fail_accuracy      |
|           rexnet_100            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|           res2next50            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|        res2net50_14w_8s         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|        res2net101_26w_4s        | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|         coat_lite_mini          | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|        convmixer_768_32         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|          cspdarknet53           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|             dla102              | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|             dpn107              | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|       eca_botnext26ts_256       | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|        eca_halonext26ts         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|        ese_vovnet19b_dw         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|           fbnetc_100            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            gernet_l             | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|          ghostnet_100           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|       gluon_inception_v3        | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            hrnet_w18            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|          inception_v3           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            lcnet_050            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|          mixer_b16_224          | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            mixnet_l             | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|           mnasnet_100           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|         mobilenetv2_100         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|      mobilenetv3_large_100      | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|           mobilevit_s           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            nfnet_l0             | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            pit_b_224            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|          pnasnet5large          | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|           regnety_002           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            repvgg_a2            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            fbnetv3_b            | 2  | pass  |    pass     |      pass      |     pass      | fail_accuracy |     fail_accuracy      |
|           resnest101e           | 2  | pass  |    pass     |      pass      | fail_accuracy | fail_accuracy |     fail_accuracy      |
+---------------------------------+----+-------+-------------+----------------+---------------+---------------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|        twins_pcpvt_base         | 64  | 2.064  |  13.0072  |    21.5012     |   42.855    | 431.1592 |        426.4103        |
|         coat_lite_mini          | 128 | 1.0194 |  5.4653   |     7.961      |   14.7686   | 362.4216 |        372.6703        |
|           mobilevit_s           | 64  | 1.5683 |  7.1641   |      nan       |   42.4621   | 233.8428 |        237.9062        |
|        eca_halonext26ts         | 128 | 1.4144 |  5.4751   |      nan       |   55.2357   | 204.8437 |        207.0974        |
|        sebotnet33ts_256         | 64  | 1.7651 |  6.6709   |      nan       |   51.039    | 185.8238 |        191.2608        |
|       eca_botnext26ts_256       | 128 | 1.3797 |  5.2911   |      nan       |   52.9221   | 179.8768 |        176.7545        |
|  swin_base_patch4_window7_224   | 64  | 2.5123 |  12.7354  |      nan       |   58.0591   | 177.0112 |        174.7488        |
|      xcit_large_24_p8_224       |  5  | 2.603  |  17.1709  |      nan       |     nan     | 172.3324 |        164.8544        |
|          jx_nest_base           | 32  | 1.6708 |  9.2321   |      nan       |   57.8786   | 155.4547 |        156.5451        |
|          convnext_base          | 64  | 1.2341 |  5.9929   |      nan       |   20.8438   | 133.0295 |        129.8216        |
|          cait_m36_384           |  4  | 2.6486 |    nan    |      nan       |     nan     | 132.7509 |         130.12         |
|            hrnet_w18            | 128 | 5.6217 |  31.9848  |      nan       |  251.7181   | 106.8258 |        100.7524        |
|          botnet26t_256          | 128 | 1.3057 |  4.4635   |    10.0598     |   40.2751   | 106.2411 |        103.5341        |
|         crossvit_9_240          | 128 | 1.3396 |  7.9862   |      nan       |   27.0701   | 97.9064  |        96.8689         |
|           resnest101e           | 64  | 2.998  |  16.9945  |      nan       |   78.2291   | 93.9541  |        89.7619         |
|          pnasnet5large          | 16  | 4.1626 |  22.9703  |      nan       |  123.7628   | 87.4338  |        84.1545         |
|           volo_d1_224           | 64  | 1.1595 |  7.6273   |      nan       |   28.0879   | 85.2424  |        83.6849         |
|          gmlp_s16_224           | 128 | 0.9511 |  6.2939   |      nan       |   13.365    | 71.7498  |        69.4367         |
|         visformer_small         | 128 | 0.9009 |   4.189   |     6.2793     |   24.3038   | 71.1462  |        69.6831         |
|            pit_b_224            | 64  | 0.9339 |  4.8631   |      nan       |   12.5251   | 66.2774  |        65.1378         |
|        res2net101_26w_4s        | 64  | 2.9852 |  17.3432  |    28.4155     |   80.897    | 55.6027  |        52.0513         |
|          gmixer_24_224          | 128 | 1.0133 |  7.3092   |      nan       |   16.5474   | 51.9895  |        50.5586         |
|           convit_base           | 64  | 0.9843 |  5.9421   |      nan       |   18.0525   | 50.9922  |         49.952         |
|        res2net50_14w_8s         | 128 | 2.5693 |  15.6494  |      nan       |   98.8662   | 50.8157  |        49.7271         |
|        gluon_xception65         | 32  | 1.6885 |  11.1965  |      nan       |   41.7582   | 49.2318  |        45.5937         |
|         poolformer_m36          | 64  | 1.8121 |  9.7062   |      nan       |     nan     | 47.0371  |        44.6651         |
|          resmlp_12_224          | 128 | 0.6088 |   2.794   |     5.5064     |     nan     | 42.3381  |        38.0426         |
|     swsl_resnext101_32x16d      | 32  | 1.6289 |  10.0288  |      nan       |   39.6141   | 41.9677  |        41.3616         |
|             dpn107              | 32  | 3.7727 |  14.7274  |    45.6394     |   76.1359   | 40.3245  |        37.6555         |
|          mixer_b16_224          | 128 | 0.6548 |  3.2155   |      nan       |   10.7856   | 37.0102  |        35.4768         |
| deit_base_distilled_patch16_224 | 64  | 0.8289 |   4.303   |     6.6094     |   10.4203   | 36.0592  |        34.6956         |
|        convmixer_768_32         | 32  | 1.0862 |  6.4498   |      nan       |   13.7196   | 35.8067  |        33.0945         |
|            fbnetv3_b            | 128 | 3.0734 |  11.1026  |    29.9803     |   76.0043   | 35.7771  |        33.8855         |
|      vit_base_patch16_224       | 64  | 0.8583 |  4.1826   |     6.5315     |   9.6845    | 35.7583  |        35.0589         |
|       gluon_inception_v3        | 128 | 1.4815 |  8.9849   |      nan       |   66.9443   | 35.0345  |        32.4497         |
|          inception_v3           | 128 | 1.4787 |  9.0238   |      nan       |   67.1459   | 34.8548  |        32.5473         |
|        adv_inception_v3         | 128 | 1.4876 |  8.9769   |      nan       |   66.9311   | 34.3905  |        32.5332         |
|           tf_mixnet_l           | 128 | 5.7484 |  13.3541  |      nan       |   68.7911   | 33.8729  |        32.1963         |
|          ghostnet_100           | 128 | 2.6432 |  9.6507   |    13.7666     |   58.927    |  32.695  |        30.8681         |
|      beit_base_patch16_224      | 64  | 1.0871 |  5.6134   |      nan       |   13.7621   | 32.6318  |        30.8008         |
|            mixnet_l             | 128 | 5.3204 |  12.7271  |      nan       |   67.9763   | 32.5983  |         31.893         |
|           dm_nfnet_f0           | 128 | 2.0094 |  7.6042   |      nan       |   29.9754   | 32.3805  |        29.3454         |
|             dla102              | 128 | 1.6603 |  10.0975  |      nan       |   63.1714   | 32.1124  |        30.2312         |
|           res2next50            | 128 | 1.4989 |  8.7791   |      nan       |   66.7002   | 29.6202  |        27.9053         |
|           rexnet_100            | 128 | 1.8062 |  7.4568   |      nan       |  102.1027   | 26.5523  |        25.3591         |
|            tinynet_a            | 128 | 1.9614 |  8.2078   |    20.2872     |   61.7507   | 25.7941  |        24.6542         |
|          cspdarknet53           | 64  | 2.2264 |  7.7188   |    20.8213     |   48.0307   | 23.2515  |        22.0433         |
|            nfnet_l0             | 128 | 1.7245 |  7.5828   |      nan       |   27.3095   | 23.1165  |        21.8966         |
|       tf_efficientnet_b0        | 128 | 1.7202 |  6.9673   |      nan       |   61.9316   | 22.7574  |        21.5149         |
|           fbnetc_100            | 128 | 1.9567 |  6.9499   |     18.078     |   45.3002   | 21.9517  |        20.7368         |
|          spnasnet_100           | 128 | 1.9161 |   6.665   |    17.4815     |   43.4797   | 21.4795  |        20.4556         |
|      mobilenetv3_large_100      | 128 | 1.5899 |  5.5688   |    13.4352     |   64.4429   | 19.9372  |        19.5642         |
|           mnasnet_100           | 128 | 1.6356 |  5.5127   |    14.0767     |   37.4665   | 18.8558  |        18.0133         |
|         mobilenetv2_100         | 128 | 1.6442 |  5.4933   |    13.7945     |   37.5793   | 18.5669  |        17.7858         |
|            gernet_l             | 128 | 1.8816 |  6.4469   |    16.2236     |   35.9904   | 18.4345  |        17.2115         |
|            repvgg_a2            | 128 | 1.8567 |  6.1905   |    15.7371     |   43.751    | 17.9569  |        16.9557         |
|           regnety_002           | 128 | 1.4855 |  5.8417   |    13.8786     |   46.2472   | 17.8219  |        17.3541         |
|           selecsls42b           | 128 | 0.7717 |  4.0352   |     5.8995     |   39.8612   | 16.4046  |        15.3492         |
|            lcnet_050            | 128 | 0.9705 |  3.4278   |     7.1291     |   31.167    | 13.6937  |         12.51          |
|        ese_vovnet19b_dw         | 128 | 0.9768 |   3.251   |     6.9304     |   30.8107   | 12.7375  |        11.8284         |
|        tnt_s_patch16_224        | 128 | 1.4723 |  10.2065  |      nan       |   22.8828   |   nan    |        50.0197         |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|          gmixer_24_224          | 128 | 0.9951 |  0.9716   |      nan       |   0.9859    |  1.5612  |         1.6333         |
|            tinynet_a            | 128 | 0.9942 |  0.7796   |     0.2617     |   0.7823    |  1.351   |         1.3692         |
|            nfnet_l0             | 128 | 0.993  |  0.8272   |      nan       |   0.8084    |  1.2908  |         1.3392         |
|           rexnet_100            | 128 | 0.9935 |  0.7843   |      nan       |   0.8682    |  1.2619  |         1.2765         |
|       tf_efficientnet_b0        | 128 | 0.9935 |  0.7688   |      nan       |   0.8401    |  1.1889  |         1.199          |
|          pnasnet5large          | 16  | 1.069  |   1.011   |      nan       |   1.2062    |  1.1876  |         1.3282         |
|           mobilevit_s           | 64  | 0.9959 |  0.7668   |      nan       |   0.7405    |  1.1793  |         1.2286         |
|       eca_botnext26ts_256       | 128 | 0.9938 |  0.7675   |      nan       |   0.7612    |  1.1378  |         1.2076         |
|        eca_halonext26ts         | 128 | 0.9937 |  0.7687   |      nan       |   0.7643    |  1.1375  |         1.2068         |
|          cait_m36_384           |  4  | 0.9994 |    nan    |      nan       |     nan     |  1.1185  |         1.1745         |
|         mobilenetv2_100         | 128 | 0.9925 |  0.7621   |     0.3063     |   0.7635    |  1.1003  |         1.1104         |
|         poolformer_m36          | 64  | 0.998  |  0.9512   |      nan       |     nan     |  1.0527  |         1.069          |
|           dm_nfnet_f0           | 128 | 0.9358 |  0.8936   |      nan       |   0.9479    |  1.0218  |         1.0495         |
|      beit_base_patch16_224      | 64  | 0.9966 |  0.9545   |      nan       |   0.8606    |  1.0038  |         1.0607         |
|           resnest101e           | 64  | 0.9971 |  0.9519   |      nan       |    0.95     |  0.9994  |         1.0025         |
|      vit_base_patch16_224       | 64  | 0.9963 |  0.9434   |     0.3153     |   0.8229    |  0.997   |         1.0835         |
| deit_base_distilled_patch16_224 | 64  | 0.9964 |  0.9442   |     0.3138     |   0.8242    |  0.9925  |         1.0805         |
|        twins_pcpvt_base         | 64  | 0.9976 |  0.9195   |     0.3131     |   0.8403    |  0.9888  |         1.0866         |
|          ghostnet_100           | 128 | 0.9865 |  0.8768   |     0.3273     |   0.9345    |  0.9853  |         1.0102         |
|          mixer_b16_224          | 128 | 0.9952 |  0.9661   |      nan       |   0.8571    |  0.985   |         1.0538         |
|        convmixer_768_32         | 32  | 0.9986 |  0.9854   |      nan       |   0.9793    |  0.9836  |         0.9853         |
|           volo_d1_224           | 64  | 0.996  |  0.9213   |      nan       |   0.7472    |  0.9799  |         0.9971         |
|          gmlp_s16_224           | 128 | 0.9959 |  0.9783   |      nan       |   0.9704    |  0.9766  |         0.9827         |
|           tf_mixnet_l           | 128 | 0.9953 |   0.857   |      nan       |   0.8574    |  0.9711  |         1.0812         |
|            fbnetv3_b            | 128 | 0.9932 |  0.7828   |     0.3095     |    0.784    |  0.9696  |         0.977          |
|      xcit_large_24_p8_224       |  5  | 0.9981 |  0.9194   |      nan       |     nan     |  0.9611  |         1.0549         |
|          convnext_base          | 64  | 0.9975 |  0.9169   |      nan       |   0.7604    |  0.9576  |         0.9855         |
|             dla102              | 128 | 0.9831 |   0.917   |      nan       |   0.9529    |  0.9496  |         0.9538         |
|            hrnet_w18            | 128 | 0.9954 |  0.9252   |      nan       |   0.8649    |  0.9376  |         0.9419         |
|        gluon_xception65         | 32  | 0.9975 |  0.9365   |      nan       |   0.8982    |  0.9351  |         0.9376         |
|        res2net101_26w_4s        | 64  | 0.9968 |  0.9278   |     0.3243     |   0.8932    |  0.9269  |         0.9548         |
|          jx_nest_base           | 32  | 1.0002 |  0.8966   |      nan       |   0.7112    |  0.9187  |         1.0509         |
|        ese_vovnet19b_dw         | 128 | 0.9923 |  0.8877   |     0.3261     |   0.9302    |  0.9095  |         0.9161         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9288   |      nan       |    0.83     |  0.9068  |         1.0518         |
|             dpn107              | 32  | 0.9985 |  0.9271   |     0.3392     |   0.8941    |  0.9058  |         0.956          |
|           res2next50            | 128 | 0.9951 |  0.9153   |      nan       |   0.8618    |  0.9051  |         0.9312         |
|          spnasnet_100           | 128 | 0.989  |  0.9109   |     0.3309     |   0.8412    |  0.9047  |         0.9157         |
|            mixnet_l             | 128 | 0.9951 |   0.845   |      nan       |   0.7911    |  0.9014  |         1.0067         |
|      mobilenetv3_large_100      | 128 | 0.9876 |  0.8589   |     0.3244     |   0.8745    |  0.9007  |         0.9126         |
|         visformer_small         | 128 | 0.9943 |  0.9381   |     0.3293     |   0.9475    |  0.9006  |         0.951          |
|           selecsls42b           | 128 | 0.9883 |  0.8896   |     0.337      |   0.8954    |  0.899   |         0.9192         |
|        adv_inception_v3         | 128 | 0.9901 |  0.8617   |      nan       |   0.8724    |  0.8983  |         0.9073         |
|       gluon_inception_v3        | 128 | 0.9901 |  0.8617   |      nan       |   0.8724    |  0.8983  |         0.9073         |
|          inception_v3           | 128 | 0.9901 |  0.8617   |      nan       |   0.8724    |  0.8983  |         0.9073         |
|           mnasnet_100           | 128 | 0.9877 |  0.9019   |     0.3306     |   0.8279    |  0.8961  |         0.9077         |
|     swsl_resnext101_32x16d      | 32  | 0.9991 |  0.8972   |      nan       |   0.8675    |  0.8931  |         0.9249         |
|            lcnet_050            | 128 | 0.9672 |  0.7521   |     0.3171     |   0.7524    |  0.8921  |         0.923          |
|          cspdarknet53           | 64  | 0.9954 |  0.8528   |     0.316      |   0.8762    |  0.8835  |         0.8875         |
|        res2net50_14w_8s         | 128 | 0.9952 |  0.9049   |      nan       |   0.8611    |  0.881   |         0.9327         |
|           regnety_002           | 128 | 0.9717 |  0.8104   |     0.3283     |   0.7599    |  0.8617  |         0.8993         |
|          botnet26t_256          | 128 | 0.9915 |  0.8434   |     0.3165     |    0.745    |  0.8605  |         0.8702         |
|            pit_b_224            | 64  | 0.9968 |  0.7947   |      nan       |   0.6417    |  0.8417  |         1.0633         |
|           fbnetc_100            | 128 | 0.9891 |  0.8518   |     0.3236     |   0.7446    |  0.8416  |         0.8498         |
|        sebotnet33ts_256         | 64  | 0.9952 |  0.7084   |      nan       |   0.6831    |  0.841   |         0.9711         |
|         coat_lite_mini          | 128 | 1.0049 |  0.8777   |     0.3262     |   0.7873    |  0.8404  |         1.0528         |
|          resmlp_12_224          | 128 | 0.9893 |   0.943   |     0.2472     |     nan     |  0.8169  |         0.8253         |
|            gernet_l             | 128 | 0.9884 |  0.7892   |      0.32      |   0.7938    |  0.7928  |         0.8234         |
|            repvgg_a2            | 128 | 0.9867 |  0.8054   |     0.3277     |   0.6573    |  0.7684  |         0.8011         |
|           convit_base           | 64  | 0.9977 |  0.8838   |      nan       |   0.9506    |  0.7463  |         0.9008         |
|         crossvit_9_240          | 128 | 0.9884 |  0.8657   |      nan       |   0.7297    |  0.6496  |         0.8704         |
|        tnt_s_patch16_224        | 128 | 0.996  |  0.9769   |      nan       |   0.8539    |   nan    |         0.8623         |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Performance graphs


bench_logs/torchbench_float32.png :

bench_logs/timm_models_float32.png :

bench_logs/huggingface_float32.png :

@anijain2305
Contributor Author

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. The experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

When measuring speedup, compilation latency, and memory footprint reduction, we exclude the models that fail the accuracy checks.
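
The filtering and aggregation described here can be sketched as follows. This is a minimal illustration, not the dashboard's actual script: the record layout, the helper names `pass_rate` and `geomean_speedup`, and the sample values are hypothetical. Counting only the `pass` status as passing and returning 0.0 for an empty set are assumptions (the latter is suggested by the 0.0x entries that appear where no model passes).

```python
from statistics import geometric_mean

# Hypothetical per-model records for one backend: (name, accuracy status, speedup).
results = [
    ("model_a", "pass", 2.0),
    ("model_b", "pass", 8.0),
    ("model_c", "fail_accuracy", 1.5),
    ("model_d", "fail_to_run", 0.0),
]

def pass_rate(records):
    """Format the pass rate like the Passrate table, e.g. '98%, 52/53'."""
    total = len(records)
    # Assumption: only the exact status "pass" counts as passing.
    passed = sum(1 for _, status, _ in records if status == "pass")
    percent = 100 * passed / total if total else 0
    return f"{percent:.0f}%, {passed}/{total}"

def geomean_speedup(records):
    """Geometric mean of speedups over models that pass the accuracy check."""
    speedups = [s for _, status, s in records if status == "pass"]
    # Assumption: report 0.0 when no model passes.
    return geometric_mean(speedups) if speedups else 0.0

print(pass_rate(results))
print(f"{geomean_speedup(results):.2f}x")
```

The geometric mean is used rather than the arithmetic mean because speedups are ratios, so a single very fast model does not dominate the summary.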

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 98%, 52/53 | 100%, 42/42 | 100%, 61/61 |
|       aot_eager        | 98%, 52/53 | 100%, 42/42 | 97%, 59/61  |
|     aot_cudagraphs     | 75%, 40/53 | 55%, 23/42  | 80%, 49/61  |
|      aot_nvfuser       | 60%, 32/53 |  0%, 0/42   | 87%, 53/61  |
|        inductor        | 87%, 46/53 | 93%, 39/42  | 93%, 57/61  |
| inductor_no_cudagraphs | 89%, 47/53 | 93%, 39/42  | 93%, 57/61  |
+------------------------+------------+-------------+-------------+

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.01x    |    1.01x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|     aot_cudagraphs     |   1.19x    |    1.05x    |    1.00x    |
|      aot_nvfuser       |   1.16x    |    0.0x     |    1.18x    |
|        inductor        |   1.82x    |    1.79x    |    1.42x    |
| inductor_no_cudagraphs |   1.36x    |    1.54x    |    1.37x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    2.29    |    2.65     |    2.11     |
|       aot_eager        |    8.47    |    12.63    |    11.01    |
|     aot_cudagraphs     |   10.99    |    21.63    |    20.31    |
|      aot_nvfuser       |   26.97    |     0.0     |    68.40    |
|        inductor        |   57.44    |    62.79    |    89.06    |
| inductor_no_cudagraphs |   60.44    |    57.49    |    87.16    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.96x    |    0.99x    |    0.99x    |
|       aot_eager        |   0.85x    |    0.89x    |    0.87x    |
|     aot_cudagraphs     |   0.42x    |    0.38x    |    0.32x    |
|      aot_nvfuser       |   0.83x    |    0.0x     |    0.84x    |
|        inductor        |   0.83x    |    0.91x    |    0.95x    |
| inductor_no_cudagraphs |   0.92x    |    1.08x    |    1.01x    |
+------------------------+------------+-------------+-------------+
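
As a reading aid for the table above, the sketch below assumes the compression ratio is the native PyTorch peak memory divided by the backend's peak memory. That definition is consistent with "higher is better" but is not stated in this report. The function name and the byte counts are hypothetical.

```python
def compression_ratio(eager_peak_bytes: int, backend_peak_bytes: int) -> float:
    """Assumed definition: eager peak memory divided by backend peak memory."""
    if backend_peak_bytes <= 0:
        raise ValueError("backend peak memory must be positive")
    return eager_peak_bytes / backend_peak_bytes

# Hypothetical measurements: eager peaks at 10 GB, the backend at 12.5 GB.
ratio = compression_ratio(10_000_000_000, 12_500_000_000)
# A value below 1.00x means the backend used more memory than eager.
print(f"{ratio:.2f}x")
```

On CUDA, peak values of this kind could be read with `torch.cuda.max_memory_allocated()` after calling `torch.cuda.reset_peak_memory_stats()`; whether the dashboard measures memory this way is not confirmed here.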

torchbench suite with amp precision


Performance speedup

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|            densenet121            |  4   | 1.0015 |   0.903   |     2.5542     |   1.3974    |  5.8305  |         1.3362         |
|       functorch_dp_cifar10        |  64  | 1.0051 |  0.9122   |     2.4037     |   1.1922    |  4.9773  |         1.3812         |
|         timm_efficientdet         |  1   | 0.9859 |  0.8024   |      0.0       |     0.0     |  4.6441  |         1.5539         |
|          resnext50_32x4d          |  8   | 1.0013 |  0.9432   |     1.8333     |   1.3273    |  3.8723  |         1.2702         |
|      timm_vision_transformer      |  8   | 1.008  |  0.8501   |     1.742      |   1.3624    |  3.2471  |         1.5463         |
|           BERT_pytorch            |  16  | 1.0097 |  0.8329   |      0.0       |     0.0     |  3.1928  |         2.3713         |
|        mobilenet_v3_large         |  32  | 1.0032 |  1.0002   |     1.4877     |   1.3516    |  3.0229  |         1.4325         |
|                drq                |  1   | 1.0115 |   0.791   |     1.7557     |   1.0892    |  3.0111  |         1.1618         |
|             resnet18              |  16  | 0.9998 |  0.9836   |     1.6375     |    1.329    |  2.7965  |         1.2751         |
|            mnasnet1_0             |  32  | 0.9991 |  1.0096   |     1.2564     |   1.3313    |  2.5975  |         1.367          |
|               dcgan               |  32  | 0.9794 |  0.9124   |     1.6954     |   0.7727    |  2.5528  |         1.0745         |
|            hf_T5_large            |  2   | 1.0234 |  0.8593   |      0.0       |     0.0     |  2.4405  |         2.1154         |
|           squeezenet1_1           |  32  | 0.9965 |  0.9462   |     1.4589     |   1.1787    |  2.4194  |         1.3043         |
|             hf_Albert             |  8   | 1.0007 |  0.9558   |     0.7735     |     0.0     |  2.3758  |         2.3241         |
|         timm_efficientnet         |  32  | 0.9581 |  0.8098   |     1.1659     |   1.1758    |  2.2672  |         1.3034         |
|          pytorch_struct           | 200  | 0.9935 |  0.7348   |     1.0231     |   0.9919    |  2.1166  |         1.2655         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9991 |  0.9109   |     1.7221     |   1.2064    |  2.1135  |         1.3964         |
|           lennard_jones           | 1000 | 0.9772 |  0.7441   |     1.2718     |   1.0356    |  2.0608  |         1.0576         |
|              hf_Bart              |  4   | 1.0103 |  0.8449   |      0.0       |     0.0     |  2.0391  |         1.6735         |
|              hf_Bert              |  4   | 1.0349 |  0.8602   |     0.9399     |     0.0     |  1.9983  |         1.8402         |
|             resnet50              |  32  | 1.0022 |  1.0033   |     1.0301     |   1.3595    |  1.9266  |         1.3564         |
|           timm_resnest            |  32  | 1.0048 |  1.0197   |     0.8351     |    1.312    |  1.9241  |         1.6769         |
|              hf_GPT2              |  4   | 1.0171 |  0.9849   |      0.0       |     0.0     |  1.8692  |         1.8085         |
|          LearningToPaint          |  96  | 1.0017 |   0.995   |     1.1601     |   1.3515    |  1.8513  |         1.311          |
|               hf_T5               |  8   | 1.0011 |  0.9463   |      0.0       |     0.0     |  1.8365  |         1.8371         |
|         soft_actor_critic         | 256  | 1.0145 |   0.732   |     1.3397     |   1.0578    |  1.7429  |         1.037          |
|        speech_transformer         |  32  | 1.003  |  0.8409   |      0.0       |     0.0     |  1.7106  |         1.6704         |
|        shufflenet_v2_x1_0         | 128  | 1.003  |  1.0117   |     0.9602     |   1.3374    |  1.7071  |         1.4237         |
|           mobilenet_v2            |  96  | 0.9998 |   1.013   |     0.7636     |   0.9257    |  1.5601  |         1.4988         |
| attention_is_all_you_need_pytorch | 256  | 1.0093 |  0.9266   |      0.0       |     0.0     |  1.5239  |         1.4695         |
|            timm_nfnet             | 128  | 0.9988 |  0.9993   |      0.0       |   1.1742    |  1.4979  |         1.4301         |
|           fastNLP_Bert            |  6   | 0.9993 |  0.8745   |     0.7662     |     0.0     |  1.4721  |         1.4183         |
|           hf_DistilBert           |  8   | 1.0013 |  0.9725   |     0.7339     |     0.0     |  1.4625  |         1.4374         |
|           pytorch_unet            |  1   | 0.9996 |  0.9928   |     0.8627     |   1.1557    |  1.3444  |         1.3151         |
|          pytorch_stargan          |  16  | 0.9962 |   1.042   |     0.9686     |   1.0909    |  1.3115  |         1.2631         |
|            timm_vovnet            |  32  | 0.9181 |  0.8811   |     0.8605     |   1.1327    |  1.2991  |         1.1483         |
|            timm_regnet            |  32  | 0.9787 |  0.9334   |     0.8852     |   1.1754    |  1.2976  |         1.2383         |
|            Super_SloMo            |  6   | 0.9996 |  0.9958   |     0.8864     |     0.0     |  1.2911  |         1.257          |
|               vgg16               |  64  | 0.9996 |  0.9976   |     0.8575     |   0.9945    |  1.2719  |         1.2625         |
|        Background_Matting         |  4   | 0.9999 |  1.0184   |     0.8934     |   1.1155    |  1.2258  |         1.2096         |
|              alexnet              | 128  | 0.9995 |  0.9973   |     0.8153     |    1.003    |  1.2121  |         1.2079         |
|   timm_vision_transformer_large   |  8   |  1.0   |  0.9904   |      0.0       |   0.9936    |  1.1621  |         1.1383         |
|            hf_Reformer            |  4   | 0.9959 |  1.0001   |     0.9451     |     0.0     |  1.158   |         1.153          |
|            hf_BigBird             |  2   | 0.9955 |  0.9189   |     1.0476     |     0.0     |  1.154   |         1.0283         |
|              yolov3               |  16  | 0.9998 |   0.991   |     0.8032     |   0.8698    |  1.1035  |         1.0801         |
|            tts_angular            |  64  | 0.9869 |  0.9355   |     0.9834     |   0.9909    |  1.0043  |         1.0149         |
|              demucs               |  4   | 1.0004 |  1.0003   |     1.0009     |   0.9991    |  1.0019  |         0.9989         |
|      nvidia_deeprecommender       | 256  | 0.9993 |  0.9964   |     0.6973     |   0.9788    |  0.9901  |         1.0298         |
|               dlrm                | 2048 | 1.1324 |  1.1034   |      0.0       |     0.0     |  0.9351  |          0.0           |
|           hf_GPT2_large           |  4   | 1.0004 |   0.991   |      0.0       |     0.0     |   0.0    |         1.7437         |
|             tacotron2             |  64  | 0.9831 |  0.7609   |     1.0061     |     0.0     |   0.0    |         0.883          |
|           hf_Longformer           |  2   | 0.9641 |  0.8775   |     0.8954     |     0.0     |   0.0    |          0.0           |
|               moco                |  0   |  0.0   |    0.0    |      0.0       |     0.0     |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
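For anyone who wants to aggregate these numbers offline, here is a minimal sketch of parsing one of the ASCII tables and computing a per-backend geometric mean speedup. It assumes that a 0.0 (or nan) entry marks a run that failed rather than a measured value, so those entries are skipped. The helper names `parse_ascii_table` and `geomean_speedup` are hypothetical and are not part of the benchmark scripts.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the '+----+' border lines
    ]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean_speedup(records, backend):
    """Geometric mean over rows with a positive, finite value for `backend`.

    Assumption: 0.0 and nan denote failed runs, so they are excluded.
    """
    values = []
    for rec in records:
        try:
            value = float(rec[backend])
        except (KeyError, ValueError):
            continue
        if math.isfinite(value) and value > 0.0:
            values.append(value)
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Two rows copied from the speedup table above, with its column header.
sample = """
+--------+------+--------+-----------+----------------+-------------+----------+------------------------+
|  name  |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+--------+------+--------+-----------+----------------+-------------+----------+------------------------+
| demucs |  4   | 1.0004 |  1.0003   |     1.0009     |   0.9991    |  1.0019  |         0.9989         |
|  dlrm  | 2048 | 1.1324 |  1.1034   |      0.0       |     0.0     |  0.9351  |          0.0           |
+--------+------+--------+-----------+----------------+-------------+----------+------------------------+
"""

records = parse_ascii_table(sample)
print(geomean_speedup(records, "inductor"))
```

Skipping failed runs makes the geomean reflect only models that actually ran on a backend, so it is best read alongside the pass rate from the Accuracy table.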

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |  aot_cudagraphs  |   aot_nvfuser    |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|           hf_DistilBert           |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|      timm_vision_transformer      |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |          pass          |
|            Super_SloMo            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           fastNLP_Bert            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|             hf_Albert             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              hf_Bert              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|            hf_BigBird             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|            hf_Reformer            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|         timm_efficientnet         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           BERT_pytorch            |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|               dlrm                |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|              hf_Bart              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|               hf_T5               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|            hf_T5_base             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|        speech_transformer         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|            timm_regnet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        Background_Matting         |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             tacotron2             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |   fail_to_run    |          pass          |
|           hf_Longformer           |  2  |       pass       |       pass       |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|               moco                |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|         timm_efficientdet         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |     fail_accuracy      |
|          vision_maskrcnn          |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |   fail_to_run    |         0.0000         |
|        mobilenet_v3_large         |  2  |       pass       |       pass       |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
|            tts_angular            |  2  |       pass       |       pass       |       pass       |       pass       |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+
|         timm_efficientdet         |  1   | 19.9114 |  44.0837  |      nan       |     nan     | 492.8052 |        511.3603        |
|              yolov3               |  16  | 2.9509  |  10.458   |    14.4773     |   41.3766   | 439.0747 |        436.3863        |
|            hf_T5_large            |  2   | 13.8119 |  47.1709  |      nan       |     nan     | 230.7633 |        222.5011        |
|        speech_transformer         |  32  | 1.9497  |  11.2497  |      nan       |     nan     | 162.3094 |        159.0526        |
|      timm_vision_transformer      |  8   | 0.9749  |  5.8412   |     7.9448     |   14.0461   | 150.3482 |        144.8052        |
| attention_is_all_you_need_pytorch | 256  | 1.2653  |  9.3012   |      nan       |     nan     | 146.818  |        150.856         |
|   timm_vision_transformer_large   |  8   |  2.757  |  19.5575  |      nan       |   37.911    | 143.2689 |        143.5467        |
|           timm_resnest            |  32  | 0.6011  |  3.4208   |     4.6698     |   42.0817   | 137.1813 |        129.8015        |
|          pytorch_stargan          |  16  |  0.411  |  2.7576   |     3.6579     |   6.8457    | 105.3448 |        111.7027        |
|           BERT_pytorch            |  16  | 1.6793  |  9.8107   |      nan       |     nan     | 104.3494 |        102.8372        |
|          pytorch_struct           | 200  | 0.2721  |   1.097   |     1.745      |    5.276    | 80.2653  |        81.8498         |
|           fastNLP_Bert            |  6   | 1.7408  |  9.3776   |    13.5213     |     nan     | 72.3646  |        69.3339         |
|              hf_GPT2              |  4   | 1.4611  |  7.9839   |      nan       |     nan     | 67.4414  |        66.1509         |
|              hf_Bart              |  4   | 1.7362  |  11.152   |      nan       |     nan     |  55.024  |         55.477         |
|            densenet121            |  4   | 2.2111  |  16.3732  |    24.8739     |  127.2016   | 53.6837  |        50.6568         |
|               hf_T5               |  8   | 2.3589  |  11.0701  |      nan       |     nan     | 53.2822  |        50.6323         |
|            hf_BigBird             |  2   | 8.2266  |  16.8322  |    37.1242     |     nan     | 49.2829  |        31.4395         |
|        mobilenet_v3_large         |  32  | 0.9736  |   6.349   |     8.9568     |   72.4257   | 47.9591  |        47.5266         |
|             hf_Albert             |  8   | 1.3677  |  8.3808   |    12.4608     |     nan     | 47.7013  |        47.7293         |
|              hf_Bert              |  4   | 1.5692  |  8.6915   |    12.1088     |     nan     | 45.7389  |        45.1049         |
|            timm_regnet            |  32  | 2.3016  |  10.6663  |    24.2721     |   59.3949   | 39.5138  |        39.0703         |
|         timm_efficientnet         |  32  | 1.8065  |   8.466   |    18.8325     |   69.1885   | 36.8646  |        36.2609         |
|            hf_Reformer            |  4   | 2.4895  |   5.097   |     9.8331     |     nan     | 36.4816  |        30.8703         |
|           hf_DistilBert           |  8   | 0.6209  |  4.1757   |     8.2817     |     nan     | 34.7598  |        34.8714         |
|            timm_nfnet             | 128  | 2.0752  |   8.883   |      nan       |   37.9942   | 33.2343  |        31.3883         |
|            mnasnet1_0             |  32  | 0.8436  |  5.8292   |     8.0116     |   43.3539   | 31.7482  |        31.3722         |
|          resnext50_32x4d          |  8   | 0.9736  |  6.1999   |     8.4015     |   36.2031   | 31.1282  |        32.0186         |
|            timm_vovnet            |  32  | 1.4893  |  5.5062   |     11.824     |   30.9869   | 31.0935  |        30.1586         |
|             resnet50              |  32  | 0.8799  |  6.2202   |     8.514      |   40.7493   |  29.964  |        30.6567         |
|       functorch_dp_cifar10        |  64  | 0.3826  |  2.4582   |     3.3723     |    6.311    | 27.1014  |        26.7867         |
|             resnet18              |  16  | 0.4581  |  2.3811   |     3.4255     |   23.0229   | 21.9997  |        21.4065         |
|        shufflenet_v2_x1_0         | 128  | 0.9845  |  6.6871   |     9.2419     |   37.2394   | 21.5079  |        20.6157         |
|        Background_Matting         |  4   | 0.9358  |  6.1012   |     8.6269     |   41.4341   | 20.2369  |        19.4603         |
|            Super_SloMo            |  6   | 1.0597  |  6.1892   |     8.1656     |     nan     | 20.0702  |        19.6506         |
|           mobilenet_v2            |  96  | 0.8432  |  5.7818   |     8.4424     |   40.7764   | 19.8779  |        19.7928         |
|           pytorch_unet            |  1   | 0.4658  |  2.7325   |     3.7737     |   26.2469   |  9.6139  |         9.1328         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.4277  |  2.8222   |     3.7265     |    4.774    |  9.5293  |         9.2305         |
|          LearningToPaint          |  96  | 0.4592  |  2.4914   |     3.5797     |   29.9899   |  8.3718  |         8.0385         |
|           squeezenet1_1           |  32  | 0.2661  |  1.3923   |     1.9142     |   6.4786    |  5.1781  |         4.8207         |
|      nvidia_deeprecommender       | 256  | 0.1953  |  0.6331   |     0.9235     |   2.9178    |  4.8321  |         4.3927         |
|               vgg16               |  64  | 0.1828  |  0.9539   |     1.4132     |   3.6292    |  4.363   |         4.0061         |
|                drq                |  1   | 0.1592  |  0.6488   |     0.9825     |   4.3666    |  4.1608  |         3.6575         |
|               dlrm                | 2048 | 0.4597  |  1.0084   |      nan       |     nan     |  3.847   |          nan           |
|         soft_actor_critic         | 256  | 0.2128  |  0.4323   |     0.6095     |   2.0178    |  3.6294  |         2.9491         |
|              alexnet              | 128  |  0.164  |  0.5874   |     0.8993     |   3.1328    |  3.4081  |         3.2135         |
|               dcgan               |  32  |  0.174  |  0.5412   |     0.8098     |   4.2474    |  2.8988  |         2.6858         |
|           lennard_jones           | 1000 | 0.1533  |  0.4318   |     0.6178     |   1.4628    |  2.2326  |         1.9707         |
|            tts_angular            |  64  | 0.2265  |  0.2958   |     0.4262     |   1.0385    |  1.8991  |         1.6791         |
|              demucs               |  4   | 0.3415  |  0.3463   |     0.3455     |   0.3616    |  0.2601  |         0.2504         |
|           hf_GPT2_large           |  4   | 5.4141  |  25.1179  |      nan       |     nan     |   nan    |        153.4359        |
|             tacotron2             |  64  | 17.0671 |  32.0887  |    54.1322     |     nan     |   nan    |        108.7741        |
|           hf_Longformer           |  2   | 6.1566  |  16.1726  |    84.5126     |     nan     |   nan    |          nan           |
|               moco                |  0   |   nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+
|         timm_efficientnet         |  32  | 0.988  |  0.7698   |     0.2719     |   0.7887    |  1.2042  |         1.2318         |
|             hf_Albert             |  8   | 0.9814 |   0.936   |     0.3268     |     nan     |  1.1576  |         1.4693         |
|            Super_SloMo            |  6   | 1.0024 |  0.9645   |     0.3842     |     nan     |  1.0536  |         1.1475         |
|            timm_nfnet             | 128  | 0.9693 |  0.8982   |      nan       |   0.9445    |  1.0337  |         1.1245         |
|         timm_efficientdet         |  1   | 1.028  |  0.8404   |      nan       |     nan     |  1.0226  |         1.0403         |
|           mobilenet_v2            |  96  | 0.9857 |  0.7639   |     0.3119     |   0.9117    |  1.0074  |         1.0232         |
|            tts_angular            |  64  | 1.0002 |  1.0002   |     0.9853     |   1.0002    |  0.9895  |         1.0002         |
|              demucs               |  4   | 0.9872 |  0.9872   |     0.9872     |   0.9872    |  0.9872  |         0.9872         |
| attention_is_all_you_need_pytorch | 256  | 0.9979 |   0.94    |      nan       |     nan     |  0.9829  |         1.1269         |
|           BERT_pytorch            |  16  |  1.0   |  0.8825   |      nan       |     nan     |  0.9728  |         1.1006         |
|              hf_GPT2              |  4   | 0.9706 |  0.8847   |      nan       |     nan     |  0.9648  |         1.1252         |
|        Background_Matting         |  4   | 1.0138 |  0.9624   |     0.3723     |   0.9813    |  0.9316  |         0.9364         |
|               hf_T5               |  8   | 0.9678 |  0.9331   |      nan       |     nan     |  0.9309  |         1.2521         |
|            timm_regnet            |  32  | 0.9953 |  0.8446   |     0.3494     |    0.85     |  0.9249  |         0.9292         |
|        speech_transformer         |  32  | 1.0017 |  0.9174   |      nan       |     nan     |  0.9066  |         0.9109         |
|              yolov3               |  16  | 0.9908 |  0.8381   |     0.3537     |   0.8244    |  0.8991  |         0.9038         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.0   |  0.8609   |     0.4238     |   0.8441    |  0.8861  |         0.982          |
|   timm_vision_transformer_large   |  8   | 0.9974 |  0.8357   |      nan       |   0.8494    |  0.879   |         0.9542         |
|           timm_resnest            |  32  | 0.9868 |  0.8711   |     0.3481     |   0.8623    |  0.8759  |         0.9953         |
|            densenet121            |  4   | 0.9857 |  0.8678   |     0.3667     |   0.8376    |  0.8753  |         0.9535         |
|              hf_Bert              |  4   | 0.9844 |  0.8753   |     0.3903     |     nan     |  0.8736  |         0.9414         |
|           pytorch_unet            |  1   | 0.9968 |  0.8653   |     0.3571     |   0.8496    |  0.8678  |         0.8715         |
|           fastNLP_Bert            |  6   | 1.0012 |  0.8966   |     0.3702     |     nan     |  0.8661  |         1.0348         |
|             resnet50              |  32  | 0.9907 |  0.8629   |     0.3562     |   0.7995    |  0.8659  |         0.885          |
|           squeezenet1_1           |  32  | 0.9604 |  0.7958   |     0.3458     |   0.7589    |  0.8611  |         0.8951         |
|        shufflenet_v2_x1_0         | 128  | 0.956  |  0.8401   |     0.3573     |   0.8503    |  0.856   |         0.8927         |
|            hf_T5_large            |  2   | 0.8541 |  0.8541   |      nan       |     nan     |  0.8541  |         0.8541         |
|           hf_DistilBert           |  8   | 0.9505 |  0.8806   |     0.3414     |     nan     |  0.8387  |         0.9058         |
|               dcgan               |  32  | 0.9698 |  0.7838   |     0.5014     |   0.7073    |  0.8283  |         0.8738         |
|              hf_Bart              |  4   | 0.9102 |  0.8125   |      nan       |     nan     |  0.8137  |         0.9762         |
|            hf_BigBird             |  2   | 0.9837 |  0.9784   |     0.4544     |     nan     |  0.8098  |         1.096          |
|              alexnet              | 128  | 0.951  |  0.7753   |     0.4793     |   0.7753    |  0.7974  |         0.9099         |
|        mobilenet_v3_large         |  32  | 0.9776 |  0.8499   |     0.3446     |    0.866    |  0.7918  |         0.8145         |
|          pytorch_stargan          |  16  | 0.9929 |  0.9742   |     0.4253     |   0.8882    |  0.7783  |         0.8847         |
|          resnext50_32x4d          |  8   | 0.9932 |  0.8549   |     0.3882     |   0.8176    |  0.7644  |         0.7753         |
|            mnasnet1_0             |  32  | 0.9785 |  0.8621   |     0.3408     |   0.8207    |  0.7541  |         0.7741         |
|                drq                |  1   | 0.9877 |  0.8312   |     0.4769     |   0.8308    |  0.752   |         0.9256         |
|            timm_vovnet            |  32  | 0.9903 |  0.7678   |     0.3405     |   0.7742    |  0.7513  |         0.761          |
|               vgg16               |  64  | 0.9924 |  0.7339   |     0.3775     |   0.7172    |  0.7491  |         0.7534         |
|         soft_actor_critic         | 256  | 0.9998 |  0.9149   |     0.4736     |   0.9149    |  0.7295  |         1.0367         |
|          LearningToPaint          |  96  | 0.9252 |  0.7196   |     0.3826     |   0.6722    |  0.7295  |         0.8017         |
|      timm_vision_transformer      |  8   | 0.9952 |  0.8826   |     0.3916     |   0.8871    |  0.7151  |         0.7249         |
|               dlrm                | 2048 | 0.7301 |  0.7306   |      nan       |     nan     |  0.704   |          nan           |
|             resnet18              |  16  | 0.9779 |  0.7727   |     0.3947     |   0.7276    |  0.6102  |         0.6257         |
|           lennard_jones           | 1000 | 0.9995 |  0.9997   |     0.3734     |   1.0967    |  0.564   |         0.9991         |
|      nvidia_deeprecommender       | 256  | 0.5596 |  0.5596   |     0.5125     |   0.5596    |  0.5596  |         0.5596         |
|       functorch_dp_cifar10        |  64  | 0.9964 |  0.8107   |     0.4465     |   0.8452    |  0.4478  |         0.4806         |
|          pytorch_struct           | 200  |  1.0   |  0.5081   |     0.4858     |   0.5082    |  0.4235  |         0.4307         |
|            hf_Reformer            |  4   | 0.3764 |  0.9847   |     0.3481     |     nan     |  0.3629  |         0.9878         |
|           hf_GPT2_large           |  4   | 0.9582 |  0.8718   |      nan       |     nan     |   nan    |         1.1351         |
|             tacotron2             |  64  | 0.9866 |  0.4047   |     0.3143     |     nan     |   nan    |         0.4114         |
|           hf_Longformer           |  2   | 0.9734 |   0.967   |     0.3492     |     nan     |   nan    |          nan           |
|               moco                |  0   |  nan   |    nan    |      nan       |     nan     |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------------+-------------+----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|            YituTechConvBert             |  1  | 1.0235 |  0.8558   |      0.0       |     0.0     |  5.483   |         1.6487         |
|          MobileBertForMaskedLM          | 32  | 1.0252 |  0.8276   |      0.0       |     0.0     |  5.2736  |         1.7228         |
|                CamemBert                |  1  | 1.0387 |  0.8591   |     1.7215     |     0.0     |  3.653   |         1.8276         |
|       MT5ForConditionalGeneration       |  8  | 1.0188 |  0.8671   |      0.0       |     0.0     |  3.6158  |         2.5334         |
|     MobileBertForQuestionAnswering      | 64  | 1.0186 |  0.8299   |      0.0       |     0.0     |  3.6056  |         1.7828         |
|               DistillGPT2               |  1  | 1.0273 |  0.8953   |     1.2248     |     0.0     |  3.1234  |         2.0305         |
|     M2M100ForConditionalGeneration      |  8  | 1.0422 |   0.866   |     1.2047     |     0.0     |  2.6847  |         1.8301         |
|     PLBartForConditionalGeneration      | 16  | 1.0137 |  0.8405   |      0.0       |     0.0     |  2.3305  |         1.7503         |
|    MegatronBertForQuestionAnswering     | 16  | 1.0292 |  0.8507   |     1.0548     |     0.0     |  2.2397  |         1.804          |
|      GPT2ForSequenceClassification      |  4  | 1.0015 |  0.9769   |      0.0       |     0.0     |  2.1507  |         2.1149         |
|             XGLMForCausalLM             |  8  | 1.0138 |  0.8239   |      0.0       |     0.0     |  2.0552  |         1.7342         |
|       ElectraForQuestionAnswering       | 64  | 1.0003 |  0.9802   |     0.7604     |     0.0     |  1.9541  |         1.9093         |
|      MBartForConditionalGeneration      | 16  | 1.0141 |  0.8508   |      0.0       |     0.0     |  1.8315  |         1.6062         |
|           ElectraForCausalLM            | 32  | 1.0002 |  0.9429   |     0.7162     |     0.0     |  1.8026  |         1.8009         |
|         MegatronBertForCausalLM         | 16  | 1.0335 |  0.8507   |     0.9584     |     0.0     |  1.7953  |         1.7105         |
|    LayoutLMForSequenceClassification    | 16  | 1.0007 |  0.9808   |     0.7769     |     0.0     |  1.7445  |         1.7013         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.8859   |      0.0       |     0.0     |  1.6775  |         1.666          |
|                 T5Small                 |  1  | 1.0245 |  0.8955   |      0.0       |     0.0     |  1.6644  |          1.47          |
|            AlbertForMaskedLM            |  4  | 1.0001 |  0.8853   |      0.0       |     0.0     |  1.6623  |         1.6537         |
|     PegasusForConditionalGeneration     | 16  | 1.0134 |  0.8306   |     0.8996     |     0.0     |  1.6588  |         1.6502         |
|         Speech2Text2ForCausalLM         | 128 | 1.0043 |  0.9374   |     0.7216     |     0.0     |  1.6199  |         1.5933         |
|       T5ForConditionalGeneration        |  4  | 1.0036 |  0.9327   |      0.0       |     0.0     |  1.6029  |         1.598          |
|           LayoutLMForMaskedLM           | 16  | 1.0005 |  0.9722   |     0.757      |     0.0     |  1.5899  |         1.5607         |
|             OPTForCausalLM              | 32  | 1.0119 |  0.9294   |      0.0       |     0.0     |  1.5854  |         1.5609         |
|     DistilBertForQuestionAnswering      | 64  | 1.0011 |  0.9688   |     0.7412     |     0.0     |  1.4539  |         1.4069         |
|      BartForConditionalGeneration       |  2  | 1.0055 |  0.9714   |      0.0       |     0.0     |  1.4529  |         1.4264         |
|             BartForCausalLM             |  4  | 1.0009 |  0.9695   |      0.0       |     0.0     |  1.4515  |         1.451          |
|        BertForQuestionAnswering         | 128 | 1.0001 |  0.9849   |     0.7783     |     0.0     |  1.4282  |         1.4067         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0059 |  0.9211   |      0.0       |     0.0     |  1.4264  |         1.4402         |
|       RobertaForQuestionAnswering       | 128 |  1.0   |  0.9838   |     0.7757     |     0.0     |  1.4241  |         1.4018         |
|           RobertaForCausalLM            | 64  | 1.0005 |  0.9599   |     0.7532     |     0.0     |  1.4198  |         1.391          |
|             BertForMaskedLM             | 64  | 0.9996 |  0.9589   |     0.7407     |     0.0     |  1.3266  |         1.3117         |
|            PLBartForCausalLM            | 32  | 1.0059 |  0.9421   |     0.7969     |     0.0     |  1.3165  |         1.3156         |
|       BlenderbotSmallForCausalLM        | 64  | 1.0006 |  0.9275   |      0.0       |     0.0     |  1.2917  |         1.3053         |
|          DistilBertForMaskedLM          | 64  | 1.0005 |   0.952   |     0.7092     |     0.0     |  1.2725  |         1.271          |
|           DebertaForMaskedLM            |  4  | 0.9355 |   0.736   |     0.8188     |     0.0     |  1.2636  |         1.1694         |
|           PegasusForCausalLM            | 32  | 1.0016 |  0.9503   |     0.7561     |     0.0     |  1.2438  |         1.1987         |
|            MBartForCausalLM             | 32  | 0.9998 |  0.9506   |      0.0       |     0.0     |  1.2097  |         1.2113         |
|            TrOCRForCausalLM             | 32  | 1.0008 |  0.9411   |      0.0       |     0.0     |  1.2065  |         1.2055         |
|                 BigBird                 |  1  | 0.9782 |  0.9113   |     1.0508     |     0.0     |  1.1571  |         1.034          |
|       DebertaForQuestionAnswering       |  8  | 0.9954 |  0.9043   |     0.7223     |     0.0     |  1.1551  |         1.2349         |
|          AllenaiLongformerBase          |  1  | 0.9551 |  0.7384   |     0.8532     |     0.0     |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Accuracy

+-----------------------------------------+----+-------+-----------+----------------+-------------+-------------+------------------------+
|                  name                   | bs | eager | aot_eager | aot_cudagraphs | aot_nvfuser |  inductor   | inductor_no_cudagraphs |
+-----------------------------------------+----+-------+-----------+----------------+-------------+-------------+------------------------+
|            AlbertForMaskedLM            | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|       AlbertForQuestionAnswering        | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|       RobertaForQuestionAnswering       | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|         Speech2Text2ForCausalLM         | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|             BartForCausalLM             | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|      BartForConditionalGeneration       | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|      GPT2ForSequenceClassification      | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            MBartForCausalLM             | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       MT5ForConditionalGeneration       | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|          MobileBertForMaskedLM          | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|     MobileBertForQuestionAnswering      | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|             OPTForCausalLM              | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|       T5ForConditionalGeneration        | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|                 T5Small                 | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            TrOCRForCausalLM             | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|             XGLMForCausalLM             | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            XLNetLMHeadModel             | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|            YituTechConvBert             | 1  | pass  |   pass    |  fail_to_run   | fail_to_run |    pass     |          pass          |
|           RobertaForCausalLM            | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|     PegasusForConditionalGeneration     | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|           PegasusForCausalLM            | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|     DistilBertForQuestionAnswering      | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|             BertForMaskedLM             | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|        BertForQuestionAnswering         | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|                 BigBird                 | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|       BlenderbotSmallForCausalLM        | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|                CamemBert                | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|           DebertaForMaskedLM            | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|       DebertaForQuestionAnswering       | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|            PLBartForCausalLM            | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|          DistilBertForMaskedLM          | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|               DistillGPT2               | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|           ElectraForCausalLM            | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|       ElectraForQuestionAnswering       | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|           LayoutLMForMaskedLM           | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|    LayoutLMForSequenceClassification    | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|     M2M100ForConditionalGeneration      | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|         MegatronBertForCausalLM         | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|    MegatronBertForQuestionAnswering     | 1  | pass  |   pass    |      pass      | fail_to_run |    pass     |          pass          |
|          AllenaiLongformerBase          | 1  | pass  |   pass    |      pass      | fail_to_run | fail_to_run |      fail_to_run       |
|      MBartForConditionalGeneration      | 1  | pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |      fail_to_run       |
|     PLBartForConditionalGeneration      | 1  | pass  |   pass    |  fail_to_run   | fail_to_run | fail_to_run |      fail_to_run       |
+-----------------------------------------+----+-------+-----------+----------------+-------------+-------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|           DebertaForMaskedLM            |  4  | 5.3017 |  12.4916  |    47.5691     |     nan     | 211.3213 |        131.3429        |
|       DebertaForQuestionAnswering       |  8  | 5.0697 |  12.4939  |    46.8481     |     nan     | 204.9258 |        135.5152        |
|             XGLMForCausalLM             |  8  | 2.7407 |  17.1885  |      nan       |     nan     | 193.4528 |        191.6258        |
|     M2M100ForConditionalGeneration      |  8  | 3.4286 |  20.9146  |    34.7878     |     nan     | 146.7786 |        133.1317        |
|            YituTechConvBert             |  1  | 2.3925 |  13.6824  |      nan       |     nan     | 136.8308 |        137.1263        |
|          MobileBertForMaskedLM          | 32  | 9.0942 |  39.9107  |      nan       |     nan     | 109.5767 |        108.128         |
|     MobileBertForQuestionAnswering      | 64  | 9.2437 |  40.3418  |      nan       |     nan     |  99.942  |        98.5995         |
|       MT5ForConditionalGeneration       |  8  | 3.5479 |  16.2442  |      nan       |     nan     | 99.4221  |        95.6132         |
|         MegatronBertForCausalLM         | 16  | 3.5299 |  18.0083  |    26.2207     |     nan     | 72.8288  |         72.383         |
|     PegasusForConditionalGeneration     | 16  | 3.3212 |  21.1437  |    33.1442     |     nan     | 70.9152  |        66.9883         |
|    MegatronBertForQuestionAnswering     | 16  | 3.7279 |  18.494   |    26.5553     |     nan     | 70.3712  |        70.4324         |
|      MBartForConditionalGeneration      | 16  | 3.5303 |  21.7799  |      nan       |     nan     | 68.0877  |        66.4723         |
|      BartForConditionalGeneration       |  2  | 3.3485 |  21.9928  |      nan       |     nan     | 66.7221  |        66.7763         |
|       T5ForConditionalGeneration        |  4  | 2.3184 |  10.9554  |      nan       |     nan     | 63.1977  |        62.3629         |
|                 T5Small                 |  1  | 2.2923 |  10.8059  |      nan       |     nan     | 60.7041  |        58.9361         |
|    LayoutLMForSequenceClassification    | 16  | 1.9684 |  9.4242   |    13.1009     |     nan     | 59.8057  |        62.4836         |
|     PLBartForConditionalGeneration      | 16  | 1.7535 |  11.1872  |      nan       |     nan     | 53.0942  |        50.2183         |
| BlenderbotSmallForConditionalGeneration | 64  | 2.1494 |  15.3537  |      nan       |     nan     | 50.8133  |        49.8389         |
|                 BigBird                 |  1  | 8.0811 |  16.8812  |    36.2673     |     nan     | 48.8828  |         32.245         |
|           ElectraForCausalLM            | 32  | 1.6602 |  8.9457   |    12.6281     |     nan     |  47.458  |        46.0467         |
|             BertForMaskedLM             | 64  | 1.5157 |  8.8146   |    12.2412     |     nan     | 40.1186  |        39.4486         |
|           LayoutLMForMaskedLM           | 16  | 2.0147 |  9.3845   |    13.6177     |     nan     | 39.7156  |        38.7913         |
|       ElectraForQuestionAnswering       | 64  | 1.6444 |  9.0466   |    12.8008     |     nan     | 37.2773  |        36.2002         |
|           RobertaForCausalLM            | 64  | 1.5418 |   8.898   |    12.8775     |     nan     | 35.6049  |        34.4923         |
|      GPT2ForSequenceClassification      |  4  | 1.5018 |  7.9184   |      nan       |     nan     | 34.5853  |        32.6152         |
|           PegasusForCausalLM            | 32  | 1.277  |  8.0313   |    12.1633     |     nan     |  32.796  |        31.1784         |
|        BertForQuestionAnswering         | 128 | 1.5691 |  8.8396   |    12.3099     |     nan     | 31.9624  |        31.5318         |
|            MBartForCausalLM             | 32  | 1.2391 |  8.2621   |      nan       |     nan     | 29.9812  |        30.1429         |
|             BartForCausalLM             |  4  | 1.2927 |  8.1178   |      nan       |     nan     | 29.6104  |         28.427         |
|            AlbertForMaskedLM            |  4  | 1.4322 |  8.5712   |      nan       |     nan     | 29.5818  |        28.7253         |
|            TrOCRForCausalLM             | 32  | 1.2153 |  8.1026   |      nan       |     nan     | 29.3851  |        29.4579         |
|          DistilBertForMaskedLM          | 64  | 0.6051 |  4.2714   |     8.2129     |     nan     | 29.0096  |        27.6761         |
|       RobertaForQuestionAnswering       | 128 | 1.5283 |   8.84    |    13.0443     |     nan     |  28.486  |        28.3453         |
|       AlbertForQuestionAnswering        |  4  | 1.4291 |  8.5193   |      nan       |     nan     | 28.3625  |        27.3942         |
|       BlenderbotSmallForCausalLM        | 64  | 0.8239 |  5.5965   |      nan       |     nan     | 28.1519  |        27.1049         |
|     DistilBertForQuestionAnswering      | 64  | 0.5867 |  4.3509   |     8.2981     |     nan     |  27.468  |        26.9617         |
|             OPTForCausalLM              | 32  |  1.3   |  8.1977   |      nan       |     nan     | 26.6569  |        26.3537         |
|               DistillGPT2               |  1  | 0.7203 |  3.9956   |     5.4409     |     nan     | 25.7546  |        29.5859         |
|                CamemBert                |  1  | 1.6303 |   8.926   |    12.0923     |     nan     | 25.5323  |        25.1641         |
|         Speech2Text2ForCausalLM         | 128 |  0.69  |  4.1683   |     6.7967     |     nan     | 23.0932  |        21.8021         |
|            PLBartForCausalLM            | 32  | 0.6399 |  4.2166   |     5.8815     |     nan     | 21.7613  |        21.2058         |
|          AllenaiLongformerBase          |  1  | 6.5978 |  17.1453  |    84.5909     |     nan     |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|       AlbertForQuestionAnswering        |  4  |  1.0   |   0.754   |      nan       |     nan     |  1.1305  |         1.5536         |
|            AlbertForMaskedLM            |  4  | 0.9998 |  0.7431   |      nan       |     nan     |  1.1078  |         1.5319         |
|             BartForCausalLM             |  4  |  1.0   |  0.8997   |      nan       |     nan     |  1.0943  |         1.1562         |
|      GPT2ForSequenceClassification      |  4  | 0.9675 |  0.9164   |      nan       |     nan     |  1.0779  |         1.1637         |
|           PegasusForCausalLM            | 32  | 0.9749 |  0.9114   |     0.4175     |     nan     |  1.0189  |         1.089          |
|        BertForQuestionAnswering         | 128 | 1.0008 |   0.952   |     0.3554     |     nan     |  1.0005  |         1.0676         |
|       RobertaForQuestionAnswering       | 128 | 1.0008 |   0.952   |     0.3554     |     nan     |  1.0005  |         1.0676         |
|       T5ForConditionalGeneration        |  4  | 0.9996 |  0.9527   |      nan       |     nan     |  0.995   |         1.2292         |
|    LayoutLMForSequenceClassification    | 16  | 1.004  |  0.9325   |     0.3632     |     nan     |  0.9943  |         1.0278         |
|       ElectraForQuestionAnswering       | 64  | 1.0016 |  0.9538   |     0.3384     |     nan     |  0.9938  |         1.0704         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.9035   |      nan       |     nan     |  0.9913  |         1.1976         |
|                 T5Small                 |  1  |  1.0   |  0.8935   |      nan       |     nan     |  0.9874  |          1.15          |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9238   |     0.3662     |     nan     |  0.9871  |         1.0263         |
|            MBartForCausalLM             | 32  |  1.0   |  0.8924   |      nan       |     nan     |  0.9868  |         1.0636         |
|             OPTForCausalLM              | 32  | 0.9996 |  0.8679   |      nan       |     nan     |  0.9838  |         1.0755         |
|             BertForMaskedLM             | 64  | 0.9996 |   0.899   |     0.3787     |     nan     |  0.9811  |         1.0366         |
|           RobertaForCausalLM            | 64  | 0.9991 |  0.8994   |     0.3788     |     nan     |  0.9801  |         1.0358         |
|            TrOCRForCausalLM             | 32  |  1.0   |  0.8921   |      nan       |     nan     |  0.9642  |         1.0376         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9999 |  0.8918   |      nan       |     nan     |  0.9593  |         1.1105         |
|          DistilBertForMaskedLM          | 64  | 0.9999 |  0.8599   |     0.3635     |     nan     |  0.948   |         1.0272         |
|         Speech2Text2ForCausalLM         | 128 | 0.9676 |  0.8427   |     0.3532     |     nan     |  0.946   |         1.0791         |
|      MBartForConditionalGeneration      | 16  |  1.0   |  0.8555   |      nan       |     nan     |  0.9335  |         1.0986         |
|           ElectraForCausalLM            | 32  | 0.9996 |   0.848   |     0.357      |     nan     |  0.9319  |         1.0177         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9996 |  0.8172   |      nan       |     nan     |  0.9269  |         1.0441         |
|            PLBartForCausalLM            | 32  | 1.0003 |  0.8444   |     0.3979     |     nan     |  0.9214  |         1.0168         |
|       MT5ForConditionalGeneration       |  8  | 0.919  |   0.83    |      nan       |     nan     |  0.919   |         0.919          |
|     PegasusForConditionalGeneration     | 16  | 0.9985 |   0.962   |     0.4377     |     nan     |  0.9159  |         1.0993         |
|     DistilBertForQuestionAnswering      | 64  | 1.0004 |  0.9216   |     0.3466     |     nan     |  0.9129  |         1.0128         |
|         MegatronBertForCausalLM         | 16  | 0.9998 |  0.8597   |     0.4044     |     nan     |  0.9036  |         1.0277         |
|    MegatronBertForQuestionAnswering     | 16  |  1.0   |  0.8529   |     0.411      |     nan     |  0.893   |         1.0093         |
|     PLBartForConditionalGeneration      | 16  | 0.9983 |  0.8769   |      nan       |     nan     |  0.8775  |         1.0294         |
|                 BigBird                 |  1  | 1.0008 |  0.9547   |     0.448      |     nan     |  0.8348  |         1.1049         |
|             XGLMForCausalLM             |  8  | 0.9918 |  0.9234   |      nan       |     nan     |  0.8333  |         1.0324         |
|               DistillGPT2               |  1  | 0.9963 |  0.8033   |     0.4019     |     nan     |  0.8228  |         1.0239         |
|                CamemBert                |  1  | 0.9989 |  0.8143   |     0.4161     |     nan     |  0.8157  |         0.9312         |
|            YituTechConvBert             |  1  | 0.9718 |  0.8091   |      nan       |     nan     |  0.8103  |         0.9318         |
|     M2M100ForConditionalGeneration      |  8  | 0.9967 |  0.9558   |     0.4308     |     nan     |  0.7739  |         1.0609         |
|          MobileBertForMaskedLM          | 32  | 0.9998 |  0.8864   |      nan       |     nan     |  0.6997  |         0.9454         |
|     MobileBertForQuestionAnswering      | 64  | 1.0153 |  0.9965   |      nan       |     nan     |  0.6085  |         0.8221         |
|           DebertaForMaskedLM            |  4  | 0.9982 |  0.9824   |     0.3623     |     nan     |  0.4498  |         1.1123         |
|       DebertaForQuestionAnswering       |  8  | 0.9754 |  1.0737   |     0.3252     |     nan     |  0.3361  |         1.1932         |
|          AllenaiLongformerBase          |  1  | 0.9977 |  0.9473   |     0.3844     |     nan     |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|      xcit_large_24_p8_224       |  5  | 1.0018 |    0.0    |      0.0       |     0.0     |  2.5138  |         1.8289         |
|        tnt_s_patch16_224        | 128 | 0.9999 |  0.9981   |      0.0       |   1.9506    |  2.1263  |         2.0891         |
|           regnety_002           | 128 | 0.9744 |  0.9297   |     1.1188     |    1.379    |  2.0693  |         1.443          |
|          ghostnet_100           | 128 | 1.0031 |  0.9955   |     0.9047     |   1.5366    |  2.0586  |         1.7338         |
|            lcnet_050            | 128 | 0.9657 |  0.9508   |     0.8505     |   1.6006    |  1.9979  |         1.6225         |
|        twins_pcpvt_base         | 64  | 1.0043 |   0.929   |     0.9247     |   1.3578    |  1.7285  |         1.6727         |
|        res2net101_26w_4s        | 64  | 1.0026 |  0.9912   |     0.9458     |   1.4055    |  1.5971  |         1.3453         |
|           volo_d1_224           | 64  | 0.9997 |  0.9941   |      0.0       |   1.1424    |  1.5964  |         1.5611         |
|            hrnet_w18            | 128 | 1.0027 |  1.0165   |     0.8612     |    1.462    |  1.5862  |         1.4754         |
|             dla102              | 128 |  1.0   |  0.9959   |     0.8367     |   1.4156    |  1.5807  |         1.5512         |
|          gmlp_s16_224           | 128 | 0.9997 |  0.9957   |      0.0       |   1.0462    |  1.5572  |         1.483          |
|          gmixer_24_224          | 128 | 0.9999 |  0.8806   |      0.0       |   0.9759    |  1.552   |         1.5077         |
|            nfnet_l0             | 128 | 0.9995 |  0.8102   |     0.7106     |   1.0396    |  1.5407  |         1.4638         |
|           resnest101e           | 64  | 0.9997 |  0.9913   |     0.8118     |   1.2515    |  1.5353  |         1.4285         |
|  swin_base_patch4_window7_224   | 64  | 0.9997 |  0.9614   |      0.0       |   1.0473    |  1.5177  |         1.5196         |
|       gluon_inception_v3        | 128 | 0.9999 |  0.9966   |     0.8529     |   1.1961    |  1.5086  |         1.4737         |
|        adv_inception_v3         | 128 |  1.0   |  0.9962   |     0.8534     |   1.1958    |  1.5061  |         1.4705         |
|          inception_v3           | 128 | 0.9999 |  0.9966   |     0.8525     |   1.1959    |  1.5011  |         1.4684         |
|           dm_nfnet_f0           | 128 | 0.9984 |  1.0002   |      0.0       |   1.1784    |   1.5    |         1.4274         |
|          cait_m36_384           |  4  | 1.0005 |    0.0    |      0.0       |     0.0     |  1.4668  |         1.4168         |
|        res2net50_14w_8s         | 128 |  1.0   |  0.9943   |     0.809      |   1.2818    |  1.4665  |         1.4075         |
|      mobilenetv3_large_100      | 128 | 0.956  |  0.9445   |     0.7814     |   1.3446    |  1.4602  |         1.4405         |
|            fbnetv3_b            | 128 | 0.9519 |  0.9425   |     0.7754     |   1.2585    |  1.4533  |         1.4099         |
|         crossvit_9_240          | 128 | 1.0001 |  0.9955   |     0.8377     |   1.0601    |  1.4527  |         1.4207         |
|           selecsls42b           | 128 | 0.9999 |  0.9953   |     0.8416     |   1.3602    |  1.4427  |         1.4143         |
|         coat_lite_mini          | 128 | 1.0002 |  0.9961   |     0.8471     |   1.2053    |  1.442   |         1.4073         |
|          resmlp_12_224          | 128 | 1.0002 |  0.9988   |     0.7822     |     0.0     |  1.4328  |         1.3808         |
|           mnasnet_100           | 128 | 0.9529 |  0.9444   |     0.7826     |   1.3715    |  1.4324  |         1.4573         |
|           res2next50            | 128 | 0.9994 |  0.9963   |     0.8299     |   1.2133    |  1.423   |         1.3521         |
|         mobilenetv2_100         | 128 | 0.9521 |  0.9416   |     0.7211     |   0.8669    |  1.4033  |         1.4338         |
|          jx_nest_base           | 32  | 0.9995 |  0.9918   |      0.0       |   1.2269    |  1.399   |         1.3677         |
|           mobilevit_s           | 64  | 0.9735 |  0.8144   |     0.6564     |   1.1126    |  1.3811  |         1.3686         |
|        ese_vovnet19b_dw         | 128 | 0.9705 |  0.9648   |     0.7668     |   1.2432    |  1.3783  |         1.3797         |
|          spnasnet_100           | 128 | 0.9451 |  0.9378   |     0.7758     |   1.3175    |  1.3679  |         1.3958         |
|            pit_b_224            | 64  | 0.9999 |  0.9959   |     0.8217     |   1.0632    |  1.3611  |         1.356          |
|           fbnetc_100            | 128 | 0.9524 |  0.9434   |     0.7898     |   1.3769    |  1.3555  |         1.3749         |
|           convit_base           | 64  | 1.0001 |  0.9963   |      0.0       |     0.0     |  1.3487  |         1.3738         |
|       tf_efficientnet_b0        | 128 | 0.9641 |  0.8075   |     0.6671     |    1.097    |  1.3481  |         1.3537         |
|         poolformer_m36          | 64  | 0.9998 |  0.9984   |     0.8061     |     0.0     |   1.33   |         1.2968         |
|          botnet26t_256          | 128 | 0.9801 |  0.9735   |     0.813      |   1.3482    |  1.3266  |         1.3346         |
|          cspdarknet53           | 64  | 0.9422 |  0.9335   |     0.7555     |   0.8991    |  1.3197  |         1.3472         |
|          pnasnet5large          | 16  | 1.0064 |   1.029   |     0.8464     |    1.141    |  1.2979  |         1.2717         |
|          mixer_b16_224          | 128 | 1.0002 |  0.9981   |     0.803      |   0.9424    |  1.2925  |         1.273          |
|       eca_botnext26ts_256       | 128 | 0.9805 |  0.8116   |     0.6713     |   1.1588    |  1.2898  |         1.2888         |
|      beit_base_patch16_224      | 64  | 0.9999 |   0.979   |      0.0       |   1.0454    |  1.2856  |         1.2679         |
| deit_base_distilled_patch16_224 | 64  | 0.9999 |  0.9915   |     0.797      |   1.0625    |  1.2816  |         1.259          |
|           rexnet_100            | 128 | 0.9646 |  0.8617   |     0.6894     |   1.0374    |  1.2793  |         1.2779         |
|            tinynet_a            | 128 | 0.9573 |   0.803   |     0.6597     |   1.0815    |  1.2604  |         1.2685         |
|         visformer_small         | 128 | 0.9998 |  1.0019   |     0.8358     |   1.0878    |  1.237   |         1.1852         |
|        sebotnet33ts_256         | 64  | 0.9665 |  0.8377   |      0.68      |    1.117    |  1.2153  |         1.2087         |
|           tf_mixnet_l           | 128 | 0.9806 |  0.9094   |     0.7951     |   1.0607    |  1.1977  |         1.1912         |
|      vit_base_patch16_224       | 64  | 0.9999 |  0.9942   |     0.8346     |    0.994    |  1.1906  |         1.1831         |
|            mixnet_l             | 128 | 0.9799 |  0.9055   |     0.7956     |   1.0635    |  1.1813  |         1.1788         |
|        gluon_xception65         | 32  | 0.9995 |  0.9891   |     0.7528     |   1.0657    |  1.161   |         1.1259         |
|             dpn107              | 32  | 0.9406 |  0.9253   |     0.7475     |   0.9913    |  1.1584  |         1.1784         |
|     swsl_resnext101_32x16d      | 32  | 0.9997 |  0.9823   |     0.8058     |   1.0769    |   1.14   |         1.0574         |
|            repvgg_a2            | 128 | 0.944  |  0.9335   |     0.7984     |   1.1271    |  1.1395  |         1.1567         |
|            gernet_l             | 128 | 0.9466 |  0.9376   |     0.767      |   1.1446    |  1.0663  |         1.0791         |
|        convmixer_768_32         | 32  | 0.9999 |  0.9974   |     0.9228     |   1.0532    |  1.0563  |         1.0508         |
|          convnext_base          | 64  | 0.9994 |  0.9938   |      0.0       |   1.2022    |  0.6576  |         0.6429         |
|        eca_halonext26ts         | 128 | 0.9816 |  0.8173   |     0.678      |   1.1505    |   0.0    |          0.0           |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
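
To summarize a speedup table like the one above in a single number per backend, one option is a geometric mean over the models that ran. The sketch below is not part of the benchmark scripts; `parse_ascii_table` and `geomean_speedup` are hypothetical helper names. It assumes that `0.0` and `nan` entries mark failed runs (in the tables here, `0.0` speedups line up with `fail_to_run` in the Accuracy table) and skips them rather than averaging them in.

```python
import math

def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # ignore '+----+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]

def geomean_speedup(records, column):
    """Geometric mean of a column, skipping 0.0 and nan (assumed failed runs)."""
    values = [float(r[column]) for r in records]
    values = [v for v in values if v > 0 and not math.isnan(v)]
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Three rows copied from the table above, with a subset of the columns.
SAMPLE = """
+----------------------+-----+--------+-----------+----------+
|         name         | bs  | eager  | aot_eager | inductor |
+----------------------+-----+--------+-----------+----------+
| xcit_large_24_p8_224 |  5  | 1.0018 |    0.0    |  2.5138  |
|  tnt_s_patch16_224   | 128 | 0.9999 |  0.9981   |  2.1263  |
|     cait_m36_384     |  4  | 1.0005 |    0.0    |  1.4668  |
+----------------------+-----+--------+-----------+----------+
"""

records = parse_ascii_table(SAMPLE)
for backend in ("eager", "aot_eager", "inductor"):
    print(backend, round(geomean_speedup(records, backend), 4))
```

Skipping failed runs makes a backend with many failures look better than it is, so the number of models that actually ran is worth reporting alongside the mean.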

Accuracy

+---------------------------------+----+-------+-------------+----------------+---------------+---------------+------------------------+
|              name               | bs | eager |  aot_eager  | aot_cudagraphs |  aot_nvfuser  |   inductor    | inductor_no_cudagraphs |
+---------------------------------+----+-------+-------------+----------------+---------------+---------------+------------------------+
|        adv_inception_v3         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|          convnext_base          | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|           res2next50            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|           rexnet_100            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|        sebotnet33ts_256         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|           selecsls42b           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|     swsl_resnext101_32x16d      | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|       tf_efficientnet_b0        | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|           tf_mixnet_l           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            tinynet_a            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|        twins_pcpvt_base         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|         visformer_small         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|      vit_base_patch16_224       | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|      beit_base_patch16_224      | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|           dm_nfnet_f0           | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|        res2net101_26w_4s        | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|          gmixer_24_224          | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|          gmlp_s16_224           | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|          jx_nest_base           | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|  swin_base_patch4_window7_224   | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|        tnt_s_patch16_224        | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|           volo_d1_224           | 2  | pass  |    pass     |  fail_to_run   |     pass      |     pass      |          pass          |
|          resmlp_12_224          | 2  | pass  |    pass     |      pass      |  fail_to_run  |     pass      |          pass          |
|           convit_base           | 2  | pass  |    pass     |  fail_to_run   |  fail_to_run  |     pass      |          pass          |
|          cait_m36_384           | 2  | pass  | fail_to_run |  fail_to_run   |  fail_to_run  |     pass      |          pass          |
|      xcit_large_24_p8_224       | 2  | pass  | fail_to_run |  fail_to_run   |  fail_to_run  |     pass      |          pass          |
|         poolformer_m36          | 2  | pass  |    pass     |      pass      | fail_accuracy |     pass      |          pass          |
|           resnest101e           | 2  | pass  |    pass     |      pass      | fail_accuracy |     pass      |          pass          |
|          botnet26t_256          | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|        res2net50_14w_8s         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            repvgg_a2            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|       gluon_inception_v3        | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|         coat_lite_mini          | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|        convmixer_768_32         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|         crossvit_9_240          | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|          cspdarknet53           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
| deit_base_distilled_patch16_224 | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|             dla102              | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|             dpn107              | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|       eca_botnext26ts_256       | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|        ese_vovnet19b_dw         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|           fbnetc_100            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            gernet_l             | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|           regnety_002           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|          ghostnet_100           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            hrnet_w18            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|          inception_v3           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            lcnet_050            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|          mixer_b16_224          | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            mixnet_l             | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|           mnasnet_100           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|         mobilenetv2_100         | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|      mobilenetv3_large_100      | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|           mobilevit_s           | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            nfnet_l0             | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|            pit_b_224            | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|          pnasnet5large          | 2  | pass  |    pass     |      pass      |     pass      |     pass      |          pass          |
|        eca_halonext26ts         | 2  | pass  |    pass     |      pass      |     pass      |  fail_to_run  |      fail_to_run       |
|        gluon_xception65         | 2  | pass  |    pass     |      pass      |     pass      | fail_accuracy |     fail_accuracy      |
|            fbnetv3_b            | 2  | pass  |    pass     |      pass      | fail_accuracy | fail_accuracy |     fail_accuracy      |
|          spnasnet_100           | 2  | pass  |    pass     |      pass      | fail_accuracy | fail_accuracy |     fail_accuracy      |
+---------------------------------+----+-------+-------------+----------------+---------------+---------------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|        twins_pcpvt_base         | 64  | 2.7886 |  19.2789  |    30.8811     |   71.6398   | 450.4032 |        449.0663        |
|         coat_lite_mini          | 128 | 1.1898 |   6.925   |    10.2699     |   32.4009   | 415.0195 |        407.8104        |
|           mobilevit_s           | 64  | 1.8334 |  9.5546   |    18.1252     |   63.5064   | 335.9776 |        326.7777        |
|        sebotnet33ts_256         | 64  | 1.7766 |  7.7854   |    16.9464     |   67.7216   | 320.9297 |        322.9218        |
|       eca_botnext26ts_256       | 128 | 1.4388 |  6.4959   |    12.5318     |   61.6816   | 263.1896 |        256.6624        |
|      xcit_large_24_p8_224       |  5  | 3.2266 |    nan    |      nan       |     nan     | 190.9474 |        191.3842        |
|          botnet26t_256          | 128 | 1.3802 |  5.9108   |    11.4357     |   49.5446   | 190.114  |        192.5791        |
|  swin_base_patch4_window7_224   | 64  | 2.9328 |  16.2208  |      nan       |   70.4276   | 184.8036 |        181.5733        |
|          jx_nest_base           | 32  | 1.7719 |  12.5102  |      nan       |   50.3768   | 174.1079 |        171.8631        |
|          convnext_base          | 64  | 1.4105 |  8.8881   |      nan       |   35.5731   | 156.2029 |        155.9408        |
|          cait_m36_384           |  4  | 3.3274 |    nan    |      nan       |     nan     | 149.7085 |        146.6911        |
|            hrnet_w18            | 128 | 6.1479 |  40.5858  |    71.0524     |  452.9955   | 128.3186 |        122.6451        |
|         crossvit_9_240          | 128 | 1.6769 |  11.0596  |    16.4989     |   36.3303   | 122.6924 |        123.3807        |
|           resnest101e           | 64  | 3.4105 |  21.9181  |    33.9868     |  106.6682   | 122.1278 |        123.3805        |
|           volo_d1_224           | 64  | 1.3075 |  9.8414   |      nan       |   38.6123   | 104.5584 |        101.7564        |
|          pnasnet5large          | 16  | 4.8228 |  29.9114  |    50.9646     |  188.9162   | 101.2345 |        98.7592         |
|         visformer_small         | 128 | 0.9721 |  5.4105   |     8.2006     |   30.7881   | 93.9667  |        93.7884         |
|            pit_b_224            | 64  | 1.1349 |  6.8399   |    10.4533     |   25.4554   | 88.4051  |        87.0922         |
|          gmlp_s16_224           | 128 | 1.2469 |  9.6414   |      nan       |   22.149    | 77.7271  |        75.0659         |
|        res2net101_26w_4s        | 64  | 3.2425 |  22.1744  |    35.0968     |  118.4702   | 67.1696  |        62.6511         |
|        tnt_s_patch16_224        | 128 | 2.0086 |  14.8546  |      nan       |   37.5385   | 62.7691  |        59.8617         |
|        res2net50_14w_8s         | 128 | 2.9692 |  19.6685  |    30.6779     |  136.2903   | 60.9233  |        58.4898         |
|          gmixer_24_224          | 128 | 1.4329 |  10.6824  |      nan       |   28.1379   | 60.2775  |        58.6871         |
|           convit_base           | 64  | 1.2215 |  7.9288   |      nan       |     nan     | 57.9251  |        56.8513         |
|        gluon_xception65         | 32  | 2.1486 |  14.5975  |    21.7491     |   64.2962   | 54.6983  |        51.7244         |
|         poolformer_m36          | 64  | 1.9552 |  11.256   |    17.2506     |     nan     | 51.0105  |        48.5998         |
|     swsl_resnext101_32x16d      | 32  | 1.8227 |  12.6355  |     18.686     |   52.7151   | 47.8203  |        45.3611         |
|             dpn107              | 32  | 4.0616 |   17.56   |     50.857     |  101.5033   | 47.3077  |        45.4352         |
|          resmlp_12_224          | 128 | 0.7259 |   3.948   |     7.9433     |     nan     | 45.4439  |         40.973         |
|            fbnetv3_b            | 128 | 3.3446 |  14.3348  |    36.4534     |   99.9885   | 44.8979  |        41.3769         |
| deit_base_distilled_patch16_224 | 64  | 0.9803 |  6.0124   |     8.8429     |   14.5051   |  43.952  |         43.516         |
|      vit_base_patch16_224       | 64  | 0.9723 |  6.2255   |     8.905      |   14.0467   | 42.9831  |        42.7723         |
|          mixer_b16_224          | 128 | 0.9758 |  4.8173   |     8.0926     |   16.2722   | 41.0523  |        40.3492         |
|       gluon_inception_v3        | 128 | 1.6418 |  11.4686  |    17.0664     |   97.0568   | 40.4963  |         38.521         |
|            mixnet_l             | 128 | 5.4487 |  15.1933  |    30.9701     |   87.7111   | 40.1174  |        36.4016         |
|        adv_inception_v3         | 128 | 1.6324 |  11.5373  |    17.2605     |   98.9266   | 39.9888  |        37.6633         |
|          inception_v3           | 128 | 1.6635 |  11.4382  |    17.4671     |   98.1895   | 39.8173  |        37.7171         |
|      beit_base_patch16_224      | 64  | 1.2378 |  6.9376   |      nan       |   18.6737   | 39.6996  |        37.3079         |
|           tf_mixnet_l           | 128 | 5.7701 |  15.5782  |    32.1528     |   87.1081   | 39.5924  |        37.4604         |
|             dla102              | 128 | 1.8418 |  12.9528  |    19.5944     |   86.5435   | 38.8311  |        36.0248         |
|        convmixer_768_32         | 32  | 1.2436 |  8.6071   |    12.3605     |   17.559    | 38.2188  |         36.946         |
|          ghostnet_100           | 128 | 3.0034 |  12.3307  |    17.2161     |   90.2264   | 38.0134  |        36.3602         |
|           res2next50            | 128 | 1.6145 |  10.8424  |    16.4392     |   84.3658   | 34.8033  |        33.5014         |
|           dm_nfnet_f0           | 128 | 2.1292 |  9.0767   |      nan       |   38.1385   | 34.1963  |         32.037         |
|           rexnet_100            | 128 | 1.9563 |  9.4335   |    20.6214     |  117.2395   | 31.6527  |        30.2589         |
|            tinynet_a            | 128 | 2.109  |  10.2195  |    23.9285     |   78.7993   | 31.0464  |        29.3305         |
|          cspdarknet53           | 64  | 2.3291 |  9.5562   |    22.8968     |   40.8709   | 28.9134  |        26.8114         |
|       tf_efficientnet_b0        | 128 | 1.8966 |  8.7482   |    19.4162     |   78.0265   | 27.5474  |        26.2992         |
|            nfnet_l0             | 128 | 1.9244 |  9.0907   |    13.0128     |   34.939    | 26.3739  |        24.8758         |
|           fbnetc_100            | 128 | 2.1772 |  8.3721   |    21.0565     |   59.8188   | 26.2984  |        24.6009         |
|          spnasnet_100           | 128 | 2.1478 |  8.2724   |    20.1714     |   57.1241   | 25.5721  |        24.1208         |
|      mobilenetv3_large_100      | 128 | 1.8146 |  7.2825   |    15.7371     |   82.3897   |  24.112  |        22.9963         |
|            repvgg_a2            | 128 | 2.0068 |  7.8114   |    17.9164     |   61.7912   | 21.9794  |        20.1666         |
|         mobilenetv2_100         | 128 | 1.7899 |   6.804   |    15.4585     |   40.3552   | 21.9682  |        20.8764         |
|            gernet_l             | 128 | 1.9591 |  7.9609   |    18.4196     |   44.163    | 21.7658  |        20.5759         |
|           regnety_002           | 128 | 1.6211 |  7.3888   |     16.13      |   56.2891   | 21.4296  |         21.058         |
|           mnasnet_100           | 128 | 1.7603 |  7.0147   |    16.5129     |   50.5861   |  21.34   |        20.5011         |
|           selecsls42b           | 128 | 0.8253 |  5.0917   |     7.5991     |   50.2702   | 18.9059  |        18.1065         |
|            lcnet_050            | 128 | 1.061  |  4.4009   |     8.5809     |   38.2764   | 15.3221  |        14.9451         |
|        ese_vovnet19b_dw         | 128 | 1.0646 |  4.1179   |     7.956      |   39.0037   | 14.9559  |        14.0328         |
|        eca_halonext26ts         | 128 | 1.4734 |  6.4076   |    13.5735     |   65.8208   |   nan    |          nan           |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | aot_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+
|            tinynet_a            | 128 | 0.9889 |  0.7884   |     0.2764     |   0.7887    |  1.3707  |         1.4015         |
|          gmixer_24_224          | 128 | 0.9926 |  0.9699   |      nan       |   0.9029    |  1.3139  |         1.3772         |
|          gmlp_s16_224           | 128 | 0.9938 |  0.9715   |      nan       |   0.9188    |  1.2841  |         1.2998         |
|       tf_efficientnet_b0        | 128 | 0.9882 |  0.7693   |     0.2664     |   0.8392    |  1.173   |         1.1918         |
|          pnasnet5large          | 16  | 1.0575 |  0.9913   |     0.3634     |   1.1722    |  1.1607  |         1.2789         |
|           mobilevit_s           | 64  | 0.9931 |  0.7669   |     0.2734     |   0.7848    |  1.1578  |         1.2186         |
|           rexnet_100            | 128 | 0.9885 |   0.785   |     0.2849     |   0.8648    |  1.1475  |         1.1687         |
|       eca_botnext26ts_256       | 128 | 0.9886 |   0.77    |     0.2669     |    0.776    |  1.1068  |         1.2101         |
|         poolformer_m36          | 64  | 0.9979 |  0.9432   |     0.3413     |     nan     |  1.1021  |         1.1162         |
|        tnt_s_patch16_224        | 128 | 0.9945 |  0.9729   |      nan       |   0.9418    |  1.0703  |         1.1492         |
|           resnest101e           | 64  | 0.995  |  0.9889   |     0.3473     |   0.9685    |  1.0556  |         1.0626         |
|           convit_base           | 64  | 0.9966 |  0.8516   |      nan       |     nan     |  1.0528  |         1.1534         |
|           volo_d1_224           | 64  | 0.9965 |  0.9475   |      nan       |   0.8587    |  1.0379  |         1.1081         |
|           dm_nfnet_f0           | 128 | 0.969  |   0.898   |      nan       |   0.9443    |  1.0336  |         1.124          |
|            nfnet_l0             | 128 | 0.9884 |  0.8173   |     0.2681     |   0.8142    |  1.0333  |         1.0762         |
|         mobilenetv2_100         | 128 | 0.9863 |  0.7642   |     0.3109     |   0.9129    |  1.0048  |         1.021          |
|      beit_base_patch16_224      | 64  | 0.9952 |  0.9327   |      nan       |   0.9298    |  1.0004  |         1.0447         |
|            pit_b_224            | 64  | 0.999  |  0.8053   |     0.326      |   0.8179    |  0.9746  |         1.2067         |
|        convmixer_768_32         | 32  | 0.9972 |  0.9788   |     0.3455     |   0.9714    |  0.9746  |         0.9788         |
|        twins_pcpvt_base         | 64  | 0.9945 |  0.9232   |     0.3403     |    0.802    |  0.9699  |         1.0818         |
|            fbnetv3_b            | 128 | 0.9872 |  0.7836   |     0.315      |    0.79     |  0.9645  |         0.9776         |
|          ghostnet_100           | 128 | 0.9756 |   0.87    |     0.337      |   0.9026    |  0.9489  |         0.9832         |
|             dla102              | 128 | 0.9694 |   0.912   |     0.3362     |   0.9381    |  0.9431  |         0.9502         |
|         visformer_small         | 128 | 0.9899 |  0.9259   |     0.3469     |   0.8884    |  0.9382  |         1.0521         |
|      xcit_large_24_p8_224       |  5  | 0.9975 |    nan    |      nan       |     nan     |  0.9319  |         0.9931         |
|           tf_mixnet_l           | 128 | 0.991  |  0.8555   |     0.2875     |   0.8365    |  0.9314  |         1.0486         |
|          cait_m36_384           |  4  | 0.9998 |    nan    |      nan       |     nan     |  0.929   |         0.9775         |
|     swsl_resnext101_32x16d      | 32  | 0.9989 |   0.879   |     0.3676     |   0.8487    |  0.9112  |         0.9354         |
|          mixer_b16_224          | 128 | 0.992  |  0.9574   |     0.3472     |   0.7555    |  0.9089  |         0.9818         |
|             dpn107              | 32  | 0.997  |  0.9097   |     0.3531     |   0.8814    |  0.9072  |         0.9596         |
|            hrnet_w18            | 128 | 0.9914 |  0.9176   |     0.3348     |   0.8581    |  0.8969  |         0.938          |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9151   |     0.3336     |   0.8524    |  0.8964  |         0.9224         |
|      mobilenetv3_large_100      | 128 | 0.9772 |   0.84    |     0.3302     |   0.8641    |  0.8948  |         0.916          |
|           selecsls42b           | 128 | 0.9789 |   0.876   |     0.3528     |   0.8772    |  0.8927  |         0.9188         |
|        gluon_xception65         | 32  | 0.9955 |  0.8859   |     0.3349     |   0.8854    |  0.8924  |         0.8971         |
|      vit_base_patch16_224       | 64  | 0.9955 |  0.9342   |     0.3593     |   0.8801    |  0.8916  |         0.8968         |
| deit_base_distilled_patch16_224 | 64  | 0.9944 |  0.9332   |     0.359      |   0.8794    |  0.8911  |         0.8966         |
|        ese_vovnet19b_dw         | 128 | 0.9858 |  0.8566   |     0.3273     |   0.9146    |  0.8905  |         0.9028         |
|          convnext_base          | 64  | 1.003  |  0.9263   |      nan       |   0.7349    |  0.8852  |         0.9866         |
|        adv_inception_v3         | 128 | 0.9824 |  0.8621   |     0.3343     |   0.8538    |  0.8845  |         0.8998         |
|       gluon_inception_v3        | 128 | 0.9824 |  0.8621   |     0.3343     |   0.8538    |  0.8845  |         0.8998         |
|          inception_v3           | 128 | 0.9824 |  0.8621   |     0.3343     |   0.8538    |  0.8845  |         0.8998         |
|        res2net50_14w_8s         | 128 | 0.9908 |  0.9072   |     0.3232     |   0.8299    |  0.876   |         0.9007         |
|           res2next50            | 128 | 0.9913 |   0.91    |     0.3202     |   0.8285    |  0.8697  |         0.8972         |
|            mixnet_l             | 128 | 0.9902 |  0.8441   |     0.2718     |   0.7737    |  0.8653  |         0.9722         |
|            gernet_l             | 128 | 0.9794 |  0.8503   |     0.3444     |   0.8158    |  0.862   |         0.8897         |
|          spnasnet_100           | 128 | 0.9788 |  0.8801   |     0.3343     |   0.8371    |  0.8602  |         0.8784         |
|          cspdarknet53           | 64  | 0.9913 |  0.8405   |     0.3241     |   0.7908    |  0.8512  |         0.8583         |
|          botnet26t_256          | 128 | 0.9849 |   0.864   |     0.3308     |   0.7708    |  0.8503  |         0.898          |
|           mnasnet_100           | 128 | 0.9765 |  0.8701   |     0.3349     |   0.8252    |  0.8503  |         0.8698         |
|           fbnetc_100            | 128 |  0.98  |  0.8491   |     0.3307     |   0.7352    |  0.8387  |         0.8542         |
|            lcnet_050            | 128 | 0.9433 |  0.7566   |     0.3361     |   0.7559    |  0.8309  |         0.8769         |
|           regnety_002           | 128 | 0.9504 |  0.7948   |     0.3403     |   0.7515    |  0.8245  |         0.8627         |
|         crossvit_9_240          | 128 | 0.9854 |  0.8707   |     0.3347     |   0.8842    |  0.8174  |         1.0986         |
|          resmlp_12_224          | 128 | 0.9827 |  0.9508   |     0.2624     |     nan     |  0.8092  |         0.8236         |
|         coat_lite_mini          | 128 | 1.0338 |  0.9202   |     0.3515     |   0.6593    |  0.8006  |         1.035          |
|            repvgg_a2            | 128 | 0.9767 |  0.7822   |     0.3407     |   0.6789    |  0.7903  |         0.8279         |
|  swin_base_patch4_window7_224   | 64  | 0.9966 |  0.9203   |      nan       |   0.8451    |  0.7566  |         0.9252         |
|        sebotnet33ts_256         | 64  | 0.9928 |  0.7073   |     0.3212     |   0.7354    |  0.745   |         0.8293         |
|          jx_nest_base           | 32  | 0.9983 |  0.8927   |      nan       |    0.86     |  0.6708  |         0.8619         |
|        eca_halonext26ts         | 128 | 0.9886 |  0.7747   |     0.267      |   0.7762    |   nan    |          nan           |
+---------------------------------+-----+--------+-----------+----------------+-------------+----------+------------------------+

Performance graphs

bench_logs/huggingface_amp.png :

bench_logs/timm_models_amp.png :

bench_logs/torchbench_amp.png :

@anijain2305
Contributor Author

Performance Dashboard for float32 precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of the forward pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. The experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

When measuring performance, compilation latency and memory footprint reduction, we exclude the models that fail the accuracy checks.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 82%, 46/56 | 100%, 43/43 | 59%, 36/61  |
|       aot_eager        | 79%, 44/56 | 100%, 43/43 | 56%, 34/61  |
|     aot_cudagraphs     | 64%, 36/56 | 49%, 21/43  |  11%, 7/61  |
|    nvprims_nvfuser     | 48%, 27/56 |  0%, 0/43   |  15%, 9/61  |
|        inductor        | 71%, 40/56 | 93%, 40/43  | 56%, 34/61  |
| inductor_no_cudagraphs | 79%, 44/56 | 93%, 40/43  | 56%, 34/61  |
+------------------------+------------+-------------+-------------+
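
Each cell in the pass-rate table pairs a rounded percentage with the raw count. A minimal sketch of that formatting, assuming the percentage is rounded to the nearest integer (the name `format_pass_rate` is hypothetical and is not taken from the benchmark scripts):

```python
def format_pass_rate(passed: int, total: int) -> str:
    """Format a pass rate in the 'NN%, passed/total' style used in the table."""
    if total <= 0:
        raise ValueError("total must be positive")
    # ':.0%' multiplies by 100 and rounds to zero decimal places.
    return f"{passed / total:.0%}, {passed}/{total}"


if __name__ == "__main__":
    # Counts taken from the torchbench column of the table above.
    print(format_pass_rate(46, 56))
    print(format_pass_rate(27, 56))
```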

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.01x    |    1.00x    |
|       aot_eager        |   1.01x    |    1.00x    |    1.00x    |
|     aot_cudagraphs     |   1.05x    |    1.02x    |    1.00x    |
|    nvprims_nvfuser     |   1.04x    |    0.0x     |    1.16x    |
|        inductor        |   1.39x    |    1.29x    |    1.23x    |
| inductor_no_cudagraphs |   1.22x    |    1.21x    |    1.23x    |
+------------------------+------------+-------------+-------------+
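
The aggregation described above (drop the models that fail the accuracy checks, then take the geometric mean of the remaining speedups) can be sketched as follows. This is an illustration under stated assumptions, not the dashboard's actual script: the function name, the dictionary layout, the `"pass"` status string, and the choice to return 0.0 when no model passes are all hypothetical.

```python
import math


def geomean_speedup(speedups: dict, statuses: dict) -> float:
    """Geometric mean of per-model speedups, restricted to passing models.

    speedups: model name -> speedup over native PyTorch (assumed layout)
    statuses: model name -> accuracy status, e.g. "pass" or "fail_to_run"
    """
    kept = [
        value
        for name, value in speedups.items()
        # Assumption: failed runs are excluded; non-positive entries are
        # treated as missing measurements and excluded as well.
        if statuses.get(name) == "pass" and value > 0
    ]
    if not kept:
        return 0.0  # assumption: report 0.0 when no model passes
    return math.exp(sum(math.log(v) for v in kept) / len(kept))


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    speedups = {"model_a": 2.0, "model_b": 8.0, "model_c": 0.0}
    statuses = {"model_a": "pass", "model_b": "pass", "model_c": "fail_to_run"}
    print(f"{geomean_speedup(speedups, statuses):.2f}x")
```

A geometric mean is the usual choice for averaging ratios, since a 2x speedup and a 0.5x slowdown then cancel out to 1x.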

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    1.84    |    2.26     |    1.68     |
|       aot_eager        |    7.30    |    10.27    |    10.56    |
|     aot_cudagraphs     |    9.57    |    20.71    |    12.53    |
|    nvprims_nvfuser     |   48.11    |     0.0     |   163.13    |
|        inductor        |   25.45    |    35.22    |    45.24    |
| inductor_no_cudagraphs |   25.56    |    30.09    |    43.82    |
+------------------------+------------+-------------+-------------+
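
The per-model latency tables report `nan` for models that did not run on a given backend. One way a per-suite mean could be computed from such a column is sketched below, assuming that `nan` entries are skipped and that an empty column is reported as 0.0. Both choices, and the function name, are assumptions rather than confirmed behavior of the benchmark scripts.

```python
import math


def mean_compile_time(latencies: list) -> float:
    """Arithmetic mean of compilation latencies in seconds, skipping nan."""
    valid = [t for t in latencies if not math.isnan(t)]
    if not valid:
        return 0.0  # assumption: no valid measurement is reported as 0.0
    return sum(valid) / len(valid)


if __name__ == "__main__":
    # Made-up values for illustration only.
    print(mean_compile_time([1.0, float("nan"), 3.0]))
```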

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.95x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.87x    |    0.91x    |    0.90x    |
|     aot_cudagraphs     |   0.39x    |    0.36x    |    0.31x    |
|    nvprims_nvfuser     |   0.81x    |    0.0x     |    0.85x    |
|        inductor        |   0.81x    |    0.71x    |    0.95x    |
| inductor_no_cudagraphs |   0.93x    |    0.96x    |    1.01x    |
+------------------------+------------+-------------+-------------+
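
Since higher is better, one plausible definition of the compression ratio is the native PyTorch peak memory divided by the backend's peak memory. The sketch below assumes that definition; the issue does not spell out the formula, and the function name is hypothetical.

```python
def compression_ratio(baseline_peak_bytes: float, backend_peak_bytes: float) -> float:
    """Peak-memory compression ratio, assuming ratio = baseline / backend.

    Values above 1.0 mean the backend used less peak memory than the baseline.
    """
    if backend_peak_bytes <= 0:
        raise ValueError("backend peak memory must be positive")
    return baseline_peak_bytes / backend_peak_bytes


if __name__ == "__main__":
    # Made-up values for illustration only: 10 GB baseline, 8 GB backend.
    print(f"{compression_ratio(10e9, 8e9):.2f}x")
```

Under this assumed definition, a ratio of 0.31x would correspond to the backend using roughly 3.2 times the baseline peak memory.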

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------------+-----------------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | nvprims_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------------+-----------------+----------+------------------------+
|            densenet121            |  4   | 1.0065 |  0.9994   |     1.7788     |     0.6951      |  4.2215  |         1.4165         |
|      timm_vision_transformer      |  8   | 1.0049 |   0.917   |     1.513      |       0.0       |  2.6276  |         1.3953         |
|       functorch_dp_cifar10        |  64  | 0.9948 |   0.95    |     1.4724     |       0.0       |  2.5679  |         1.3391         |
|          pytorch_struct           | 200  | 0.987  |  0.7373   |     0.9244     |     0.8028      |  1.816   |         1.146          |
|           lennard_jones           | 1000 | 0.9577 |  0.8263   |     1.0156     |     0.6741      |  1.7521  |         0.9432         |
|        mobilenet_v3_large         |  32  | 1.0079 |  1.1149   |     0.9325     |     0.8735      |  1.7329  |         1.4215         |
|             hf_Albert             |  8   | 1.0014 |  0.9976   |     0.7522     |       0.0       |  1.6495  |         1.6428         |
|          resnext50_32x4d          |  8   | 1.0015 |  1.1296   |     0.9759     |     0.7562      |  1.6427  |         1.3294         |
|        shufflenet_v2_x1_0         | 128  | 0.9983 |  1.0146   |     0.7677     |     0.9192      |  1.5531  |         1.4015         |
|        speech_transformer         |  32  | 1.0079 |  0.9259   |     1.5245     |       0.0       |  1.5424  |         1.5415         |
|           timm_resnest            |  32  | 0.9993 |  1.0014   |     0.8048     |     1.1876      |  1.5178  |         1.4529         |
|              hf_GPT2              |  4   | 1.0106 |  0.9799   |     0.7392     |     0.4039      |  1.5023  |         1.5005         |
|            timm_nfnet             | 128  |  1.0   |  0.9998   |      0.0       |     1.2567      |  1.4758  |         1.4237         |
|             resnet18              |  16  | 1.0039 |  1.1001   |     0.8931     |     0.8823      |  1.4715  |         1.2451         |
|    mobilenet_v2_quantized_qat     |  96  | 1.0013 |  0.9747   |      0.0       |     1.4397      |  1.4333  |         1.4305         |
|           mobilenet_v2            |  96  | 0.9999 |    1.0    |     0.7315     |       0.0       |  1.4304  |         1.4022         |
|           fastNLP_Bert            |  6   | 0.9984 |  0.9764   |     0.7533     |       0.0       |  1.4277  |         1.3965         |
|         soft_actor_critic         | 256  | 0.9781 |  0.7691   |     1.0582     |     0.6291      |  1.4205  |         0.8968         |
|            hf_T5_large            |  2   | 1.0229 |  0.8542   |      0.0       |       0.0       |  1.3981  |         1.3967         |
|      resnet50_quantized_qat       |  32  | 1.0007 |  0.9582   |      0.0       |     1.2034      |  1.383   |         1.3819         |
|            mnasnet1_0             |  32  | 0.9995 |  1.0675   |     0.812      |     0.9628      |  1.3794  |         1.2884         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9944 |  0.9496   |     0.9673     |     0.8159      |  1.3694  |         1.2645         |
|           squeezenet1_1           |  32  | 0.9954 |   1.011   |     0.8215     |     0.7852      |  1.3678  |         1.3026         |
|               dcgan               |  32  | 0.9818 |  1.0022   |     1.0001     |     0.7516      |  1.3119  |         1.0534         |
|              hf_Bart              |  4   | 1.0107 |  0.9698   |      0.0       |       0.0       |  1.2663  |         1.2053         |
|              hf_Bert              |  4   | 1.031  |  0.9917   |     0.737      |       0.0       |   1.21   |         1.1902         |
|          LearningToPaint          |  96  | 0.9992 |  1.0012   |     0.809      |     1.0171      |  1.2093  |         1.1738         |
|             resnet50              |  32  | 0.9991 |  0.9882   |     0.7607     |     1.0841      |  1.2052  |         1.1682         |
|           pytorch_unet            |  1   | 0.9996 |  0.9979   |     0.8464     |     1.0893      |  1.1984  |         1.1881         |
|            Super_SloMo            |  6   | 0.9997 |  0.9977   |     0.8665     |     1.0026      |  1.1802  |         1.1661         |
|           hf_DistilBert           |  8   | 0.9999 |  0.9538   |     0.6855     |       0.0       |  1.1761  |         1.1816         |
|               vgg16               |  64  | 0.9998 |   0.999   |     0.8583     |     0.9982      |  1.1725  |         1.1669         |
|              alexnet              | 128  | 0.9993 |  0.9981   |     0.8032     |     1.0022      |  1.1618  |         1.1634         |
|        Background_Matting         |  4   | 0.9999 |  1.0215   |     0.863      |     1.0826      |  1.1186  |          1.11          |
|          pytorch_stargan          |  16  | 0.999  |  0.9836   |     0.8572     |       0.0       |  1.1156  |         1.0957         |
|            hf_Reformer            |  4   | 0.9963 |    0.0    |     0.9199     |       0.0       |  1.105   |         1.1283         |
|              yolov3               |  16  | 0.9997 |   0.995   |     0.7934     |     1.1952      |  1.095   |         1.0821         |
|            hf_BigBird             |  2   | 0.9905 |  0.9386   |     0.9544     |       0.0       |  1.0936  |         0.998          |
| attention_is_all_you_need_pytorch | 256  | 1.0001 |   0.971   |      0.0       |       0.0       |  1.0661  |         1.0524         |
|   timm_vision_transformer_large   |  8   | 0.9998 |  0.9953   |      0.0       |       0.0       |  1.0459  |         1.0329         |
|            tts_angular            |  64  | 0.9881 |  0.9561   |     0.9839     |     0.9734      |  1.0082  |         1.002          |
|              demucs               |  4   | 0.9996 |  0.9999   |     0.9999     |     1.0002      |  0.9994  |         0.9997         |
|      nvidia_deeprecommender       | 256  | 0.9983 |  0.9627   |     0.584      |      0.858      |  0.904   |         0.9639         |
|               dlrm                | 2048 |  0.0   |    0.0    |      0.0       |      1.09       |   0.0    |         1.0697         |
|               hf_T5               |  8   | 1.0015 |  0.8183   |      0.0       |       0.0       |   0.0    |         1.1069         |
|             tacotron2             |  64  | 0.9725 |  0.8235   |      0.0       |       0.0       |   0.0    |         0.8981         |
|           hf_GPT2_large           |  4   | 1.0003 |  0.9806   |      0.0       |       0.0       |   0.0    |         1.4766         |
|           hf_Longformer           |  2   | 0.9625 |   0.883   |     0.8164     |       0.0       |   0.0    |          0.0           |
|           BERT_pytorch            |  0   |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|          DALLE2_pytorch           |  0   |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|                drq                |  0   |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|               moco                |  0   |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|         timm_efficientdet         |  0   |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|         timm_efficientnet         |  0   |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|            timm_regnet            |  0   |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|            timm_vovnet            |  0   |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------------+-----------------+----------+------------------------+
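For anyone who wants a single number per backend from a speedup table like the one above, here is a minimal sketch that parses the ASCII table and takes a geometric mean. It assumes that a `0.0` (or `nan`) entry marks a run that failed or was skipped and should be excluded rather than averaged in. The helper names `parse_table` and `geomean_speedup` are hypothetical and not part of the benchmark scripts; `SAMPLE` reuses three rows from the table above.

```python
import math

# Three rows copied from the speedup table above, trimmed to two backends.
SAMPLE = """
+---------+-----+--------+----------+
|  name   | bs  | eager  | inductor |
+---------+-----+--------+----------+
| hf_Bert |  4  | 1.031  |   1.21   |
|  hf_T5  |  8  | 1.0015 |   0.0    |
| alexnet | 128 | 0.9993 |  1.1618  |
+---------+-----+--------+----------+
"""

def parse_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]

def geomean_speedup(records, backend):
    """Geometric mean of the backend's speedups, skipping 0.0 and nan entries.

    Assumption: 0.0 / nan mean "did not run", not "infinitely slow".
    """
    values = [float(r[backend]) for r in records if float(r[backend]) > 0.0]
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))

records = parse_table(SAMPLE)
print(f"inductor geomean: {geomean_speedup(records, 'inductor'):.4f}")
```

Excluding failed runs makes the mean look better than the backend's real coverage, so it is worth reporting the number of models included alongside it.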

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |  aot_cudagraphs  | nvprims_nvfuser  |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  2  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_BigBird             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           timm_resnest            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |          pass          |
|            timm_nfnet             |  2  |       pass       |       pass       |   fail_to_run    |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|       functorch_dp_cifar10        |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|             hf_Albert             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|              hf_Bert              |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|           hf_DistilBert           |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|         timm_efficientnet         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|        speech_transformer         |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
|      timm_vision_transformer      |  2  |       pass       |       pass       |       pass       |   fail_to_run    |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|              hf_Bart              |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|               hf_T5               |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|            hf_T5_base             |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|         timm_efficientdet         |  2  |       pass       |       pass       |   fail_to_run    |   fail_to_run    |       pass       |          pass          |
|            timm_regnet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        Background_Matting         |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  2  |       pass       |       pass       |       pass       |       pass       |       pass       |          pass          |
|               moco                |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|              yolov3               |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|          vision_maskrcnn          |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|             tacotron2             |  2  |       pass       |       pass       |       pass       |   fail_to_run    |   fail_to_run    |          pass          |
|           BERT_pytorch            |  2  |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|           hf_Longformer           |  2  |       pass       |       pass       |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|      resnet50_quantized_qat       |  2  |       pass       |       pass       |   fail_to_run    |       pass       |  fail_accuracy   |     fail_accuracy      |
|    mobilenet_v2_quantized_qat     |  2  |       pass       |  fail_accuracy   |   fail_to_run    |  fail_accuracy   |  fail_accuracy   |     fail_accuracy      |
|          DALLE2_pytorch           |  0  |      0.0000      |      0.0000      |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                drq                |  0  |      0.0000      |      0.0000      |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------+------------------+------------------------+
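The accuracy table can be reduced to a pass rate per backend. The sketch below is only an illustration: `RESULTS` holds a handful of entries copied from the table above, `pass_rate` is a hypothetical helper, and whether `pass_due_to_skip` should count as a pass is left as an explicit choice because the table does not say.

```python
from collections import Counter

# A few (model, backend, status) entries copied from the accuracy table above.
RESULTS = [
    ("hf_BigBird", "nvprims_nvfuser", "fail_to_run"),
    ("hf_BigBird", "inductor", "pass"),
    ("tacotron2", "inductor", "fail_to_run"),
    ("resnet50_quantized_qat", "inductor", "fail_accuracy"),
    ("vgg16", "inductor", "pass"),
    ("hf_GPT2_large", "inductor", "pass_due_to_skip"),
]

def pass_rate(results, backend, count_skips=False):
    """Fraction of models that pass for one backend.

    With count_skips=False, skipped models are dropped from both the
    numerator and the denominator; with True, they count as passes.
    """
    statuses = Counter(status for _, b, status in results if b == backend)
    skipped = statuses["pass_due_to_skip"]
    total = sum(statuses.values()) - (0 if count_skips else skipped)
    passed = statuses["pass"] + (skipped if count_skips else 0)
    return passed / total if total else float("nan")

print(pass_rate(RESULTS, "inductor"))
print(pass_rate(RESULTS, "inductor", count_skips=True))
```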

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------------+-----------------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | aot_cudagraphs | nvprims_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------------+-----------------+----------+------------------------+
|              yolov3               |  16  | 2.7848  |  9.5886   |    12.9709     |     117.076     | 368.8999 |        367.7649        |
|            hf_T5_large            |  2   | 13.5072 |  47.4256  |      nan       |       nan       | 125.744  |        124.048         |
|           timm_resnest            |  32  |  0.542  |  2.9033   |     4.0615     |     53.3697     | 67.9221  |        68.6143         |
|   timm_vision_transformer_large   |  8   | 2.2759  |  15.9163  |      nan       |       nan       | 67.0906  |        64.7414         |
| attention_is_all_you_need_pytorch | 256  | 1.1076  |  8.1934   |      nan       |       nan       | 56.8414  |         55.306         |
|      timm_vision_transformer      |  8   | 0.7851  |  4.8258   |     6.5654     |       nan       | 50.9499  |        51.2718         |
|          pytorch_stargan          |  16  | 0.3697  |  2.6485   |     3.4622     |       nan       | 49.2674  |        48.7708         |
|            densenet121            |  4   | 2.0651  |  14.8471  |    22.0861     |    201.9622     | 46.5356  |        45.8812         |
|            hf_BigBird             |  2   | 7.4195  |  14.5841  |    30.2365     |       nan       | 41.1527  |        27.3469         |
|          pytorch_struct           | 200  | 0.2409  |  0.8894   |     1.5967     |     5.5141      | 36.7635  |        36.8974         |
|      resnet50_quantized_qat       |  32  | 1.1107  |  10.0884  |      nan       |    173.4417     |  32.221  |         32.275         |
|              hf_Bart              |  4   | 1.4184  |   9.017   |      nan       |       nan       |  31.849  |        30.1628         |
|        mobilenet_v3_large         |  32  | 0.8483  |  5.3585   |     7.5099     |     96.0824     | 30.4483  |        29.4493         |
|            timm_nfnet             | 128  | 2.0146  |   8.503   |      nan       |    162.6666     | 29.8705  |        29.2796         |
|           fastNLP_Bert            |  6   | 1.4754  |  7.4864   |    11.0104     |       nan       | 29.1192  |        27.8205         |
|    mobilenet_v2_quantized_qat     |  96  | 1.2546  |  10.1663  |      nan       |    196.7506     | 28.4796  |         28.487         |
|        speech_transformer         |  32  | 1.5911  |  9.5136   |    58.1918     |       nan       | 28.2649  |        27.4044         |
|            hf_Reformer            |  4   | 2.4185  |    nan    |     9.8505     |       nan       | 27.7055  |        22.0631         |
|            mnasnet1_0             |  32  | 0.7518  |  4.9919   |     6.9672     |     66.0724     | 23.3112  |        21.8372         |
|          resnext50_32x4d          |  8   | 0.8349  |  5.4511   |     7.5616     |     67.8433     | 22.7538  |        20.6724         |
|             hf_Albert             |  8   | 1.0435  |  6.7164   |     9.5072     |       nan       | 22.3824  |        21.7073         |
|             resnet50              |  32  | 0.8084  |  5.6553   |     7.443      |      73.0       | 22.0201  |        22.0886         |
|              hf_Bert              |  4   | 1.3516  |   7.009   |     9.7999     |       nan       | 20.7511  |         20.063         |
|              hf_GPT2              |  4   |  1.246  |  6.8805   |    10.0977     |     73.9618     | 20.5855  |        20.1218         |
|        shufflenet_v2_x1_0         | 128  | 0.8815  |  6.1806   |     8.3016     |     83.2298     | 19.5393  |        18.1063         |
|            Super_SloMo            |  6   | 0.9893  |  5.3455   |     7.0804     |     34.2139     |  17.801  |        17.1429         |
|        Background_Matting         |  4   | 0.8287  |  5.2383   |     7.7041     |     58.6919     | 17.7819  |        16.3323         |
|           mobilenet_v2            |  96  | 0.7376  |  5.1021   |     7.2056     |       nan       | 17.2772  |        17.5484         |
|       functorch_dp_cifar10        |  64  | 0.3891  |  2.2968   |     3.1644     |       nan       | 16.9364  |        16.5698         |
|           hf_DistilBert           |  8   | 0.4481  |  3.5967   |     6.4314     |       nan       | 14.4217  |        13.7809         |
|             resnet18              |  16  | 0.3897  |  2.1039   |     2.8888     |     28.146      | 13.4634  |        14.0684         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.3597  |  2.5366   |     3.2709     |     25.9788     |  8.5797  |         8.3874         |
|           pytorch_unet            |  1   | 0.4317  |  2.3488   |     3.1857     |     25.5648     |  8.5192  |         8.2043         |
|          LearningToPaint          |  96  | 0.4187  |  2.2007   |     3.0445     |     35.9666     |  7.5207  |         7.3226         |
|           squeezenet1_1           |  32  | 0.2209  |  1.0642   |     1.5533     |     5.0214      |  4.2396  |         3.9154         |
|               vgg16               |  64  | 0.1892  |  0.7272   |     1.1591     |     3.9105      |  3.5836  |         3.4519         |
|      nvidia_deeprecommender       | 256  | 0.1923  |  0.4788   |     0.7278     |     5.9264      |  3.4807  |         3.216          |
|         soft_actor_critic         | 256  | 0.1989  |  0.3589   |     0.5134     |     2.4526      |  3.3991  |         2.8458         |
|              alexnet              | 128  | 0.1441  |  0.4498   |     0.7233     |     3.6339      |  3.0565  |         2.7715         |
|               dcgan               |  32  | 0.1657  |  0.4822   |     0.7116     |     4.2342      |  2.7374  |         2.5303         |
|           lennard_jones           | 1000 | 0.1377  |  0.3196   |     0.5378     |     2.1834      |  2.0265  |         1.8215         |
|            tts_angular            |  64  | 0.2077  |  0.2735   |     0.4572     |     1.0477      |  1.9005  |         1.7362         |
|              demucs               |  4   | 0.3077  |  0.3135   |     0.3232     |     0.3089      |  0.2151  |         0.2173         |
|             tacotron2             |  64  | 17.5224 |  31.345   |      nan       |       nan       |   nan    |        68.0676         |
|           hf_GPT2_large           |  4   | 4.9526  |  21.4987  |      nan       |       nan       |   nan    |        48.6542         |
|               hf_T5               |  8   |  2.243  |  11.5182  |      nan       |       nan       |   nan    |        29.3693         |
|               dlrm                | 2048 |   nan   |    nan    |      nan       |      4.487      |   nan    |         3.2098         |
|           hf_Longformer           |  2   |  6.012  |  16.301   |     79.761     |       nan       |   nan    |          nan           |
|           BERT_pytorch            |  0   |   nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|          DALLE2_pytorch           |  0   |   nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|                drq                |  0   |   nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|               moco                |  0   |   nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|         timm_efficientdet         |  0   |   nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|         timm_efficientnet         |  0   |   nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            timm_regnet            |  0   |   nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            timm_vovnet            |  0   |   nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------------+-----------------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------------+-----------------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | aot_cudagraphs | nvprims_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------------+-----------------+----------+------------------------+
|      resnet50_quantized_qat       |  32  | 0.9971 |  0.9148   |      nan       |     0.8492      |  1.4304  |         1.4304         |
|    mobilenet_v2_quantized_qat     |  96  | 0.9961 |  0.8279   |      nan       |     0.8271      |  1.404   |         1.404          |
|            Super_SloMo            |  6   | 1.0023 |  0.9526   |     0.363      |     0.9527      |  1.1857  |         1.1913         |
|           mobilenet_v2            |  96  | 0.9923 |  0.7624   |     0.3061     |       nan       |  1.1003  |          1.11          |
|           squeezenet1_1           |  32  | 0.9781 |  0.8163   |     0.3371     |     0.8132      |  1.0821  |         1.1262         |
|        speech_transformer         |  32  | 0.9982 |  0.9159   |     0.2703     |       nan       |  1.0395  |         1.042          |
|            timm_nfnet             | 128  | 0.9358 |  0.8937   |      nan       |      0.879      |  1.0221  |         1.0495         |
|              demucs               |  4   | 0.9888 |  0.9884   |     0.9888     |     0.9884      |  0.9884  |         0.9884         |
|            tts_angular            |  64  | 0.9884 |  0.9884   |     0.9829     |     0.9884      |  0.983   |         0.9884         |
|        shufflenet_v2_x1_0         | 128  | 0.9739 |  0.8944   |      0.35      |      0.814      |  0.9789  |         1.0066         |
|              hf_GPT2              |  4   | 0.9548 |   0.906   |     0.3702     |     0.8845      |  0.9703  |         1.1094         |
|        Background_Matting         |  4   | 0.9989 |  0.9483   |     0.3594     |     0.9323      |  0.9204  |         0.9231         |
|              yolov3               |  16  | 0.9893 |  0.8384   |     0.3319     |     0.8043      |  0.9089  |         0.9128         |
|          pytorch_stargan          |  16  | 0.9975 |   1.009   |     0.4108     |       nan       |  0.9015  |         0.9845         |
|           timm_resnest            |  32  | 0.9926 |  0.8759   |     0.3223     |     0.7296      |  0.8947  |         0.964          |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9976 |  0.9117   |     0.3921     |     0.8949      |  0.8928  |         0.9624         |
|             hf_Albert             |  8   | 0.9333 |  0.9333   |     0.2846     |       nan       |  0.8836  |         1.2215         |
|        mobilenet_v3_large         |  32  | 0.9876 |   0.856   |     0.3277     |     0.7754      |  0.8832  |         0.8974         |
|            densenet121            |  4   |  1.0   |  0.8879   |     0.3452     |     0.8612      |  0.8624  |         0.943          |
|   timm_vision_transformer_large   |  8   | 0.9997 |  0.8415   |      nan       |       nan       |  0.8621  |         1.031          |
|            hf_T5_large            |  2   | 0.922  |  0.8673   |      nan       |       nan       |  0.8613  |         0.922          |
|           pytorch_unet            |  1   | 0.9985 |  0.8521   |     0.3441     |     0.8388      |  0.859   |         0.8608         |
|             resnet50              |  32  | 0.9945 |  0.8704   |     0.3364     |     0.7952      |  0.8551  |         0.8906         |
|            mnasnet1_0             |  32  | 0.9878 |  0.8992   |     0.3334     |     0.8252      |  0.8532  |         0.8671         |
|              hf_Bart              |  4   | 0.9617 |  0.8777   |      nan       |       nan       |  0.8504  |         1.1284         |
|           fastNLP_Bert            |  6   | 1.0011 |  0.9152   |     0.3384     |       nan       |  0.8354  |         1.1229         |
|          resnext50_32x4d          |  8   | 0.9961 |  0.8679   |     0.3585     |     0.8188      |  0.8278  |         0.8346         |
|            hf_BigBird             |  2   | 0.9604 |  0.9604   |     0.4299     |       nan       |  0.8211  |         1.0392         |
|               dcgan               |  32  | 0.9754 |  0.7634   |     0.4581     |     0.7634      |  0.767   |         0.7903         |
|      timm_vision_transformer      |  8   | 0.9943 |  0.8874   |     0.3309     |       nan       |  0.7507  |         0.8213         |
|         soft_actor_critic         | 256  | 0.9997 |  0.9637   |     0.4355     |     0.9304      |   0.75   |         0.9991         |
|              alexnet              | 128  | 0.9542 |   0.745   |     0.4163     |     0.7449      |  0.743   |         0.8332         |
|              hf_Bert              |  4   | 0.9683 |  0.9011   |     0.3525     |       nan       |  0.7061  |         1.0016         |
|          LearningToPaint          |  96  | 0.9454 |  0.6943   |     0.3399     |      0.627      |  0.6945  |         0.7512         |
|             resnet18              |  16  | 0.9831 |  0.7792   |     0.3589     |     0.6949      |  0.6902  |         0.7049         |
|           hf_DistilBert           |  8   | 0.9211 |  0.9047   |     0.3213     |       nan       |  0.6595  |         0.9466         |
|               vgg16               |  64  | 0.9944 |  0.6638   |     0.3214     |     0.6638      |  0.6471  |         0.6497         |
|           lennard_jones           | 1000 | 0.9995 |  0.9995   |     0.3711     |     0.9995      |  0.5646  |         0.9989         |
|      nvidia_deeprecommender       | 256  | 0.5598 |  0.5598   |     0.4624     |     0.5598      |  0.5598  |         0.5598         |
| attention_is_all_you_need_pytorch | 256  | 0.9476 |  0.9243   |      nan       |       nan       |  0.4867  |         0.6508         |
|          pytorch_struct           | 200  |  1.0   |  0.5079   |     0.4824     |     0.5079      |  0.4222  |         0.429          |
|       functorch_dp_cifar10        |  64  | 0.9961 |  0.8224   |     0.4456     |       nan       |  0.4056  |         0.4212         |
|            hf_Reformer            |  4   | 0.3011 |    nan    |     0.2397     |       nan       |  0.299   |         0.9882         |
|             tacotron2             |  64  | 0.9906 |  1.0302   |      nan       |       nan       |   nan    |         1.1494         |
|               hf_T5               |  8   | 0.9527 |  0.9415   |      nan       |       nan       |   nan    |         1.1434         |
|           hf_GPT2_large           |  4   | 0.936  |  0.8833   |      nan       |       nan       |   nan    |         1.1258         |
|               dlrm                | 2048 |  nan   |    nan    |      nan       |     0.7307      |   nan    |         0.7306         |
|           hf_Longformer           |  2   | 0.9603 |  0.9603   |     0.2946     |       nan       |   nan    |          nan           |
|           BERT_pytorch            |  0   |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|          DALLE2_pytorch           |  0   |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|                drq                |  0   |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|               moco                |  0   |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|         timm_efficientdet         |  0   |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|         timm_efficientnet         |  0   |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            timm_regnet            |  0   |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            timm_vovnet            |  0   |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------------+-----------------+----------+------------------------+
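To read the ratios above, it helps to fix a definition. The sketch below assumes the ratio is eager peak memory divided by the backend's peak memory, so values above 1 mean the backend used less memory than eager and values below 1 mean it used more. That definition is an assumption here, not something stated in this comment, and both function names are hypothetical.

```python
def compression_ratio(eager_peak, backend_peak):
    """Assumed definition: eager peak memory / backend peak memory."""
    return eager_peak / backend_peak

def implied_backend_peak(eager_peak, ratio):
    """Invert the assumed definition to recover the backend's peak memory."""
    return eager_peak / ratio

# Hypothetical eager peak of 10 GB, combined with the Super_SloMo
# aot_cudagraphs ratio (0.363) from the table above.
eager_peak_gb = 10.0
print(f"{implied_backend_peak(eager_peak_gb, 0.363):.1f} GB")
```

Under this reading, the `aot_cudagraphs` column (mostly 0.3 to 0.45) would correspond to roughly two to three times the eager peak.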

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | aot_cudagraphs | nvprims_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+
|            YituTechConvBert             |  1  | 1.0274 |  0.8994   |      0.0       |       0.0       |  3.1609  |         1.4393         |
|               DistillGPT2               |  1  | 1.0344 |  0.9148   |     1.0469     |     0.2987      |  2.4099  |         1.8225         |
|                CamemBert                |  1  | 1.0453 |  0.9298   |     1.3249     |       0.0       |  2.403   |         1.5084         |
|          MobileBertForMaskedLM          | 32  | 1.0238 |  0.9295   |      0.0       |       0.0       |  2.323   |         1.535          |
|       MT5ForConditionalGeneration       |  8  | 1.0219 |  0.8758   |      0.0       |       0.0       |  2.2525  |         1.9527         |
|               GoogleFnet                |  1  | 0.9926 |  0.8024   |     0.9835     |       0.0       |  1.9172  |         1.0977         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |  0.9771   |      0.0       |      0.656      |  1.7888  |         1.7767         |
|       ElectraForQuestionAnswering       | 64  | 1.0002 |  0.9853   |      0.0       |       0.0       |  1.4272  |         1.4082         |
|           ElectraForCausalLM            | 32  | 1.0003 |  0.9302   |      0.0       |       0.0       |  1.4136  |         1.4512         |
|     M2M100ForConditionalGeneration      |  8  | 1.0115 |   1.008   |     1.0862     |       0.0       |  1.4108  |         1.3048         |
|     MobileBertForQuestionAnswering      | 64  | 1.0253 |  0.9099   |      0.0       |       0.0       |  1.403   |         1.2783         |
|    LayoutLMForSequenceClassification    | 16  |  1.0   |  0.9892   |     0.7372     |       0.0       |  1.3017  |         1.2911         |
|       AlbertForQuestionAnswering        |  4  | 0.9997 |  1.0023   |      0.0       |       0.0       |  1.2572  |         1.2514         |
|            AlbertForMaskedLM            |  4  | 1.0008 |  1.0004   |      0.0       |       0.0       |  1.2531  |         1.251          |
|    MegatronBertForQuestionAnswering     | 16  | 1.038  |   1.011   |     0.7604     |       0.0       |  1.2229  |         1.1182         |
|           LayoutLMForMaskedLM           | 16  |  1.0   |  0.9707   |      0.0       |       0.0       |  1.2126  |         1.2151         |
|     PLBartForConditionalGeneration      | 16  | 1.0146 |  0.9687   |      0.0       |       0.0       |  1.2002  |         1.1992         |
|             OPTForCausalLM              | 32  | 1.0022 |  0.9316   |      0.0       |       0.0       |  1.1817  |         1.2021         |
|             XGLMForCausalLM             |  8  | 1.0114 |   0.939   |     0.7997     |       0.0       |  1.1778  |         1.1847         |
|                 T5Small                 |  1  | 1.0276 |   0.883   |      0.0       |       0.0       |  1.1776  |         1.1836         |
|           DebertaForMaskedLM            |  4  | 0.9252 |  0.7847   |     0.732      |       0.0       |  1.1774  |         1.1554         |
|     DistilBertForQuestionAnswering      | 64  | 0.9997 |  0.9839   |     0.7132     |       0.0       |  1.1708  |         1.1514         |
|           RobertaForCausalLM            | 64  |  1.0   |  0.9629   |     0.7451     |       0.0       |  1.1458  |         1.1498         |
|         MegatronBertForCausalLM         | 16  | 1.0334 |  1.0044   |     0.7435     |       0.0       |  1.1312  |         1.1213         |
|         Speech2Text2ForCausalLM         | 128 | 0.9978 |  0.9276   |     0.6566     |       0.0       |  1.1236  |         1.1489         |
|       RobertaForQuestionAnswering       | 128 | 1.0002 |  0.9922   |      0.0       |       0.0       |  1.1177  |         1.114          |
|        BertForQuestionAnswering         | 128 | 0.9996 |  0.9923   |      0.0       |       0.0       |  1.1139  |         1.1072         |
|             BartForCausalLM             |  4  | 1.0007 |   0.966   |      0.0       |       0.0       |  1.0993  |         1.1104         |
|      BartForConditionalGeneration       |  2  | 1.0004 |  0.9883   |      0.0       |       0.0       |  1.0985  |         1.0943         |
|      MBartForConditionalGeneration      | 16  | 1.0117 |  0.9775   |      0.0       |       0.0       |  1.0969  |         1.0807         |
|     PegasusForConditionalGeneration     | 16  | 1.0104 |  0.9797   |     0.7637     |       0.0       |  1.0882  |         1.0819         |
|                 BigBird                 |  1  | 0.9914 |   0.929   |     0.9985     |       0.0       |  1.0849  |         0.9995         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0007 |  0.9404   |      0.0       |       0.0       |  1.0647  |         1.0715         |
|             BertForMaskedLM             | 64  | 1.0003 |  0.9616   |     0.7302     |       0.0       |  1.0566  |         1.0618         |
|          DistilBertForMaskedLM          | 64  |  1.0   |  0.9505   |     0.7121     |       0.0       |  1.0529  |         1.0681         |
|       DebertaForQuestionAnswering       |  8  | 0.9966 |  0.9693   |     0.6845     |       0.0       |  1.0506  |         1.221          |
|            PLBartForCausalLM            | 32  | 1.0052 |  0.9344   |     0.7164     |       0.0       |  1.0243  |         1.0556         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.8151   |      0.0       |       0.0       |  1.0187  |         1.0185         |
|       BlenderbotSmallForCausalLM        | 64  | 1.0011 |  0.9092   |     0.6814     |       0.0       |  1.0081  |         1.0439         |
|            TrOCRForCausalLM             | 32  | 1.0002 |   0.955   |      0.0       |       0.0       |  1.0043  |         1.0147         |
|            MBartForCausalLM             | 32  | 1.0013 |  0.9515   |      0.0       |       0.0       |  0.9993  |         1.0109         |
|           PegasusForCausalLM            | 32  | 0.9993 |   0.953   |     0.7302     |       0.0       |  0.9915  |         1.0055         |
|          AllenaiLongformerBase          |  1  | 0.9453 |  0.8481   |     0.7819     |       0.0       |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+
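For anyone who wants to summarize a speedup table like the one above, here is a minimal sketch that parses the ASCII layout and computes a geometric mean for one backend. It assumes that a `0.0` (or `nan`) entry marks a run that did not produce a number and should be skipped; that reading is inferred from the tables, not confirmed by the dashboard. The helper names `parse_ascii_table` and `geomean_speedup` are hypothetical.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+'-bordered table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")  # skip the '+----+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean_speedup(records, backend):
    """Geometric mean over entries with a positive speedup.

    Assumption: 0.0 and nan mean "no result", so they are excluded.
    """
    values = [float(r[backend]) for r in records if float(r[backend]) > 0.0]
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Three rows copied from the huggingface table above (other columns omitted).
sample = """
+-----------------------+----+----------+
|         name          | bs | inductor |
+-----------------------+----+----------+
|      GoogleFnet       | 1  |  1.9172  |
|        BigBird        | 1  |  1.0849  |
| AllenaiLongformerBase | 1  |   0.0    |
+-----------------------+----+----------+
"""

records = parse_ascii_table(sample)
print(f"inductor geomean: {geomean_speedup(records, 'inductor'):.4f}")
```

Skipping the zero entries keeps the mean defined, but it also hides failures, so the number of skipped models is worth reporting next to the mean.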

Accuracy

+-----------------------------------------+----+-------+-----------+----------------+-----------------+-------------+------------------------+
|                  name                   | bs | eager | aot_eager | aot_cudagraphs | nvprims_nvfuser |  inductor   | inductor_no_cudagraphs |
+-----------------------------------------+----+-------+-----------+----------------+-----------------+-------------+------------------------+
|             BertForMaskedLM             | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|            MBartForCausalLM             | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
|         Speech2Text2ForCausalLM         | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|             XGLMForCausalLM             | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|            AlbertForMaskedLM            | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
|       AlbertForQuestionAnswering        | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
|             BartForCausalLM             | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
|      BartForConditionalGeneration       | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
|      GPT2ForSequenceClassification      | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
|       MT5ForConditionalGeneration       | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
|           RobertaForCausalLM            | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|          MobileBertForMaskedLM          | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
|     MobileBertForQuestionAnswering      | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
|             OPTForCausalLM              | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
|       T5ForConditionalGeneration        | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
|                 T5Small                 | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
|            TrOCRForCausalLM             | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
|            XLNetLMHeadModel             | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
|            YituTechConvBert             | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   |    pass     |          pass          |
|        BertForQuestionAnswering         | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|       RobertaForQuestionAnswering       | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|     PegasusForConditionalGeneration     | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|           ElectraForCausalLM            | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|                 BigBird                 | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|       BlenderbotSmallForCausalLM        | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|                CamemBert                | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|           DebertaForMaskedLM            | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|       DebertaForQuestionAnswering       | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|          DistilBertForMaskedLM          | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|     DistilBertForQuestionAnswering      | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|           PegasusForCausalLM            | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|               DistillGPT2               | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|       ElectraForQuestionAnswering       | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|               GoogleFnet                | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|           LayoutLMForMaskedLM           | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|    LayoutLMForSequenceClassification    | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|     M2M100ForConditionalGeneration      | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|         MegatronBertForCausalLM         | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|    MegatronBertForQuestionAnswering     | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|            PLBartForCausalLM            | 1  | pass  |   pass    |      pass      |   fail_to_run   |    pass     |          pass          |
|          AllenaiLongformerBase          | 1  | pass  |   pass    |      pass      |   fail_to_run   | fail_to_run |      fail_to_run       |
|      MBartForConditionalGeneration      | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   | fail_to_run |      fail_to_run       |
|     PLBartForConditionalGeneration      | 1  | pass  |   pass    |  fail_to_run   |   fail_to_run   | fail_to_run |      fail_to_run       |
+-----------------------------------------+----+-------+-----------+----------------+-----------------+-------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | aot_cudagraphs | nvprims_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+
|       DebertaForQuestionAnswering       |  8  | 4.6733 |  12.4129  |    46.2723     |       nan       | 106.2761 |        36.9712         |
|           DebertaForMaskedLM            |  4  | 4.6037 |  12.0142  |    45.6034     |       nan       | 102.6161 |         35.206         |
|             XGLMForCausalLM             |  8  | 2.285  |  13.963   |    43.1733     |       nan       | 94.4449  |        91.7467         |
|     M2M100ForConditionalGeneration      |  8  | 2.8676 |  15.3441  |    23.8784     |       nan       | 75.1194  |         72.388         |
|          MobileBertForMaskedLM          | 32  | 8.0322 |  31.4395  |      nan       |       nan       | 59.6722  |        57.9264         |
|     MobileBertForQuestionAnswering      | 64  | 8.0065 |  31.3134  |      nan       |       nan       | 58.8377  |        56.0146         |
|            YituTechConvBert             |  1  | 2.1143 |  11.2822  |      nan       |       nan       | 54.4228  |        51.3304         |
|     PegasusForConditionalGeneration     | 16  | 2.6332 |  16.745   |    26.3799     |       nan       | 48.9728  |         45.368         |
|      BartForConditionalGeneration       |  2  | 2.8283 |  17.3733  |      nan       |       nan       | 48.6188  |        48.0802         |
|      MBartForConditionalGeneration      | 16  | 2.8415 |  17.2493  |      nan       |       nan       | 48.5545  |        45.9787         |
|                 BigBird                 |  1  | 7.3519 |  14.4816  |    30.0384     |       nan       | 41.5339  |        26.1666         |
|       MT5ForConditionalGeneration       |  8  | 3.5812 |  15.6366  |      nan       |       nan       | 40.4273  |        38.4155         |
|         MegatronBertForCausalLM         | 16  | 2.9745 |  14.6119  |    20.8656     |       nan       | 37.6765  |        36.3074         |
|    MegatronBertForQuestionAnswering     | 16  | 2.9706 |  14.6623  |    21.4667     |       nan       |  37.449  |        35.5244         |
|                 T5Small                 |  1  | 2.2716 |  10.8048  |      nan       |       nan       | 37.3498  |        36.6486         |
|    LayoutLMForSequenceClassification    | 16  | 1.6799 |  7.7567   |    11.2872     |       nan       | 35.2088  |        34.3709         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.7584 |  11.3554  |      nan       |       nan       | 34.4824  |         32.566         |
|       T5ForConditionalGeneration        |  4  | 2.3765 |  10.8948  |      nan       |       nan       | 31.9484  |        31.3496         |
|     PLBartForConditionalGeneration      | 16  | 1.3762 |  9.1425   |      nan       |       nan       | 30.8008  |        29.9136         |
|           ElectraForCausalLM            | 32  | 1.3496 |  7.1978   |      nan       |       nan       | 29.8169  |        27.2971         |
|           PegasusForCausalLM            | 32  | 1.0155 |  6.4968   |    10.2756     |       nan       | 24.1657  |        22.4839         |
|            MBartForCausalLM             | 32  | 0.9605 |  6.8031   |      nan       |       nan       | 23.2513  |        22.9468         |
|           LayoutLMForMaskedLM           | 16  | 1.6594 |  7.7534   |      nan       |       nan       | 23.2343  |        22.3658         |
|             BertForMaskedLM             | 64  | 1.3389 |  7.1091   |    10.1501     |       nan       | 22.9853  |        22.4101         |
|       ElectraForQuestionAnswering       | 64  | 1.3342 |  7.1468   |      nan       |       nan       | 22.5211  |        21.6752         |
|            TrOCRForCausalLM             | 32  | 1.0458 |  6.6678   |      nan       |       nan       | 22.3064  |        21.5718         |
|               GoogleFnet                |  1  | 0.7909 |  3.8161   |    10.7298     |       nan       | 22.1676  |        14.5891         |
|             BartForCausalLM             |  4  | 1.0266 |  6.5242   |      nan       |       nan       | 22.1044  |        20.6279         |
|        BertForQuestionAnswering         | 128 | 1.3416 |  7.1515   |      nan       |       nan       | 21.4984  |        20.6009         |
|           RobertaForCausalLM            | 64  | 1.3662 |  7.1342   |    10.4869     |       nan       | 21.2916  |        20.4156         |
|       RobertaForQuestionAnswering       | 128 | 1.372  |  7.2267   |      nan       |       nan       | 20.3739  |        19.4477         |
|                CamemBert                |  1  | 1.4018 |   7.183   |     9.7338     |       nan       | 20.2566  |        20.0063         |
|             OPTForCausalLM              | 32  | 1.0485 |  6.6749   |      nan       |       nan       | 20.0646  |        19.3042         |
|      GPT2ForSequenceClassification      |  4  | 1.359  |  7.2231   |      nan       |     72.6465     | 18.8184  |        18.6945         |
|            AlbertForMaskedLM            |  4  | 1.0888 |  6.8413   |      nan       |       nan       |  18.791  |        17.2833         |
|       AlbertForQuestionAnswering        |  4  | 0.9827 |  6.6095   |      nan       |       nan       | 17.8325  |        16.7729         |
|       BlenderbotSmallForCausalLM        | 64  | 0.6506 |  4.3553   |     6.4625     |       nan       | 17.7808  |        16.9874         |
|               DistillGPT2               |  1  | 0.653  |  3.5396   |     4.6937     |     44.0522     | 16.6466  |        16.7649         |
|         Speech2Text2ForCausalLM         | 128 | 0.593  |  3.4546   |     5.5215     |       nan       |  16.183  |        15.0192         |
|            PLBartForCausalLM            | 32  | 0.4975 |  3.5231   |     4.8355     |       nan       |  15.43   |         14.999         |
|          DistilBertForMaskedLM          | 64  | 0.448  |  3.5612   |     6.431      |       nan       | 13.5483  |        12.8553         |
|     DistilBertForQuestionAnswering      | 64  | 0.4582 |  3.5185   |     6.4368     |       nan       | 12.7988  |         12.223         |
|          AllenaiLongformerBase          |  1  | 6.0731 |  15.7322  |    80.1044     |       nan       |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | aot_cudagraphs | nvprims_nvfuser | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+
|      GPT2ForSequenceClassification      |  4  | 0.9343 |  0.9093   |      nan       |     0.8817      |  1.0595  |         1.1224         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.9425   |      nan       |       nan       |  0.8646  |         1.4039         |
|     PegasusForConditionalGeneration     | 16  | 0.9985 |  0.9629   |     0.3704     |       nan       |  0.8436  |         1.0204         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.9255   |      nan       |       nan       |  0.842   |         1.3737         |
|                 BigBird                 |  1  | 0.999  |  0.9542   |     0.4213     |       nan       |  0.8224  |         1.0095         |
|             XGLMForCausalLM             |  8  | 0.9848 |  0.9137   |     0.3971     |       nan       |  0.8157  |         0.9642         |
|               DistillGPT2               |  1  | 0.9984 |  0.8115   |     0.3773     |     0.7597      |  0.807   |         0.926          |
|                 T5Small                 |  1  |  1.0   |  0.8947   |      nan       |       nan       |  0.7934  |         1.0493         |
|           ElectraForCausalLM            | 32  | 0.9983 |   0.883   |      nan       |       nan       |  0.7929  |         0.9036         |
|            YituTechConvBert             |  1  | 0.9858 |  0.8581   |      nan       |       nan       |  0.7893  |         0.8727         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8935   |      nan       |       nan       |  0.7817  |         0.9515         |
|           PegasusForCausalLM            | 32  | 0.9593 |  0.9232   |     0.3909     |       nan       |  0.7774  |         0.931          |
|       T5ForConditionalGeneration        |  4  |  1.0   |  0.9597   |      nan       |       nan       |  0.7711  |         1.1049         |
|               GoogleFnet                |  1  | 0.9983 |  0.9453   |     0.3715     |       nan       |  0.7698  |         0.9373         |
|     M2M100ForConditionalGeneration      |  8  |  1.0   |  0.9809   |     0.3975     |       nan       |  0.7621  |         1.0093         |
|       MT5ForConditionalGeneration       |  8  | 1.0034 |  0.8862   |      nan       |       nan       |  0.7603  |         0.9397         |
|    MegatronBertForQuestionAnswering     | 16  |  1.0   |  0.8671   |     0.3483     |       nan       |  0.7528  |         0.9646         |
|                CamemBert                |  1  | 0.998  |  0.8252   |     0.3615     |       nan       |  0.7487  |         0.9184         |
|     PLBartForConditionalGeneration      | 16  |  1.0   |  0.8957   |      nan       |       nan       |  0.7397  |         0.9638         |
|            PLBartForCausalLM            | 32  | 0.9999 |   0.861   |     0.3948     |       nan       |  0.7381  |         0.9055         |
|      MBartForConditionalGeneration      | 16  |  1.0   |  0.8583   |      nan       |       nan       |  0.7209  |         0.9059         |
|    LayoutLMForSequenceClassification    | 16  |  1.0   |  0.9348   |     0.3324     |       nan       |  0.7189  |         1.0294         |
|         MegatronBertForCausalLM         | 16  | 0.9995 |  0.8826   |     0.352      |       nan       |  0.7161  |         0.9247         |
|             BartForCausalLM             |  4  |  1.0   |  0.9121   |      nan       |       nan       |  0.7149  |         0.9466         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8401   |     0.3879     |       nan       |  0.7147  |         0.8647         |
|       ElectraForQuestionAnswering       | 64  |  1.0   |  0.9524   |      nan       |       nan       |  0.7054  |         1.0298         |
|     DistilBertForQuestionAnswering      | 64  |  1.0   |  0.9373   |     0.3177     |       nan       |  0.6981  |         0.9303         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8975   |      nan       |       nan       |  0.6977  |         0.946          |
|           LayoutLMForMaskedLM           | 16  |  1.0   |  0.9409   |      nan       |       nan       |  0.695   |         0.9772         |
|            MBartForCausalLM             | 32  | 0.9999 |   0.89    |      nan       |       nan       |  0.6836  |         0.8978         |
|            TrOCRForCausalLM             | 32  | 0.9999 |  0.8898   |      nan       |       nan       |  0.6827  |         0.8876         |
|         Speech2Text2ForCausalLM         | 128 | 0.9552 |   0.842   |     0.3524     |       nan       |  0.6775  |         0.9179         |
|             OPTForCausalLM              | 32  | 0.9982 |  0.8655   |      nan       |       nan       |  0.6761  |         0.8847         |
|          DistilBertForMaskedLM          | 64  |  1.0   |  0.8899   |     0.3665     |       nan       |  0.6531  |         0.9124         |
|             BertForMaskedLM             | 64  |  1.0   |  0.9219   |     0.3646     |       nan       |  0.6385  |         0.8992         |
|           RobertaForCausalLM            | 64  | 0.9986 |  0.9206   |     0.3642     |       nan       |  0.6375  |         0.8974         |
|        BertForQuestionAnswering         | 128 |  1.0   |   0.968   |      nan       |       nan       |  0.6329  |         0.8939         |
|       RobertaForQuestionAnswering       | 128 |  1.0   |   0.968   |      nan       |       nan       |  0.6329  |         0.8939         |
|          MobileBertForMaskedLM          | 32  | 0.9998 |  0.8355   |      nan       |       nan       |  0.4998  |         0.6646         |
|     MobileBertForQuestionAnswering      | 64  |  1.0   |   0.984   |      nan       |       nan       |  0.4536  |         0.5968         |
|           DebertaForMaskedLM            |  4  | 0.9991 |  0.9843   |     0.3553     |       nan       |  0.3862  |         1.0347         |
|       DebertaForQuestionAnswering       |  8  | 0.9816 |   1.063   |     0.3072     |       nan       |  0.2902  |         1.1588         |
|          AllenaiLongformerBase          |  1  | 0.9981 |  0.9515   |     0.321      |       nan       |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | nvprims_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+
|           dm_nfnet_f0           | 128 |  1.0   |  1.0002   |      0.0       |       0.0       |  1.4735  |         1.4264         |
|          convnext_base          | 64  | 0.9999 |  0.9991   |      0.0       |       0.0       |  1.4727  |         1.4639         |
|            hrnet_w18            | 128 |  1.0   |  0.9998   |      0.0       |       0.0       |  1.4184  |         1.3803         |
|             dla102              | 128 | 0.9998 |  1.0007   |      0.0       |       0.0       |  1.3851  |          1.37          |
|           volo_d1_224           | 64  | 0.9999 |  0.9961   |      0.0       |       0.0       |  1.3833  |         1.3619         |
|            nfnet_l0             | 128 | 1.0001 |  0.7889   |      0.0       |     1.2287      |  1.3759  |         1.3274         |
|        res2net50_14w_8s         | 128 | 0.9998 |  1.0003   |      0.0       |     1.2439      |  1.3551  |         1.3256         |
|      xcit_large_24_p8_224       |  5  | 1.0007 |  0.9756   |      0.0       |       0.0       |  1.3514  |         1.3164         |
|        adv_inception_v3         | 128 |  1.0   |  0.9975   |      0.0       |     1.1287      |  1.3286  |         1.3079         |
|         crossvit_9_240          | 128 | 0.9996 |  0.9996   |      0.0       |       0.0       |  1.3285  |         1.3029         |
|          inception_v3           | 128 |  1.0   |   0.999   |      0.0       |     1.1286      |  1.3276  |         1.3078         |
|       gluon_inception_v3        | 128 | 0.9999 |   0.999   |      0.0       |     1.1285      |  1.3273  |         1.3071         |
|           resnest101e           | 64  |  1.0   |  1.0042   |      0.0       |       0.0       |  1.3144  |         1.2722         |
|           res2next50            | 128 | 0.9999 |  1.0011   |      0.0       |     1.1754      |  1.3129  |         1.2744         |
|          jx_nest_base           | 32  | 0.9998 |  0.9953   |      0.0       |       0.0       |  1.2785  |         1.2524         |
|         coat_lite_mini          | 128 | 0.9998 |  0.9858   |     0.8519     |       0.0       |  1.2715  |         1.2635         |
|           selecsls42b           | 128 |  1.0   |  1.0004   |     0.8156     |     1.2097      |  1.2682  |         1.2534         |
|          gmixer_24_224          | 128 | 0.9998 |  0.8101   |      0.0       |       0.0       |  1.2437  |         1.2275         |
|        res2net101_26w_4s        | 64  | 0.9999 |  1.0005   |     0.7725     |     1.1515      |  1.2264  |         1.1914         |
|           convit_base           | 64  | 0.9997 |  0.9989   |      0.0       |       0.0       |  1.211   |         1.2407         |
|        twins_pcpvt_base         | 64  | 0.9999 |  0.9992   |     0.7538     |       0.0       |  1.2033  |         1.1715         |
|          gmlp_s16_224           | 128 |  1.0   |  0.9501   |      0.0       |       0.0       |  1.201   |         1.1884         |
|            pit_b_224            | 64  | 0.9998 |   0.999   |      0.0       |       0.0       |  1.1874  |         1.1783         |
|          cait_m36_384           |  4  | 0.9999 |  1.0266   |      0.0       |       0.0       |  1.185   |         1.1592         |
|         poolformer_m36          | 64  | 0.9998 |  0.9987   |      0.0       |       0.0       |  1.1663  |         1.148          |
|  swin_base_patch4_window7_224   | 64  | 0.9999 |  0.9779   |      0.0       |       0.0       |  1.1428  |         1.1282         |
|      beit_base_patch16_224      | 64  | 0.9999 |  0.9761   |      0.0       |       0.0       |  1.1119  |         1.103          |
|     swsl_resnext101_32x16d      | 32  |  1.0   |  1.0004   |      0.0       |       0.0       |  1.1069  |         1.0711         |
| deit_base_distilled_patch16_224 | 64  | 0.9999 |  0.9989   |     0.7667     |       0.0       |  1.0958  |         1.0833         |
|      vit_base_patch16_224       | 64  | 0.9999 |   0.999   |     0.7661     |       0.0       |  1.0856  |         1.0758         |
|        gluon_xception65         | 32  | 0.9998 |  0.9973   |      0.0       |       0.0       |  1.0854  |         1.0747         |
|        convmixer_768_32         | 32  | 0.9999 |    1.0    |      0.0       |       0.0       |  1.0784  |         1.0746         |
|          mixer_b16_224          | 128 | 0.9998 |  0.9785   |      0.0       |       0.0       |  1.0671  |         1.0623         |
|         visformer_small         | 128 | 1.0001 |  1.0031   |     0.8001     |     1.0483      |  1.0473  |         1.0133         |
|          resmlp_12_224          | 128 | 0.9998 |  0.8551   |     0.6124     |       0.0       |  0.7893  |         0.7996         |
|           mnasnet_100           |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|           tf_mixnet_l           |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|       tf_efficientnet_b0        |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|          spnasnet_100           |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|        sebotnet33ts_256         |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|           rexnet_100            |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|            repvgg_a2            |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|           regnety_002           |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|          pnasnet5large          |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|           mobilevit_s           |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|      mobilenetv3_large_100      |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|         mobilenetv2_100         |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|        tnt_s_patch16_224        | 128 | 0.9998 |  0.9995   |      0.0       |       0.0       |   0.0    |         1.5459         |
|            mixnet_l             |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|            lcnet_050            |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|          ghostnet_100           |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|            gernet_l             |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|            fbnetv3_b            |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|           fbnetc_100            |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|        ese_vovnet19b_dw         |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|        eca_halonext26ts         |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|       eca_botnext26ts_256       |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|             dpn107              |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|          cspdarknet53           |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|          botnet26t_256          |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
|            tinynet_a            |  0  |  0.0   |    0.0    |      0.0       |       0.0       |   0.0    |          0.0           |
+---------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+

Accuracy

+---------------------------------+----+-------+---------------+----------------+-----------------+---------------+------------------------+
|              name               | bs | eager |   aot_eager   | aot_cudagraphs | nvprims_nvfuser |   inductor    | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------------+-----------------+---------------+------------------------+
|        adv_inception_v3         | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|        convmixer_768_32         | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|          botnet26t_256          | 2  | pass  |     pass      |      pass      |   fail_to_run   |     pass      |          pass          |
|             dpn107              | 2  | pass  |     pass      |      pass      |   fail_to_run   |     pass      |          pass          |
|       eca_botnext26ts_256       | 2  | pass  |     pass      |      pass      |   fail_to_run   |     pass      |          pass          |
|        eca_halonext26ts         | 2  | pass  |     pass      |      pass      |   fail_to_run   |     pass      |          pass          |
|          ghostnet_100           | 2  | pass  |     pass      |      pass      |   fail_to_run   |     pass      |          pass          |
|          mixer_b16_224          | 2  | pass  |     pass      |      pass      |   fail_to_run   |     pass      |          pass          |
|           mobilevit_s           | 2  | pass  |     pass      |      pass      |   fail_to_run   |     pass      |          pass          |
|            pit_b_224            | 2  | pass  |     pass      |      pass      |   fail_to_run   |     pass      |          pass          |
|         poolformer_m36          | 2  | pass  |     pass      |      pass      |   fail_to_run   |     pass      |          pass          |
|          resmlp_12_224          | 2  | pass  |     pass      |      pass      |   fail_to_run   |     pass      |          pass          |
|        sebotnet33ts_256         | 2  | pass  |     pass      |      pass      |   fail_to_run   |     pass      |          pass          |
|        twins_pcpvt_base         | 2  | pass  |     pass      |      pass      |   fail_to_run   |     pass      |          pass          |
|      vit_base_patch16_224       | 2  | pass  |     pass      |      pass      |   fail_to_run   |     pass      |          pass          |
|      beit_base_patch16_224      | 2  | pass  |     pass      |  fail_to_run   |   fail_to_run   |     pass      |          pass          |
|           convit_base           | 2  | pass  |     pass      |  fail_to_run   |   fail_to_run   |     pass      |          pass          |
|          convnext_base          | 2  | pass  |     pass      |  fail_to_run   |   fail_to_run   |     pass      |          pass          |
|         crossvit_9_240          | 2  | pass  |     pass      |  fail_to_run   |   fail_to_run   |     pass      |          pass          |
|          gmixer_24_224          | 2  | pass  |     pass      |  fail_to_run   |   fail_to_run   |     pass      |          pass          |
|          gmlp_s16_224           | 2  | pass  |     pass      |  fail_to_run   |   fail_to_run   |     pass      |          pass          |
|          jx_nest_base           | 2  | pass  |     pass      |  fail_to_run   |   fail_to_run   |     pass      |          pass          |
|  swin_base_patch4_window7_224   | 2  | pass  |     pass      |  fail_to_run   |   fail_to_run   |     pass      |          pass          |
|        tnt_s_patch16_224        | 2  | pass  |     pass      |  fail_to_run   |   fail_to_run   |     pass      |          pass          |
|           volo_d1_224           | 2  | pass  |     pass      |  fail_to_run   |   fail_to_run   |     pass      |          pass          |
|      xcit_large_24_p8_224       | 2  | pass  |     pass      |  fail_to_run   |   fail_to_run   |     pass      |          pass          |
|          cait_m36_384           | 2  | pass  | fail_accuracy |  fail_to_run   |   fail_to_run   |     pass      |          pass          |
|         coat_lite_mini          | 2  | pass  | fail_accuracy | fail_accuracy  |   fail_to_run   |     pass      |          pass          |
| deit_base_distilled_patch16_224 | 2  | pass  |     pass      |      pass      |   fail_to_run   |     pass      |     fail_accuracy      |
|           dm_nfnet_f0           | 2  | pass  |     pass      |  fail_to_run   |      pass       |     pass      |          pass          |
|         visformer_small         | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|            tinynet_a            | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|           tf_mixnet_l           | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|          cspdarknet53           | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|             dla102              | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|        ese_vovnet19b_dw         | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|           fbnetc_100            | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|            gernet_l             | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|       gluon_inception_v3        | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|        gluon_xception65         | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|            hrnet_w18            | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|          inception_v3           | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|            lcnet_050            | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|            mixnet_l             | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|           mnasnet_100           | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|         mobilenetv2_100         | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|      mobilenetv3_large_100      | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|            nfnet_l0             | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|          pnasnet5large          | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|           regnety_002           | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|            repvgg_a2            | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|        res2net101_26w_4s        | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|        res2net50_14w_8s         | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|           res2next50            | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|           rexnet_100            | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|           selecsls42b           | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|          spnasnet_100           | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|     swsl_resnext101_32x16d      | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|       tf_efficientnet_b0        | 2  | pass  |     pass      |      pass      |      pass       |     pass      |          pass          |
|           resnest101e           | 2  | pass  |     pass      |      pass      |   fail_to_run   | fail_accuracy |     fail_accuracy      |
|            fbnetv3_b            | 2  | pass  |     pass      |      pass      |  fail_accuracy  | fail_accuracy |     fail_accuracy      |
+---------------------------------+----+-------+---------------+----------------+-----------------+---------------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | nvprims_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+
|            hrnet_w18            | 128 | 5.7744 |  35.9932  |      nan       |       nan       | 109.042  |        104.8879        |
|  swin_base_patch4_window7_224   | 64  | 2.6122 |  14.6004  |      nan       |       nan       | 96.2962  |        93.4446         |
|      xcit_large_24_p8_224       |  5  | 2.6862 |  19.6497  |      nan       |       nan       | 88.3393  |        84.7322         |
|        twins_pcpvt_base         | 64  | 2.0976 |  14.8342  |    23.7204     |       nan       | 87.5503  |        85.5358         |
|          cait_m36_384           |  4  | 2.6883 |  20.7006  |      nan       |       nan       | 75.0498  |        72.2763         |
|          convnext_base          | 64  | 1.2612 |  6.8385   |      nan       |       nan       | 75.0447  |        74.6436         |
|          jx_nest_base           | 32  | 1.7318 |  10.3587  |      nan       |       nan       | 71.0908  |        68.0075         |
|           resnest101e           | 64  | 2.9392 |  18.7171  |      nan       |       nan       | 69.3124  |        66.6202         |
|         coat_lite_mini          | 128 | 1.0473 |  5.9613   |     8.6703     |       nan       | 65.1658  |        64.3286         |
|        res2net101_26w_4s        | 64  | 2.8458 |  19.537   |    30.4307     |     317.19      | 57.5426  |         53.879         |
|        res2net50_14w_8s         | 128 | 2.6163 |  17.5772  |      nan       |    315.6623     | 52.2167  |        50.2404         |
|         poolformer_m36          | 64  | 1.8245 |  10.9415  |      nan       |       nan       | 48.0512  |        45.9206         |
|          gmlp_s16_224           | 128 | 0.9645 |  7.3639   |      nan       |       nan       |  47.071  |        46.2485         |
|         crossvit_9_240          | 128 | 1.3515 |  9.1943   |      nan       |       nan       | 45.3486  |        43.4715         |
|           volo_d1_224           | 64  | 1.2141 |  8.7407   |      nan       |       nan       | 42.0829  |        39.3043         |
|        gluon_xception65         | 32  | 1.7561 |  12.4352  |      nan       |       nan       | 41.4439  |        38.8379         |
|          gmixer_24_224          | 128 | 1.0646 |  8.2843   |      nan       |       nan       | 36.0991  |        34.8305         |
|       gluon_inception_v3        | 128 | 1.5229 |  10.147   |      nan       |    130.7291     | 35.9322  |        33.2685         |
|        adv_inception_v3         | 128 | 1.5176 |   9.977   |      nan       |    135.0223     | 35.4115  |        33.3396         |
|          inception_v3           | 128 | 1.4614 |  10.3638  |      nan       |    137.5355     | 35.3531  |        33.0883         |
|     swsl_resnext101_32x16d      | 32  | 1.6615 |  10.9084  |      nan       |       nan       | 35.3218  |        33.7803         |
|             dla102              | 128 | 1.6942 |  11.2347  |      nan       |       nan       | 33.0332  |        32.0553         |
|           convit_base           | 64  | 1.0973 |  6.8263   |      nan       |       nan       |  32.42   |        30.5673         |
|           dm_nfnet_f0           | 128 | 2.0821 |  8.6473   |      nan       |       nan       |  31.377  |         29.698         |
|           res2next50            | 128 | 1.5542 |  9.7786   |      nan       |    158.5107     | 30.6184  |        28.4858         |
|        convmixer_768_32         | 32  | 1.2125 |  7.1641   |      nan       |       nan       | 26.4853  |         25.041         |
|          resmlp_12_224          | 128 | 0.627  |  3.2646   |     6.0164     |       nan       | 26.4765  |        24.6236         |
|         visformer_small         | 128 | 0.9074 |  5.0025   |     6.7307     |     58.1826     |  26.44   |        25.9071         |
|          mixer_b16_224          | 128 | 0.6647 |  3.6984   |      nan       |       nan       | 24.8814  |         23.765         |
|            nfnet_l0             | 128 | 1.7481 |  8.5147   |      nan       |    149.0261     | 23.1067  |         22.222         |
| deit_base_distilled_patch16_224 | 64  | 0.8721 |  5.0017   |     7.1931     |       nan       | 22.4918  |        21.5085         |
|      beit_base_patch16_224      | 64  | 1.112  |  6.1711   |      nan       |       nan       | 22.3875  |        21.0302         |
|      vit_base_patch16_224       | 64  | 0.8238 |  4.9387   |     7.1712     |       nan       | 22.1871  |        21.2592         |
|            pit_b_224            | 64  | 0.9564 |  6.0458   |      nan       |       nan       | 19.8434  |        18.8503         |
|           selecsls42b           | 128 | 0.7507 |  4.4933   |     6.4294     |     66.2942     | 16.8034  |        16.0009         |
|        tnt_s_patch16_224        | 128 | 1.5617 |  11.673   |      nan       |       nan       |   nan    |        36.4293         |
|          botnet26t_256          |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|          cspdarknet53           |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|             dpn107              |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|       eca_botnext26ts_256       |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|        eca_halonext26ts         |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|        ese_vovnet19b_dw         |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|           fbnetc_100            |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            fbnetv3_b            |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            gernet_l             |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|          ghostnet_100           |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            lcnet_050            |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            mixnet_l             |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|           mnasnet_100           |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|         mobilenetv2_100         |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|      mobilenetv3_large_100      |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|           mobilevit_s           |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|          pnasnet5large          |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|           regnety_002           |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            repvgg_a2            |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|           rexnet_100            |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|        sebotnet33ts_256         |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|          spnasnet_100           |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|       tf_efficientnet_b0        |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|           tf_mixnet_l           |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            tinynet_a            |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
+---------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | aot_cudagraphs | nvprims_nvfuser | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+
|          gmixer_24_224          | 128 | 0.9951 |  0.9185   |      nan       |       nan       |  1.5552  |         1.6267         |
|            nfnet_l0             | 128 | 0.993  |  0.8275   |      nan       |     0.8271      |  1.2906  |         1.3388         |
|          cait_m36_384           |  4  | 0.9994 |   0.934   |      nan       |       nan       |  1.1185  |         1.1746         |
|         poolformer_m36          | 64  | 0.9983 |  0.9509   |      nan       |       nan       |  1.0521  |         1.0698         |
|           dm_nfnet_f0           | 128 | 0.9357 |   0.894   |      nan       |       nan       |  1.0221  |         1.0495         |
|      beit_base_patch16_224      | 64  | 0.9966 |  0.9545   |      nan       |       nan       |  1.0038  |         1.0607         |
|           resnest101e           | 64  | 0.9971 |  0.9519   |      nan       |       nan       |  0.9993  |         1.0025         |
|      vit_base_patch16_224       | 64  | 0.9963 |  0.9434   |     0.3153     |       nan       |  0.997   |         1.0835         |
| deit_base_distilled_patch16_224 | 64  | 0.9964 |  0.9442   |     0.3138     |       nan       |  0.9925  |         1.0805         |
|        twins_pcpvt_base         | 64  | 0.9976 |  0.9195   |     0.3131     |       nan       |  0.9924  |         1.0673         |
|           volo_d1_224           | 64  | 0.996  |  0.9213   |      nan       |       nan       |  0.9837  |         1.001          |
|        convmixer_768_32         | 32  | 0.9986 |  0.9854   |      nan       |       nan       |  0.9836  |         0.9853         |
|          mixer_b16_224          | 128 | 0.9952 |   0.94    |      nan       |       nan       |  0.9827  |         1.0538         |
|          gmlp_s16_224           | 128 | 0.9959 |  0.9487   |      nan       |       nan       |  0.9766  |         0.9827         |
|      xcit_large_24_p8_224       |  5  | 0.9981 |  0.8982   |      nan       |       nan       |  0.9633  |         1.0572         |
|             dla102              | 128 | 0.9828 |  0.9169   |      nan       |       nan       |  0.9489  |         0.9538         |
|            hrnet_w18            | 128 | 0.9955 |  0.9252   |      nan       |       nan       |  0.9378  |         0.9419         |
|          jx_nest_base           | 32  | 1.0002 |  0.8966   |      nan       |       nan       |  0.9348  |         1.0548         |
|        gluon_xception65         | 32  | 0.9975 |  0.9358   |      nan       |       nan       |  0.9343  |         0.9368         |
|        res2net101_26w_4s        | 64  | 0.9967 |  0.9278   |     0.3243     |     0.8769      |   0.93   |         0.9563         |
|          convnext_base          | 64  | 0.9975 |  0.9169   |      nan       |       nan       |  0.9126  |         0.9981         |
|           res2next50            | 128 | 0.9955 |  0.9149   |      nan       |     0.8461      |  0.9075  |         0.9311         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9288   |      nan       |       nan       |  0.9069  |         1.0464         |
|         visformer_small         | 128 | 0.9944 |  0.9374   |     0.3291     |     0.9283      |  0.9029  |         0.9502         |
|           selecsls42b           | 128 | 0.9885 |  0.8897   |     0.337      |     0.8775      |  0.8987  |         0.919          |
|       gluon_inception_v3        | 128 |  0.99  |  0.8616   |      nan       |     0.8238      |  0.8985  |         0.9073         |
|          inception_v3           | 128 |  0.99  |  0.8616   |      nan       |     0.8238      |  0.8985  |         0.9073         |
|        adv_inception_v3         | 128 |  0.99  |  0.8616   |      nan       |     0.8238      |  0.8985  |         0.9073         |
|     swsl_resnext101_32x16d      | 32  | 0.9992 |  0.8965   |      nan       |       nan       |  0.8913  |         0.923          |
|        res2net50_14w_8s         | 128 | 0.995  |  0.9047   |      nan       |     0.8422      |  0.8821  |         0.9326         |
|            pit_b_224            | 64  | 0.9968 |  0.7946   |      nan       |       nan       |  0.8563  |         1.0631         |
|         coat_lite_mini          | 128 | 1.0049 |  0.8526   |     0.3226     |       nan       |  0.8208  |         0.9438         |
|          resmlp_12_224          | 128 | 0.9893 |  0.6396   |     0.2199     |       nan       |  0.7899  |         0.7979         |
|           convit_base           | 64  | 0.9977 |  0.8838   |      nan       |       nan       |  0.7463  |         0.9008         |
|         crossvit_9_240          | 128 | 0.9884 |  0.8656   |      nan       |       nan       |  0.6584  |         0.8854         |
|        tnt_s_patch16_224        | 128 | 0.996  |  0.9769   |      nan       |       nan       |   nan    |         0.8622         |
|          botnet26t_256          |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|          cspdarknet53           |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|             dpn107              |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|       eca_botnext26ts_256       |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|        eca_halonext26ts         |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|        ese_vovnet19b_dw         |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|           fbnetc_100            |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            fbnetv3_b            |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            gernet_l             |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|          ghostnet_100           |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            lcnet_050            |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            mixnet_l             |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|           mnasnet_100           |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|         mobilenetv2_100         |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|      mobilenetv3_large_100      |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|           mobilevit_s           |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|          pnasnet5large          |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|           regnety_002           |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            repvgg_a2            |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|           rexnet_100            |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|        sebotnet33ts_256         |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|          spnasnet_100           |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|       tf_efficientnet_b0        |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|           tf_mixnet_l           |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
|            tinynet_a            |  0  |  nan   |    nan    |      nan       |       nan       |   nan    |          nan           |
+---------------------------------+-----+--------+-----------+----------------+-----------------+----------+------------------------+

Performance graphs

bench_logs/huggingface_float32.png :

bench_logs/timm_models_float32.png :

bench_logs/torchbench_float32.png :

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward passes for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
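
As a rough illustration of the filtering and normalization described above, the sketch below computes a geometric mean speedup over only the models that pass accuracy. The function name, the dictionary layout, and the sample values are hypothetical, and skipping 0.0 or nan speedups is an assumption, not necessarily what the dashboard scripts do.

```python
import math


def geomean_speedup(speedups, accuracy):
    """Geometric mean of per-model speedups over models that pass accuracy.

    speedups: model name -> speedup relative to native PyTorch (hypothetical layout)
    accuracy: model name -> status string such as "pass" or "fail_accuracy"
    """
    kept = [
        s
        for name, s in speedups.items()
        # Assumption: non-positive or nan speedups mark a failed run and are skipped.
        if accuracy.get(name) == "pass" and not math.isnan(s) and s > 0
    ]
    if not kept:
        return float("nan")
    # Mean of the logs, then exponentiate, which equals the n-th root of the product.
    return math.exp(sum(math.log(s) for s in kept) / len(kept))


# Made-up sample values, for illustration only.
speedups = {"model_a": 2.0, "model_b": 0.5, "model_c": 4.0, "model_d": 0.0}
accuracy = {
    "model_a": "pass",
    "model_b": "pass",
    "model_c": "fail_accuracy",
    "model_d": "pass",
}
result = geomean_speedup(speedups, accuracy)
```

Here only model_a and model_b contribute, so the 2.0x and 0.5x speedups cancel out in the geometric mean.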

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 87%, 52/60 | 93%, 42/45  | 97%, 58/60  |
| inductor_no_cudagraphs | 87%, 52/60 | 98%, 44/45  | 98%, 59/60  |
+------------------------+------------+-------------+-------------+

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.56x    |    1.59x    |    1.40x    |
| inductor_no_cudagraphs |   1.28x    |    1.48x    |    1.38x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.85    |    7.76     |    5.89     |
|       aot_eager        |    9.58    |    16.39    |    13.13    |
|        inductor        |   61.46    |    59.90    |   105.98    |
| inductor_no_cudagraphs |   59.82    |    54.96    |   105.68    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    0.97x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.89x    |    0.88x    |
|        inductor        |   0.79x    |    0.89x    |    0.91x    |
| inductor_no_cudagraphs |   0.94x    |    1.03x    |    1.01x    |
+------------------------+------------+-------------+-------------+
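
The report does not spell out how the compression ratio is computed. A minimal sketch, assuming it is the eager peak memory divided by the compiled peak memory (so values above 1.0 mean the compiled run used less memory), could look like this. The function name and the byte counts are hypothetical.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Assumed definition: eager peak memory / compiled peak memory."""
    if compiled_peak_bytes <= 0:
        # No usable measurement for the compiled run.
        return float("nan")
    return eager_peak_bytes / compiled_peak_bytes


# Made-up numbers: the compiled run peaks higher than eager, so the ratio is below 1.
ratio = compression_ratio(10_000, 12_500)
```

Under this assumed definition, a ratio below 0.9 corresponds to the compiled run using more than about 11% extra peak memory compared with eager.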

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics of the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938

Previous report name: /data/home/williamwen/cluster/cron_logs/day_082_23_03_23_performance_amp_803

Passrate diff

+------------------------+-------------+------------+------------+
|        compiler        |    suite    | prev_value | cur_value  |
+------------------------+-------------+------------+------------+
|        inductor        | torchbench  | 87%, 52/60 | 87%, 52/60 |
|        inductor        | huggingface | 93%, 42/45 | 93%, 42/45 |
|        inductor        | timm_models | 93%, 56/60 | 97%, 58/60 |
| inductor_no_cudagraphs | torchbench  | 85%, 51/60 | 87%, 52/60 |
| inductor_no_cudagraphs | huggingface | 98%, 44/45 | 98%, 44/45 |
| inductor_no_cudagraphs | timm_models | 95%, 57/60 | 98%, 59/60 |
+------------------------+-------------+------------+------------+
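Each prev_value and cur_value cell pairs a rounded percentage with the raw pass count. A minimal sketch of that formatting, assuming the percentage is rounded to the nearest whole number (which is consistent with the rows above); `format_passrate` is a hypothetical helper name, not the dashboard's actual code:

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a pass rate in the "87%, 52/60" style used in the table.

    Assumption: the percentage is rounded to the nearest integer.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    percent = 100.0 * passed / total
    return f"{percent:.0f}%, {passed}/{total}"


# Counts taken from the inductor rows of the table above.
print(format_passrate(52, 60))
print(format_passrate(58, 60))
```

Keeping the raw count next to the percentage makes it easy to see that a move from 93% to 97% on timm_models corresponds to two additional passing models.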

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.58x    |   1.56x   |
|        inductor        | huggingface |   1.58x    |   1.59x   |
|        inductor        | timm_models |   1.40x    |   1.40x   |
| inductor_no_cudagraphs | torchbench  |   1.27x    |   1.28x   |
| inductor_no_cudagraphs | huggingface |   1.49x    |   1.48x   |
| inductor_no_cudagraphs | timm_models |   1.38x    |   1.38x   |
+------------------------+-------------+------------+-----------+
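For readers unfamiliar with the metric, a geometric mean speedup is the exponential of the average log speedup across models. The sketch below is illustrative only: `geomean_speedup` is a hypothetical helper, and skipping 0.0 entries (which the report uses for failed runs) is an assumption, since the report does not say how failures enter the summary number.

```python
import math
from typing import Iterable


def geomean_speedup(speedups: Iterable[float]) -> float:
    """Geometric mean of per-model speedups.

    Assumption: non-positive entries (failed runs reported as 0.0)
    are skipped; the real dashboard may treat them differently.
    """
    valid = [s for s in speedups if s > 0.0]
    if not valid:
        return 0.0
    mean_log = sum(math.log(s) for s in valid) / len(valid)
    return math.exp(mean_log)


# Illustrative inputs, not taken from the report.
print(f"{geomean_speedup([2.0, 8.0, 0.0]):.2f}x")
```

A geometric mean is the usual choice for ratios because a 2x speedup and a 0.5x slowdown cancel out to 1x, which an arithmetic mean would not reflect.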

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
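The flagging rules above can be sketched as a small predicate. The thresholds come from the list; everything else (the `ModelResult` record, its field names, and treating `pass_due_to_skip` as passing) is an assumption made for illustration, not the dashboard's actual implementation.

```python
from dataclasses import dataclass
from typing import List

# Thresholds copied from the list above.
SPEEDUP_MIN = 0.95
COMPILE_LATENCY_MAX_SEC = 120.0
COMPRESSION_RATIO_MIN = 0.9

# Assumption: these accuracy statuses count as passing.
PASSING_STATUSES = {"pass", "pass_due_to_skip"}


@dataclass
class ModelResult:
    """Hypothetical per-model record for one compiler."""
    name: str
    accuracy: str
    speedup: float
    compilation_latency: float  # seconds
    compression_ratio: float


def warnings_for(result: ModelResult) -> List[str]:
    """Return the names of the warning categories this result triggers."""
    flags = []
    if result.accuracy not in PASSING_STATUSES:
        flags.append("accuracy")
    if result.speedup < SPEEDUP_MIN:
        flags.append("speedup")
    if result.compilation_latency > COMPILE_LATENCY_MAX_SEC:
        flags.append("compilation_latency")
    if result.compression_ratio < COMPRESSION_RATIO_MIN:
        flags.append("compression_ratio")
    return flags


# Illustrative values, not taken from the report.
example = ModelResult("example_model", "pass", 0.90, 150.0, 0.85)
print(warnings_for(example))
```

Because a failed performance run is reported as a 0.0 speedup, it trips the speedup check without needing a separate rule.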

Accuracy warnings

+-------------+---------------------------------+------------------------+-----------------+
|    suite    |              name               | inductor_no_cudagraphs |    inductor     |
+-------------+---------------------------------+------------------------+-----------------+
| torchbench  |              moco               |      fail_to_run       |   fail_to_run   |
| torchbench  |       Background_Matting        |    eager_variation     | eager_variation |
| torchbench  |         vision_maskrcnn         |    eager_variation     | eager_variation |
| torchbench  |            tacotron2            |         0.0000         |     0.0000      |
| torchbench  |               gat               |         0.0000         |     0.0000      |
| torchbench  |               gcn               |         0.0000         |     0.0000      |
| torchbench  |              llama              |         0.0000         |     0.0000      |
| torchbench  |              sage               |         0.0000         |     0.0000      |
| torchbench  |          torchrec_dlrm          |         0.0000         |     0.0000      |
| huggingface |  DebertaV2ForQuestionAnswering  |          pass          |   fail_to_run   |
| huggingface |   AlbertForQuestionAnswering    |     fail_accuracy      |  fail_accuracy  |
| timm_models | deit_base_distilled_patch16_224 |      fail_to_run       |   fail_to_run   |
| timm_models |        sebotnet33ts_256         |          pass          |  fail_accuracy  |
+-------------+---------------------------------+------------------------+-----------------+

Performance speedup warnings

+-------------+---------------------------------+------------------------+----------+
|    suite    |              name               | inductor_no_cudagraphs | inductor |
+-------------+---------------------------------+------------------------+----------+
| torchbench  |            resnet18             |         0.9481         |  1.5953  |
| torchbench  |              dcgan              |         0.8218         |  1.4331  |
| torchbench  |          lennard_jones          |         0.8805         |  1.3878  |
| torchbench  |        soft_actor_critic        |         0.8184         |  1.1004  |
| torchbench  |           timm_vovnet           |         0.9214         |  0.9337  |
| torchbench  |     nvidia_deeprecommender      |         1.0189         |  0.8729  |
| torchbench  |  timm_vision_transformer_large  |          0.0           |   0.0    |
| torchbench  |              moco               |          0.0           |   0.0    |
| torchbench  |               gat               |          0.0           |   0.0    |
| torchbench  |               gcn               |          0.0           |   0.0    |
| torchbench  |              sage               |          0.0           |   0.0    |
| torchbench  |            tacotron2            |          0.0           |   0.0    |
| torchbench  |          torchrec_dlrm          |          0.0           |   0.0    |
| huggingface |   DebertaForQuestionAnswering   |         0.9128         |  1.0047  |
| huggingface |       DebertaForMaskedLM        |         0.7781         |  0.9421  |
| huggingface |      DebertaV2ForMaskedLM       |         0.6205         |  0.8378  |
| huggingface |  DebertaV2ForQuestionAnswering  |         0.6324         |  0.7937  |
| huggingface |      BlenderbotForCausalLM      |         1.0907         |   0.0    |
| timm_models | deit_base_distilled_patch16_224 |          0.0           |  1.2524  |
| timm_models |          pnasnet5large          |         0.9191         |  0.9095  |
+-------------+---------------------------------+------------------------+----------+

Compilation latency (sec) warnings

+-------------+--------------------------------+------------------------+----------+
|    suite    |              name              | inductor_no_cudagraphs | inductor |
+-------------+--------------------------------+------------------------+----------+
| torchbench  |        phlippe_densenet        |        164.2533        | 158.6339 |
| torchbench  |          hf_T5_large           |        156.4679        |  157.8   |
| torchbench  |       timm_efficientnet        |        138.1326        | 144.2262 |
| torchbench  |         hf_Longformer          |        111.1327        | 143.6787 |
| torchbench  |       mobilenet_v3_large       |        133.6657        | 135.8322 |
| torchbench  |           hf_BigBird           |        116.1809        | 135.0138 |
| torchbench  |          densenet121           |        129.5104        | 129.1855 |
| torchbench  |          mobilenet_v2          |        125.7336        | 127.9966 |
| huggingface |     AllenaiLongformerBase      |        111.0442        | 142.2008 |
| huggingface |     MobileBertForMaskedLM      |        132.5278        | 134.854  |
| huggingface | MobileBertForQuestionAnswering |        126.5694        | 129.2315 |
| huggingface |      DebertaV2ForMaskedLM      |        61.8771         | 127.3365 |
| huggingface |  MT5ForConditionalGeneration   |        125.6958        | 126.9394 |
| huggingface | DebertaV2ForQuestionAnswering  |        61.0171         | 125.1719 |
| timm_models |           rexnet_100           |        290.4619        | 280.4575 |
| timm_models |          ghostnet_100          |        237.6595        | 231.6665 |
| timm_models |           hrnet_w18            |        229.0573        | 229.157  |
| timm_models |           fbnetv3_b            |        161.2022        | 164.734  |
| timm_models |     mobilenetv3_large_100      |        154.5706        | 160.5862 |
| timm_models |          mobilevit_s           |        157.2508        | 157.1846 |
| timm_models |          resnest101e           |        158.2069        | 155.6736 |
| timm_models |       gluon_inception_v3       |        150.6782        | 154.0854 |
| timm_models |            mixnet_l            |        151.3775        | 153.9611 |
| timm_models |           tinynet_a            |        157.1143        | 153.7764 |
| timm_models |          tf_mixnet_l           |        155.4134        | 152.9577 |
| timm_models |          inception_v3          |        152.9301        | 152.9547 |
| timm_models |        adv_inception_v3        |        152.7775        | 152.8151 |
| timm_models |         pnasnet5large          |        150.321         | 151.9905 |
| timm_models |       tf_efficientnet_b0       |        152.681         | 150.4692 |
| timm_models |       res2net101_26w_4s        |        140.684         | 141.6157 |
| timm_models |        twins_pcpvt_base        |        140.1117        | 139.696  |
| timm_models |          spnasnet_100          |        137.8384        |  136.55  |
| timm_models |           fbnetc_100           |        132.0306        | 135.3267 |
| timm_models |        mobilenetv2_100         |        125.2882        | 124.479  |
| timm_models |      xcit_large_24_p8_224      |        123.4044        | 121.4632 |
+-------------+--------------------------------+------------------------+----------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+------------------------+----------+
|    suite    |                  name                   | inductor_no_cudagraphs | inductor |
+-------------+-----------------------------------------+------------------------+----------+
| torchbench  |              hf_GPT2_large              |         1.1284         |  0.8906  |
| torchbench  |                 yolov3                  |         1.036          |  0.8712  |
| torchbench  |           speech_transformer            |         0.869          |  0.8651  |
| torchbench  |              timm_resnest               |         0.9519         |  0.8628  |
| torchbench  |           shufflenet_v2_x1_0            |         0.9647         |  0.8618  |
| torchbench  |         timm_vision_transformer         |         0.8835         |  0.8593  |
| torchbench  |                resnet152                |         0.9402         |  0.8504  |
| torchbench  |               timm_regnet               |         0.9504         |  0.8489  |
| torchbench  |           Background_Matting            |         1.0412         |  0.8484  |
| torchbench  |              hf_DistilBert              |         0.9479         |  0.8476  |
| torchbench  |               hf_T5_large               |         1.168          |  0.8201  |
| torchbench  |              pytorch_unet               |         0.9308         |  0.8134  |
| torchbench  |            phlippe_densenet             |         0.8659         |  0.8058  |
| torchbench  |           mobilenet_v3_large            |         0.8717         |  0.7848  |
| torchbench  |                  dcgan                  |         0.9645         |  0.7821  |
| torchbench  |                resnet50                 |         0.885          |  0.7821  |
| torchbench  |              squeezenet1_1              |         0.9087         |  0.773   |
| torchbench  |                 demucs                  |         0.9662         |  0.773   |
| torchbench  |             pytorch_stargan             |         0.8893         |  0.7715  |
| torchbench  |                 hf_Bart                 |         0.9285         |  0.7535  |
| torchbench  |               timm_vovnet               |         0.8869         |  0.7529  |
| torchbench  |               mnasnet1_0                |         0.8047         |  0.7429  |
| torchbench  |             pytorch_struct              |         0.7358         |  0.7274  |
| torchbench  |                  vgg16                  |         0.9805         |  0.7227  |
| torchbench  |                 alexnet                 |         0.9385         |  0.7088  |
| torchbench  |               densenet121               |         0.803          |  0.7085  |
| torchbench  |               hf_BigBird                |         1.1013         |  0.6971  |
| torchbench  |             resnext50_32x4d             |         0.7713         |  0.6655  |
| torchbench  |         nvidia_deeprecommender          |         0.8931         |  0.6585  |
| torchbench  |                   drq                   |         0.9573         |  0.6379  |
| torchbench  |            soft_actor_critic            |         0.9973         |  0.6066  |
| torchbench  |             LearningToPaint             |         0.7463         |  0.5925  |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |         0.6004         |  0.5904  |
| torchbench  |                resnet18                 |         0.6127         |  0.5423  |
| torchbench  |              lennard_jones              |         0.9997         |  0.5317  |
| torchbench  |               hf_Reformer               |         0.8022         |  0.4538  |
| torchbench  |              hf_Longformer              |         0.8947         |  0.417   |
| torchbench  |          functorch_dp_cifar10           |         0.4424         |  0.3991  |
| torchbench  |             phlippe_resnet              |         0.3395         |  0.3169  |
| huggingface |            PLBartForCausalLM            |         0.9249         |  0.8907  |
| huggingface |     PegasusForConditionalGeneration     |         1.0074         |  0.8901  |
| huggingface |           ElectraForCausalLM            |         0.8941         |  0.889   |
| huggingface |          DistilBertForMaskedLM          |         0.9624         |  0.8849  |
| huggingface | BlenderbotSmallForConditionalGeneration |         0.9803         |  0.8729  |
| huggingface |      MBartForConditionalGeneration      |         1.0307         |  0.8672  |
| huggingface |            TrOCRForCausalLM             |         0.9075         |  0.8619  |
| huggingface |            MBartForCausalLM             |         0.9507         |  0.8491  |
| huggingface |      BartForConditionalGeneration       |         1.0139         |  0.8456  |
| huggingface |         MegatronBertForCausalLM         |         1.0962         |  0.845   |
| huggingface |             BartForCausalLM             |         0.943          |  0.8301  |
| huggingface |       BlenderbotSmallForCausalLM        |         0.8318         |  0.8065  |
| huggingface |           PegasusForCausalLM            |         0.9252         |  0.7952  |
| huggingface |         Speech2Text2ForCausalLM         |         0.808          |  0.7566  |
| huggingface |          MobileBertForMaskedLM          |         1.016          |  0.7473  |
| huggingface |             XGLMForCausalLM             |         0.9287         |  0.6744  |
| huggingface |     MobileBertForQuestionAnswering      |         0.8392         |  0.6569  |
| huggingface |     M2M100ForConditionalGeneration      |         0.8978         |  0.6058  |
| huggingface |           DebertaForMaskedLM            |         0.9978         |  0.5501  |
| huggingface |          DebertaV2ForMaskedLM           |         0.9665         |  0.5197  |
| huggingface |      DebertaV2ForQuestionAnswering      |         0.9801         |  0.487   |
| huggingface |          AllenaiLongformerBase          |         0.8742         |  0.4688  |
| huggingface |       DebertaForQuestionAnswering       |         1.1527         |  0.4601  |
| timm_models |                hrnet_w18                |          0.99          |  0.8918  |
| timm_models |            sebotnet33ts_256             |         1.1115         |  0.891   |
| timm_models |           gluon_inception_v3            |         1.0171         |  0.8904  |
| timm_models |              inception_v3               |         1.0171         |  0.8904  |
| timm_models |            adv_inception_v3             |         1.0171         |  0.8904  |
| timm_models |                 dpn107                  |         0.9642         |  0.8833  |
| timm_models |            gluon_xception65             |         0.9705         |  0.8831  |
| timm_models |              ghostnet_100               |         0.977          |  0.8807  |
| timm_models |              spnasnet_100               |         0.9451         |  0.8786  |
| timm_models |          mobilenetv3_large_100          |         0.9361         |  0.877   |
| timm_models |             poolformer_m36              |         1.1871         |  0.8768  |
| timm_models |           eca_botnext26ts_256           |         1.0072         |  0.8738  |
| timm_models |          xcit_large_24_p8_224           |         0.9732         |  0.8721  |
| timm_models |            res2net50_14w_8s             |         0.9607         |  0.8712  |
| timm_models |            res2net101_26w_4s            |         0.9483         |  0.871   |
| timm_models |                mixnet_l                 |         0.9902         |  0.8687  |
| timm_models |               mnasnet_100               |         0.9403         |  0.8683  |
| timm_models |               res2next50                |         0.9547         |  0.866   |
| timm_models |              cait_m36_384               |         0.989          |  0.8632  |
| timm_models |               fbnetc_100                |         0.9535         |  0.8596  |
| timm_models |                pit_b_224                |         1.0242         |  0.8578  |
| timm_models |               selecsls42b               |         0.9664         |  0.8576  |
| timm_models |              convnext_base              |         1.0338         |  0.8505  |
| timm_models |                gernet_l                 |         0.9706         |  0.8499  |
| timm_models |         swsl_resnext101_32x16d          |         0.9786         |  0.8461  |
| timm_models |             coat_lite_mini              |         1.0202         |  0.8402  |
| timm_models |              botnet26t_256              |         0.9779         |  0.8239  |
| timm_models |                lcnet_050                |         0.884          |  0.805   |
| timm_models |                repvgg_a2                |         0.9611         |  0.7738  |
| timm_models |               regnety_002               |         0.8966         |  0.7602  |
| timm_models |             crossvit_9_240              |         0.9898         |  0.7526  |
| timm_models |      swin_base_patch4_window7_224       |         0.9045         |  0.7214  |
| timm_models |              jx_nest_base               |         0.9604         |  0.6693  |
+-------------+-----------------------------------------+------------------------+----------+

Metrics over time

[Figure: bench_logs/comp_time_over_time.png]

[Figure: bench_logs/memory_over_time.png]

[Figure: bench_logs/geomean_over_time.png]

[Figure: bench_logs/passrate_over_time.png]

Recent Regressions

For each relevant compiler, we compare the two most recent reports (that actually ran the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).
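The comparison described above can be sketched for the accuracy case as a diff of two status maps. This is a minimal illustration under stated assumptions: `accuracy_regressions` is a hypothetical helper, models missing from the previous report are ignored, and `pass_due_to_skip` is treated as passing.

```python
from typing import Dict, List, Tuple

# Assumption: these accuracy statuses count as passing (unflagged).
PASSING_STATUSES = {"pass", "pass_due_to_skip"}


def accuracy_regressions(
    prev: Dict[str, str], cur: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (name, prev_status, cur_status) for newly failing models.

    A model counts as regressed only if it was unflagged in the previous
    report and is flagged in the current one. Models absent from the
    previous report are skipped (an assumption of this sketch).
    """
    rows = []
    for name, cur_status in sorted(cur.items()):
        prev_status = prev.get(name)
        if prev_status is None:
            continue
        if prev_status in PASSING_STATUSES and cur_status not in PASSING_STATUSES:
            rows.append((name, prev_status, cur_status))
    return rows


# sebotnet33ts_256 mirrors the regression reported for timm_models;
# the other entry is hypothetical.
prev_report = {"sebotnet33ts_256": "pass", "example_model": "fail_to_run"}
cur_report = {"sebotnet33ts_256": "fail_accuracy", "example_model": "fail_to_run"}
for row in accuracy_regressions(prev_report, cur_report):
    print(row)
```

A model that was already failing in the previous report is not listed again, which keeps the regression tables limited to new problems.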

Regressions for torchbench

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_082_23_03_23_performance_amp_803

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_082_23_03_23_performance_amp_803

No regressions found.

Regressions for huggingface

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_082_23_03_23_performance_amp_803

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_082_23_03_23_performance_amp_803

No regressions found.

Regressions for timm_models

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_082_23_03_23_performance_amp_803

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_082_23_03_23_performance_amp_803

Accuracy regressions

+----------+------------------+-------------+---------------+
| compiler |       name       | prev_status |  cur_status   |
+----------+------------------+-------------+---------------+
| inductor | sebotnet33ts_256 |    pass     | fail_accuracy |
+----------+------------------+-------------+---------------+

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9658 |  0.9102   |  3.5344  |         1.3613         |
|           BERT_pytorch            |  16  | 0.9918 |  0.8039   |  3.1447  |         2.106          |
|            densenet121            |  4   | 0.9891 |   0.715   |  2.7511  |         1.0636         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9703 |  0.9196   |  2.551   |         1.7654         |
|            hf_BigBird             |  2   | 0.954  |  0.7794   |  2.5466  |         1.7422         |
|             hf_Albert             |  8   | 0.9925 |  0.9551   |  2.2878  |         2.2443         |
|            hf_T5_large            |  2   | 0.9765 |  0.8125   |  2.1745  |         1.8636         |
|         phlippe_densenet          | 128  | 0.983  |  0.7722   |  2.0695  |         0.9993         |
|        mobilenet_v3_large         |  32  | 0.9956 |  0.7833   |  2.0672  |         1.1803         |
|           squeezenet1_1           |  32  | 0.9811 |  0.9352   |  1.9917  |         1.309          |
|               dlrm                | 1024 | 0.9419 |  0.8507   |  1.9684  |         1.2032         |
|               hf_T5               |  8   | 0.9845 |  0.8491   |  1.8963  |         1.941          |
|          phlippe_resnet           | 128  | 0.9842 |  0.7629   |  1.8264  |         1.0076         |
|              hf_Bert              |  4   | 0.9952 |  0.8366   |  1.788   |         1.5751         |
|              hf_GPT2              |  4   | 0.9944 |  0.9567   |  1.7535  |         1.7818         |
|          resnext50_32x4d          |  8   | 0.9886 |  0.7118   |  1.7273  |         0.9741         |
|            mnasnet1_0             |  32  | 0.9888 |  0.7357   |  1.707   |         1.071          |
|              hf_Bart              |  4   | 0.9711 |  0.7743   |  1.669   |         1.4333         |
|           hf_GPT2_large           |  4   | 0.9829 |  0.9716   |  1.6561  |         1.7172         |
|        shufflenet_v2_x1_0         | 128  | 0.9949 |  0.7511   |  1.6183  |         1.1899         |
|        speech_transformer         |  32  | 0.9798 |  0.8268   |  1.6009  |         1.5785         |
|             resnet18              |  16  | 0.9834 |   0.755   |  1.5953  |         0.9481         |
|           hf_Bert_large           |  4   | 0.9985 |  0.8541   |  1.5708  |         1.5462         |
|      timm_vision_transformer      |  32  | 0.984  |  0.8532   |  1.561   |         1.4101         |
|           timm_resnest            |  32  | 0.9919 |  0.8493   |  1.5598  |         1.5096         |
|           fastNLP_Bert            |  6   | 0.9879 |  0.8416   |  1.5447  |         1.4924         |
|          pytorch_struct           | 200  | 0.9215 |  0.7796   |  1.5158  |         1.1176         |
|            timm_nfnet             | 128  | 0.9856 |  0.9846   |  1.5111  |          1.45          |
|           mobilenet_v2            |  96  | 0.9965 |  0.7777   |  1.5087  |         1.4847         |
|                drq                |  1   | 0.9589 |  0.7404   |  1.4914  |         1.0504         |
| attention_is_all_you_need_pytorch | 256  | 0.9895 |  0.8915   |  1.4468  |         1.5168         |
|               dcgan               |  32  | 0.8607 |  0.7035   |  1.4331  |         0.8218         |
|         timm_efficientnet         |  32  | 0.9383 |  0.6244   |  1.4297  |         1.0614         |
|           hf_DistilBert           |  8   | 0.9796 |  0.9554   |  1.4281  |         1.4596         |
|           hf_Longformer           |  2   | 0.8293 |  0.5634   |  1.4235  |         1.1744         |
|           lennard_jones           | 1000 | 0.8433 |  0.7781   |  1.3878  |         0.8805         |
|           pytorch_unet            |  1   | 0.997  |  0.2051   |  1.3747  |         1.3682         |
|          LearningToPaint          |  96  | 0.9906 |  0.7686   |  1.3102  |         1.0602         |
|          pytorch_stargan          |  16  | 0.9952 |  0.7877   |  1.272   |         1.3084         |
|               vgg16               |  64  | 0.9996 |  0.9984   |  1.2394  |         1.2526         |
|            Super_SloMo            |  6   | 0.9966 |  0.1791   |  1.2319  |         1.2332         |
|        Background_Matting         |  4   | 0.9992 |  0.1368   |  1.2125  |         1.2083         |
|             resnet152             |  32  | 0.9945 |  0.7604   |  1.2015  |         1.0265         |
|              yolov3               |  16  | 0.9967 |  0.8067   |  1.188   |          1.19          |
|             resnet50              |  32  | 0.9952 |  0.7735   |  1.1851  |         1.0525         |
|            hf_Reformer            |  4   | 0.9863 |  0.9582   |  1.1394  |         1.0595         |
|         soft_actor_critic         | 256  | 0.8747 |  0.6437   |  1.1004  |         0.8184         |
|              alexnet              | 128  | 0.999  |  0.9984   |  1.0891  |         1.1354         |
|              demucs               |  4   | 0.9996 |   1.002   |  1.0364  |         1.0354         |
|            timm_regnet            |  32  | 0.9533 |  0.7699   |  0.9899  |         0.972          |
|            tts_angular            |  64  | 0.9253 |  0.8878   |  0.955   |         0.9538         |
|            timm_vovnet            |  32  | 0.8574 |  0.7073   |  0.9337  |         0.9214         |
|      nvidia_deeprecommender       | 256  | 0.9987 |  0.9983   |  0.8729  |         1.0189         |
|   timm_vision_transformer_large   |  32  | 0.9994 |    0.0    |   0.0    |          0.0           |
|               moco                |  32  | 0.9811 |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|         phlippe_densenet          | 128  | 3.2584  |  7.1667   | 158.6339 |        164.2533        |
|            hf_T5_large            |  2   | 26.9871 |  56.1954  |  157.8   |        156.4679        |
|         timm_efficientnet         |  32  | 5.0794  |  10.4934  | 144.2262 |        138.1326        |
|           hf_Longformer           |  2   | 11.4578 |  32.3059  | 143.6787 |        111.1327        |
|        mobilenet_v3_large         |  32  | 3.3891  |   7.79    | 135.8322 |        133.6657        |
|            hf_BigBird             |  2   | 13.0765 |  37.9874  | 135.0138 |        116.1809        |
|            densenet121            |  4   | 7.6821  |  18.2378  | 129.1855 |        129.5104        |
|           mobilenet_v2            |  96  | 3.1538  |  7.1365   | 127.9966 |        125.7336        |
|              yolov3               |  16  | 5.0326  |  10.9433  | 116.7021 |        113.2752        |
|            mnasnet1_0             |  32  |  3.152  |  6.8822   | 107.5114 |        106.2177        |
|             resnet152             |  32  | 9.1888  |  20.8647  | 99.5567  |        100.1765        |
|           timm_resnest            |  32  | 1.8499  |  3.9861   | 98.6392  |        100.9738        |
|           hf_GPT2_large           |  4   | 15.1062 |  30.5269  |  96.905  |        96.6366         |
|        shufflenet_v2_x1_0         | 128  | 3.5035  |  7.9025   | 79.8355  |        79.1559         |
|        speech_transformer         |  32  |  6.049  |  13.9952  | 74.1789  |         71.871         |
| attention_is_all_you_need_pytorch | 256  | 4.4272  |  11.3274  | 70.9551  |        70.7677         |
|            timm_regnet            |  32  |  7.134  |  12.5748  | 68.7124  |        66.6778         |
|            timm_nfnet             | 128  |  6.005  |  11.3988  | 68.5677  |        67.1074         |
|        Background_Matting         |  4   | 3.0203  |  11.4864  | 67.2169  |        62.2662         |
|           BERT_pytorch            |  16  | 4.9559  |  11.8756  | 65.7277  |        64.4379         |
|             resnet50              |  32  | 3.1574  |  7.1205   | 63.6233  |        60.4126         |
|            timm_vovnet            |  32  |  3.73   |  6.4912   | 59.6252  |        59.8555         |
|           pytorch_unet            |  1   | 1.5423  |  4.4889   | 57.3617  |         56.549         |
|           hf_Bert_large           |  4   | 10.4213 |  21.9245  | 57.3436  |        57.3389         |
|              hf_Bart              |  4   | 10.6383 |  18.4887  | 55.9197  |        55.6229         |
|       functorch_dp_cifar10        |  64  | 1.1967  |  2.4638   | 55.0087  |        54.2303         |
|          resnext50_32x4d          |  8   | 3.2837  |  7.1349   |  51.933  |        48.4457         |
|      timm_vision_transformer      |  32  | 3.3991  |  7.4335   | 48.1267  |        46.8145         |
|               hf_T5               |  8   | 5.7431  |  13.1935  | 46.8091  |        45.7595         |
|           fastNLP_Bert            |  6   | 5.1481  |  11.3576  | 46.2607  |        45.0826         |
|          LearningToPaint          |  96  |  1.415  |  2.9921   | 44.8102  |        43.9082         |
|          pytorch_stargan          |  16  | 1.1994  |  3.3148   | 44.7038  |        42.9311         |
|            hf_Reformer            |  4   | 4.1464  |   6.191   | 43.2119  |        40.1678         |
|             resnet18              |  16  | 1.3596  |  2.9808   | 40.8732  |        42.0388         |
|            Super_SloMo            |  6   | 2.7714  |  9.8896   | 40.2082  |        39.9696         |
|              hf_GPT2              |  4   | 4.7502  |  9.8937   | 38.7308  |        39.3271         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2392  |  3.0251   | 35.9972  |        35.0675         |
|             hf_Albert             |  8   | 2.6347  |   8.182   | 35.3612  |        36.4663         |
|              hf_Bert              |  4   | 5.1326  |  10.9768  | 34.7274  |        35.8901         |
|          phlippe_resnet           | 128  |  1.356  |  2.9166   | 31.1478  |        30.8586         |
|              demucs               |  4   | 1.4419  |  2.2182   | 29.7452  |        29.3646         |
|           hf_DistilBert           |  8   | 2.4361  |  5.4635   | 28.0342  |        28.6542         |
|           squeezenet1_1           |  32  | 1.0683  |  1.8075   | 24.3962  |        24.0074         |
|          pytorch_struct           | 200  | 0.7458  |  1.3623   | 19.0338  |        18.7421         |
|               vgg16               |  64  | 0.6483  |  1.1622   | 15.5263  |        14.9816         |
|              alexnet              | 128  | 0.4921  |   0.788   | 15.1562  |         13.655         |
|      nvidia_deeprecommender       | 256  | 0.4884  |  0.7768   |  9.4968  |         8.973          |
|                drq                |  1   | 0.6543  |  1.0261   |  9.0949  |         9.6644         |
|         soft_actor_critic         | 256  | 0.4265  |  0.6077   |  7.7173  |         7.0876         |
|               dcgan               |  32  | 0.4519  |  0.7216   |  7.3148  |         7.6676         |
|               dlrm                | 1024 | 0.3839  |  0.7954   |  7.121   |         7.3102         |
|           lennard_jones           | 1000 | 0.3971  |  0.6141   |   6.02   |         5.6438         |
|            tts_angular            |  64  | 0.4593  |  0.5204   |  5.5622  |         5.6212         |
|               moco                |  32  | 27.4656 |    nan    |   nan    |          nan           |
|   timm_vision_transformer_large   |  32  |  9.601  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.208   |         1.208          |
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0863  |         1.2037         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.0496  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.0163  |         1.1719         |
|           mobilenet_v2            |  96  | 0.9859 |  0.7658   |  1.0102  |         1.1021         |
|            tts_angular            |  64  | 0.9957 |  0.9957   |  0.9852  |         0.9957         |
|            timm_nfnet             | 128  | 0.9071 |  0.8748   |  0.9695  |         1.0734         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9689  |         1.1266         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |  0.952   |         1.0009         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         1.1717         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  0.9425  |         1.0258         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.9402  |         1.0725         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9319  |         1.0718         |
|         timm_efficientnet         |  32  | 0.9843 |  0.8193   |  0.9277  |         1.0053         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  0.8906  |         1.1284         |
|              yolov3               |  16  | 0.9837 |  0.8287   |  0.8712  |         1.036          |
|        speech_transformer         |  32  | 0.9915 |   0.901   |  0.8651  |         0.869          |
|           timm_resnest            |  32  | 0.9888 |  0.8984   |  0.8628  |         0.9519         |
|        shufflenet_v2_x1_0         | 128  | 0.9563 |  0.8383   |  0.8618  |         0.9647         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8593  |         0.8835         |
|             resnet152             |  32  | 0.9996 |   0.894   |  0.8504  |         0.9402         |
|            timm_regnet            |  32  | 0.9949 |  0.8507   |  0.8489  |         0.9504         |
|        Background_Matting         |  4   | 1.0132 |  0.6486   |  0.8484  |         1.0412         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8476  |         0.9479         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         1.168          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.8134  |         0.9308         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8058  |         0.8659         |
|        mobilenet_v3_large         |  32  | 0.9782 |  0.8766   |  0.7848  |         0.8717         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.7821  |         0.9645         |
|             resnet50              |  32  | 0.9922 |  0.8592   |  0.7821  |         0.885          |
|           squeezenet1_1           |  32  | 0.9666 |  0.9291   |  0.773   |         0.9087         |
|              demucs               |  4   | 0.966  |  0.9659   |  0.773   |         0.9662         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.8893         |
|              hf_Bart              |  4   | 0.9084 |   0.843   |  0.7535  |         0.9285         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7529  |         0.8869         |
|            mnasnet1_0             |  32  | 0.9772 |  0.8641   |  0.7429  |         0.8047         |
|          pytorch_struct           | 200  | 0.9992 |  0.5168   |  0.7274  |         0.7358         |
|               vgg16               |  64  | 0.9919 |  0.7243   |  0.7227  |         0.9805         |
|              alexnet              | 128  | 0.9455 |   0.793   |  0.7088  |         0.9385         |
|            densenet121            |  4   | 0.9944 |  0.9822   |  0.7085  |         0.803          |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |  0.6971  |         1.1013         |
|          resnext50_32x4d          |  8   | 0.9925 |  0.8455   |  0.6655  |         0.7713         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.8931         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.6379  |         0.9573         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  0.6066  |         0.9973         |
|          LearningToPaint          |  96  | 0.9202 |  0.7116   |  0.5925  |         0.7463         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9965 |  0.8796   |  0.5904  |         0.6004         |
|             resnet18              |  16  | 0.9751 |  0.7804   |  0.5423  |         0.6127         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.5317  |         0.9997         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.4538  |         0.8022         |
|           hf_Longformer           |  2   | 0.8565 |  0.8295   |  0.417   |         0.8947         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3169  |         0.3395         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |          nan           |
|               moco                |  32  | 0.9949 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_GPT2_large           |  4   | 212.5916 | 215.0379  | 125.9387 |        121.5672        |
|        Background_Matting         |  4   | 125.7455 | 919.7037  | 103.7404 |        104.014         |
|            hf_T5_large            |  2   | 228.7314 | 276.2003  | 101.5562 |        118.7436        |
|               hf_T5               |  8   | 181.8486 | 210.9469  | 94.4958  |        92.2844         |
|           hf_Longformer           |  2   | 138.5237 | 201.8588  | 79.0041  |        96.1161         |
|            timm_nfnet             | 128  | 120.045  | 119.8796  | 78.1518  |         81.333         |
|            hf_BigBird             |  2   | 198.7612 | 258.6282  | 76.4348  |        128.4768        |
|            hf_Reformer            |  4   | 81.9588  |  84.4875  | 71.0698  |        76.3802         |
|            Super_SloMo            |  6   | 79.6264  | 443.2713  | 64.4724  |        64.4085         |
|              yolov3               |  16  | 68.7622  |  84.8057  | 57.7228  |        57.6802         |
|            timm_regnet            |  32  | 61.5185  |  72.6249  | 56.0954  |        57.7333         |
|               vgg16               |  64  | 66.1914  |  66.2845  | 53.4744  |        52.7716         |
|             resnet152             |  32  | 64.5299  |  85.3505  | 52.7188  |         62.208         |
|           hf_Bert_large           |  4   | 82.6689  |  96.9009  | 52.0116  |        53.2847         |
|              demucs               |  4   | 53.7171  |  53.5349  | 51.6886  |        51.8105         |
| attention_is_all_you_need_pytorch | 256  | 55.6171  |  61.6077  | 37.2158  |        39.1697         |
|        speech_transformer         |  32  | 65.0803  |  75.8592  |  35.897  |        36.0555         |
|              hf_Bart              |  4   | 64.4966  |  74.6887  | 34.6191  |        44.1752         |
|           fastNLP_Bert            |  6   | 57.2223  |  62.5645  | 34.3652  |        34.7191         |
|           mobilenet_v2            |  96  | 47.0456  |  60.3026  | 31.1162  |         31.61          |
|             hf_Albert             |  8   | 70.1839  |  71.4967  |  29.722  |        30.4511         |
|           pytorch_unet            |  1   | 39.8943  | 194.0953  | 28.9892  |        29.0567         |
|              hf_GPT2              |  4   |  49.793  |  50.4975  |  27.773  |        27.6113         |
|            timm_vovnet            |  32  | 29.0386  |  34.931   | 26.2161  |        26.5473         |
|              hf_Bert              |  4   | 40.4846  |  49.3136  | 22.7764  |        26.2321         |
|         timm_efficientnet         |  32  | 34.5738  |  51.8181  | 22.2664  |        29.8665         |
|             resnet50              |  32  |  26.428  |  34.4866  | 22.0568  |        25.4223         |
|           hf_DistilBert           |  8   | 32.4992  |  33.3132  | 21.8742  |        21.9584         |
|            densenet121            |  4   | 55.5726  |  75.8345  |  19.645  |        50.4358         |
|        shufflenet_v2_x1_0         | 128  | 31.0988  |  41.2113  | 18.8118  |        25.2756         |
|      timm_vision_transformer      |  32  |  29.818  |  34.353   |  18.266  |        20.4823         |
|           BERT_pytorch            |  16  | 55.6702  |  69.5133  | 17.7568  |        25.8091         |
|           timm_resnest            |  32  | 24.2172  |  28.4158  | 15.4584  |        15.9292         |
|            mnasnet1_0             |  32  | 22.4918  |  30.4607  | 12.9843  |        20.5451         |
|        mobilenet_v3_large         |  32  | 29.8664  |  34.8299  | 12.9715  |        22.8999         |
|          resnext50_32x4d          |  8   | 24.5355  |  28.6952  | 11.7146  |        22.9469         |
|      nvidia_deeprecommender       | 256  | 10.2268  |  10.2211  | 11.7031  |        10.0214         |
|          pytorch_stargan          |  16  | 14.6901  |  19.3567  | 11.5922  |        12.1993         |
|         phlippe_densenet          | 128  | 23.3243  |  30.291   | 11.5787  |        23.7301         |
|              alexnet              | 128  |   9.83   |  9.8281   |  9.0189  |         8.6467         |
|          LearningToPaint          |  96  | 11.4611  |  14.8093  |  8.567   |        10.5089         |
|            tts_angular            |  64  |  6.7735  |  6.9581   |  6.6052  |         6.5166         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 14.4018  |  18.2312  |  5.9161  |         7.9185         |
|             resnet18              |  16  |  9.4208  |  14.2758  |   5.73   |        10.2596         |
|           squeezenet1_1           |  32  | 10.5376  |  11.1483  |  5.5217  |         7.7118         |
|          phlippe_resnet           | 128  |  9.0965  |  11.9484  |  4.9588  |         9.0942         |
|          pytorch_struct           | 200  |  5.0963  |  6.1283   |  3.1402  |         4.3855         |
|       functorch_dp_cifar10        |  64  | 10.3106  |  11.1961  |  2.8502  |         7.5909         |
|                drq                |  1   |  3.3813  |   4.403   |  2.158   |         3.4494         |
|               dlrm                | 1024 |  4.3996  |  4.9009   |  2.1304  |         3.9444         |
|         soft_actor_critic         | 256  |  1.7369  |  2.4038   |   1.93   |         2.1488         |
|               dcgan               |  32  |  2.4232  |  2.9962   |   1.44   |         2.6224         |
|           lennard_jones           | 1000 |  1.8118  |  2.1388   |  1.1554  |          1.77          |
|   timm_vision_transformer_large   |  32  | 465.8204 |    nan    |   nan    |          nan           |
|               moco                |  32  | 50.1594  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+
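The grids above are plain text, so comparing columns by hand gets tedious. Below is a minimal sketch of reading one such grid into records and taking the ratio of two latency columns. The helper names (`parse_grid_table`, `to_float`, `latency_ratio`) are hypothetical and not part of the benchmark tooling, and the three sample rows are copied from the absolute latency table. The ratio shown only compares two columns of that table; it is not necessarily the dashboard's own speedup figure, which may be measured against a different baseline.

```python
import math

def parse_grid_table(text):
    """Parse a '+---+' / '| a | b |' style grid into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]

def to_float(cell):
    """Convert a cell to float; non-numeric cells (e.g. 'pass') become NaN."""
    try:
        return float(cell)
    except ValueError:
        return math.nan

def latency_ratio(record, numerator, denominator):
    """Ratio of two latency columns; NaN if either value is missing."""
    return to_float(record[numerator]) / to_float(record[denominator])

# Three rows copied from the absolute latency (ms) table above.
SAMPLE = """
+---------------+------+----------+-----------+----------+------------------------+
|     name      |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------+------+----------+-----------+----------+------------------------+
| hf_GPT2_large |  4   | 212.5916 | 215.0379  | 125.9387 |        121.5672        |
|     vgg16     |  64  | 66.1914  |  66.2845  | 53.4744  |        52.7716         |
|     moco      |  32  | 50.1594  |    nan    |   nan    |          nan           |
+---------------+------+----------+-----------+----------+------------------------+
"""

records = parse_grid_table(SAMPLE)
for rec in records:
    ratio = latency_ratio(rec, "eager", "inductor")
    label = "n/a" if math.isnan(ratio) else f"{ratio:.2f}x"
    print(f"{rec['name']}: eager / inductor = {label}")
```

Rows whose backend columns are `nan` (failed or skipped runs) propagate as NaN instead of raising, so they can be filtered out before any aggregation.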

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 0.9526 |  0.8024   |  2.484   |         1.0902         |
|             OPTForCausalLM              |  2  | 0.9884 |  0.9347   |  2.4275  |         2.4771         |
|      GPT2ForSequenceClassification      |  4  | 0.977  |  0.9507   |  2.2388  |         2.272          |
|     MobileBertForQuestionAnswering      | 128 | 0.953  |  0.8144   |  2.2084  |         1.0772         |
|       MT5ForConditionalGeneration       | 16  | 0.9911 |  0.8365   |  2.1251  |         1.8468         |
|             XGLMForCausalLM             |  8  | 0.9554 |  0.7375   |  2.1167  |         1.1812         |
|       ElectraForQuestionAnswering       | 64  | 0.987  |  0.9761   |  2.1142  |         2.0867         |
|            XLNetLMHeadModel             |  8  | 0.9956 |  0.9673   |  1.8227  |         1.8228         |
|           ElectraForCausalLM            | 32  | 0.9829 |  0.9337   |  1.804   |         1.8357         |
|    LayoutLMForSequenceClassification    | 16  | 0.9842 |  0.9703   |  1.7788  |         1.7826         |
|       RobertaForQuestionAnswering       | 16  | 0.9849 |  0.9698   |  1.7764  |         1.7565         |
|        BertForQuestionAnswering         | 16  | 0.9845 |   0.968   |  1.7673  |         1.7524         |
|     M2M100ForConditionalGeneration      | 16  | 1.0248 |  0.8135   |  1.7016  |         1.3632         |
|           RobertaForCausalLM            | 16  | 0.9872 |  0.9626   |  1.6679  |         1.6651         |
|               DistillGPT2               | 16  | 0.988  |  0.9553   |  1.6516  |         1.6938         |
|       AlbertForQuestionAnswering        |  4  | 0.9999 |  0.8855   |  1.6474  |         1.6403         |
|            AlbertForMaskedLM            |  4  | 0.9998 |   0.885   |  1.6287  |         1.633          |
|                 T5Small                 |  4  | 0.9767 |  0.8604   |  1.6236  |         1.7242         |
|       T5ForConditionalGeneration        |  4  | 0.9793 |  0.8521   |  1.6229  |         1.7283         |
|     PLBartForConditionalGeneration      |  4  | 0.9859 |  0.9505   |  1.6195  |         1.634          |
|            PLBartForCausalLM            |  8  |  0.99  |  0.9611   |  1.611   |         1.6712         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9803 |  0.9593   |  1.6052  |         1.6265         |
|             BertForMaskedLM             | 16  | 0.9865 |  0.9601   |  1.5988  |         1.5879         |
|          AllenaiLongformerBase          |  4  | 0.8866 |  0.6246   |  1.5956  |         1.4974         |
|           LayoutLMForMaskedLM           | 16  | 0.986  |  0.9611   |  1.581   |         1.5879         |
|                CamemBert                | 16  | 0.9872 |  0.9627   |  1.5455  |         1.5352         |
|      BartForConditionalGeneration       |  2  | 0.9987 |   0.971   |  1.5023  |         1.4824         |
|             BartForCausalLM             |  4  | 0.9854 |  0.9641   |  1.4905  |         1.5358         |
|            MBartForCausalLM             |  4  | 0.9889 |  0.9629   |   1.49   |         1.539          |
|            YituTechConvBert             | 16  | 0.9861 |  0.9543   |  1.4884  |         1.4927         |
|         Speech2Text2ForCausalLM         | 256 | 0.9756 |  0.9072   |  1.4737  |         1.5447         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9957 |  0.8973   |  1.4697  |         1.4104         |
|         MegatronBertForCausalLM         |  4  | 0.992  |  0.9039   |  1.4668  |         1.4928         |
|     DistilBertForQuestionAnswering      | 256 | 0.9937 |  0.9868   |  1.4407  |         1.438          |
|      MBartForConditionalGeneration      |  2  | 1.0017 |  0.9707   |  1.4378  |         1.4682         |
|     PegasusForConditionalGeneration     | 32  | 0.9975 |  0.9294   |  1.3436  |         1.3058         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9576 |  0.8857   |  1.2835  |         1.1955         |
|            TrOCRForCausalLM             | 32  | 0.9889 |  0.9625   |  1.2421  |         1.2863         |
|          DistilBertForMaskedLM          | 128 | 0.9919 |  0.9501   |  1.2189  |         1.2495         |
|           PegasusForCausalLM            | 32  | 0.9499 |  0.8961   |  1.1823  |         1.1568         |
|       DebertaForQuestionAnswering       |  8  | 0.8061 |  0.6973   |  1.0047  |         0.9128         |
|           DebertaForMaskedLM            |  4  | 0.7246 |  0.5646   |  0.9421  |         0.7781         |
|          DebertaV2ForMaskedLM           |  1  | 0.6979 |  0.5205   |  0.8378  |         0.6205         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.6984 |  0.5199   |  0.7937  |         0.6324         |
|          BlenderbotForCausalLM          |  4  | 0.9187 |  0.7461   |   0.0    |         1.0907         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          AllenaiLongformerBase          |  4  | 11.448  |  32.3745  | 142.2008 |        111.0442        |
|          MobileBertForMaskedLM          | 64  | 17.7935 |  40.8913  | 134.854  |        132.5278        |
|     MobileBertForQuestionAnswering      | 128 | 17.7481 |  42.3438  | 129.2315 |        126.5694        |
|          DebertaV2ForMaskedLM           |  1  | 15.607  |  27.3995  | 127.3365 |        61.8771         |
|       MT5ForConditionalGeneration       | 16  | 8.4604  |  19.1631  | 126.9394 |        125.6958        |
|      DebertaV2ForQuestionAnswering      |  2  | 15.5141 |  27.0718  | 125.1719 |        61.0171         |
|     M2M100ForConditionalGeneration      | 16  | 11.9899 |  25.7787  | 107.1557 |        100.4359        |
|            XLNetLMHeadModel             |  8  | 10.3862 |  27.2419  | 84.1307  |        83.4905         |
|           DebertaForMaskedLM            |  4  | 7.3625  |  13.8083  | 78.6463  |        50.0475         |
|       DebertaForQuestionAnswering       |  8  | 7.2381  |  13.3878  | 76.8993  |        47.4669         |
|             XGLMForCausalLM             |  8  | 9.6397  |  21.4225  | 72.7024  |        64.5995         |
|            YituTechConvBert             | 16  | 11.1833 |  19.2977  | 72.6696  |        69.9732         |
|      MBartForConditionalGeneration      |  2  | 11.7672 |  26.5147  | 72.0903  |        70.4773         |
|     PegasusForConditionalGeneration     | 32  |  5.225  |  19.7266  | 68.4851  |        66.2897         |
|      BartForConditionalGeneration       |  2  | 11.9288 |  26.2936  | 66.4153  |        65.7497         |
|           ElectraForCausalLM            | 32  | 8.0184  |  13.556   | 64.0641  |        59.0958         |
|         MegatronBertForCausalLM         |  4  | 10.9254 |  22.0404  | 60.6282  |        59.2681         |
|    MegatronBertForQuestionAnswering     |  8  | 10.7407 |  21.9005  | 59.9533  |        58.0885         |
|     PLBartForConditionalGeneration      |  4  |  9.282  |  17.273   | 56.7964  |        54.7947         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.6977  |  17.4479  | 48.2886  |        48.8529         |
|       T5ForConditionalGeneration        |  4  | 5.8462  |  13.6855  | 46.4953  |        45.8947         |
|                 T5Small                 |  4  | 5.8436  |  13.4299  | 46.1413  |        45.4371         |
|            MBartForCausalLM             |  4  | 6.5654  |  12.0409  | 45.4385  |        41.1544         |
|             BartForCausalLM             |  4  | 6.6736  |  11.9921  | 44.9726  |        42.0284         |
|           PegasusForCausalLM            | 32  | 6.3791  |  11.6169  | 44.0005  |        41.8487         |
|            TrOCRForCausalLM             | 32  | 6.5777  |  12.232   | 43.9521  |         39.166         |
|    LayoutLMForSequenceClassification    | 16  | 5.8711  |  11.2651  |  43.144  |        43.2003         |
|       ElectraForQuestionAnswering       | 64  | 5.5429  |  10.8909  | 41.2498  |        39.7205         |
|             OPTForCausalLM              |  2  | 5.6112  |  11.2067  | 40.0571  |        36.8663         |
|           LayoutLMForMaskedLM           | 16  | 5.9643  |  11.4222  | 37.9778  |        35.8518         |
|       BlenderbotSmallForCausalLM        | 64  | 4.8029  |  8.4011   |  36.469  |        34.3494         |
|             BertForMaskedLM             | 16  | 5.4943  |  11.1224  | 36.3946  |        35.7824         |
|        BertForQuestionAnswering         | 16  | 5.4628  |   11.05   | 36.3269  |        35.8471         |
|            AlbertForMaskedLM            |  4  | 2.4453  |  8.3849   | 34.6125  |        34.8887         |
|                CamemBert                | 16  | 5.2924  |  10.9098  | 34.0135  |        33.4499         |
|     DistilBertForQuestionAnswering      | 256 | 2.6698  |  5.4972   |  33.48   |        33.3992         |
|           RobertaForCausalLM            | 16  | 5.4899  |  11.4538  | 33.2658  |        32.9969         |
|         Speech2Text2ForCausalLM         | 256 | 3.5137  |  6.1621   | 33.0258  |         30.515         |
|       RobertaForQuestionAnswering       | 16  |  5.48   |  11.4485  | 32.6052  |        31.7771         |
|      GPT2ForSequenceClassification      |  4  | 4.8646  |  9.9603   | 32.2597  |        31.5687         |
|            PLBartForCausalLM            |  8  | 3.7174  |  6.8132   | 32.0201  |        30.8791         |
|          DistilBertForMaskedLM          | 128 | 2.6659  |  5.5652   | 31.8439  |        32.6234         |
|       AlbertForQuestionAnswering        |  4  |  2.424  |  8.1784   | 31.5177  |         31.08          |
|               DistillGPT2               | 16  | 2.5589  |  5.1885   | 26.6865  |        27.0946         |
|          BlenderbotForCausalLM          |  4  | 11.7048 |  22.7375  |   nan    |         64.483         |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
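
These tables are plain text in a `|`-delimited layout, so they can be loaded for further analysis with a few lines of Python. Below is a minimal sketch, assuming the table has been pasted into a string. `parse_ascii_table` is a hypothetical helper, not part of the benchmark scripts, and the three rows are copied from the table above. The ratio printed at the end is the inductor compile time divided by the `eager` column's compile time.

```python
import math

TABLE = """\
+-----------------------+----+---------+-----------+----------+------------------------+
|         name          | bs |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------+----+---------+-----------+----------+------------------------+
| AllenaiLongformerBase | 4  | 11.448  |  32.3745  | 142.2008 |        111.0442        |
|      DistillGPT2      | 16 | 2.5589  |  5.1885   | 26.6865  |        27.0946         |
| BlenderbotForCausalLM | 4  | 11.7048 |  22.7375  |   nan    |         64.483         |
+-----------------------+----+---------+-----------+----------+------------------------+
"""


def parse_ascii_table(text):
    """Parse a '|'-delimited ASCII table into a list of dicts (hypothetical helper)."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the +---+ border lines
    ]
    header, body = rows[0], rows[1:]
    parsed = []
    for cells in body:
        record = dict(zip(header, cells))
        for key in header[1:]:
            # float("nan") is valid, so failed runs stay as NaN
            record[key] = float(record[key])
        parsed.append(record)
    return parsed


rows = parse_ascii_table(TABLE)
for row in rows:
    # Compile-time overhead of inductor relative to the eager column
    overhead = row["inductor"] / row["eager"]
    label = "n/a" if math.isnan(overhead) else f"{overhead:.1f}x"
    print(f"{row['name']:<24} {label}")
```

Keeping `nan` as a float NaN rather than dropping the row lets a failed run such as `BlenderbotForCausalLM` under `inductor` stay visible in any later summary.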

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1387  |         1.195          |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1139  |         1.2307         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0865  |         1.1724         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0818  |         1.1729         |
|             OPTForCausalLM              |  2  | 0.9682 |  0.9252   |  1.062   |         1.1099         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0583  |         1.1368         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0541  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.0539  |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0511  |         1.0491         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.1479         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0078  |         1.0518         |
|            YituTechConvBert             | 16  | 0.953  |  0.8749   |  0.9793  |         0.9906         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9734  |         1.3147         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         1.0642         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9574  |         1.268          |
|     PLBartForConditionalGeneration      |  4  |  0.93  |  0.8787   |  0.9563  |         0.9847         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         1.1152         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9089  |         1.0019         |
|            PLBartForCausalLM            |  8  | 0.9237 |  0.8182   |  0.8907  |         0.9249         |
|     PegasusForConditionalGeneration     | 32  | 0.945  |  0.8966   |  0.8901  |         1.0074         |
|           ElectraForCausalLM            | 32  | 0.9161 |   0.786   |  0.889   |         0.8941         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8849  |         0.9624         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8729  |         0.9803         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.8672  |         1.0307         |
|            TrOCRForCausalLM             | 32  |  0.92  |   0.829   |  0.8619  |         0.9075         |
|            MBartForCausalLM             |  4  | 0.951  |  0.8924   |  0.8491  |         0.9507         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         1.0139         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         1.0962         |
|             BartForCausalLM             |  4  | 0.951  |  0.8923   |  0.8301  |         0.943          |
|       BlenderbotSmallForCausalLM        | 64  | 0.8906 |  0.7493   |  0.8065  |         0.8318         |
|           PegasusForCausalLM            | 32  | 0.9257 |  0.8421   |  0.7952  |         0.9252         |
|         Speech2Text2ForCausalLM         | 256 | 0.8865 |  0.7573   |  0.7566  |         0.808          |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7473  |         1.016          |
|             XGLMForCausalLM             |  8  | 0.9431 |  0.8612   |  0.6744  |         0.9287         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6569  |         0.8392         |
|     M2M100ForConditionalGeneration      | 16  | 0.955  |  0.8772   |  0.6058  |         0.8978         |
|           DebertaForMaskedLM            |  4  | 0.9316 |  0.9156   |  0.5501  |         0.9978         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.5197  |         0.9665         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9763 |  0.9764   |  0.487   |         0.9801         |
|          AllenaiLongformerBase          |  4  | 0.8568 |  0.7887   |  0.4688  |         0.8742         |
|       DebertaForQuestionAnswering       |  8  | 0.9525 |  1.0537   |  0.4601  |         1.1527         |
|          BlenderbotForCausalLM          |  4  | 0.9932 |  0.9937   |   nan    |         0.9941         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 266.1151 | 300.4135  | 163.7746 |        163.1434        |
|       AlbertForQuestionAnswering        |  4  | 263.8077 | 297.9072  | 160.2983 |        160.9655        |
|            XLNetLMHeadModel             |  8  | 280.6953 | 289.3458  | 153.3293 |        153.4202        |
|      DebertaV2ForQuestionAnswering      |  2  | 154.7076 | 206.1422  | 136.2692 |        170.9564        |
|          DebertaV2ForMaskedLM           |  1  | 151.9488 |  199.032  | 124.959  |        165.4408        |
|          AllenaiLongformerBase          |  4  | 204.6054 | 291.6998  | 113.677  |        120.9614        |
|     PegasusForConditionalGeneration     | 32  |  158.96  | 150.5112  | 111.7684 |        115.9461        |
|            TrOCRForCausalLM             | 32  | 139.4924 |  142.602  | 110.9453 |        107.1104        |
|      BartForConditionalGeneration       |  2  | 150.2012 | 141.2427  | 95.9401  |        92.5828         |
|      MBartForConditionalGeneration      |  2  | 140.5504 | 141.4781  | 95.1818  |        107.2434        |
|    MegatronBertForQuestionAnswering     |  8  | 144.8335 | 147.7972  |  88.376  |        87.2424         |
|            YituTechConvBert             | 16  | 127.7657 | 131.0341  | 84.1132  |        84.0152         |
| BlenderbotSmallForConditionalGeneration | 64  | 121.2818 | 125.6679  | 81.9355  |        79.1308         |
|     MobileBertForQuestionAnswering      | 128 | 201.2513 | 234.9634  | 81.3782  |        162.0538        |
|                CamemBert                | 16  | 119.8269 | 122.7617  | 76.6672  |        77.0745         |
|            MBartForCausalLM             |  4  | 114.6831 | 117.6247  | 76.2499  |        73.6245         |
|             BartForCausalLM             |  4  | 116.0333 | 117.4585  | 76.1678  |        73.9775         |
|       DebertaForQuestionAnswering       |  8  | 93.9921  | 108.5113  | 75.3886  |        83.0731         |
|     PLBartForConditionalGeneration      |  4  | 120.6193 | 122.5882  | 73.4586  |        72.7655         |
|     M2M100ForConditionalGeneration      | 16  | 134.5054 |  148.059  | 73.3326  |        80.1913         |
|          MobileBertForMaskedLM          | 64  | 203.6006 | 217.8483  | 72.4355  |        165.2378        |
|     DistilBertForQuestionAnswering      | 256 | 104.3065 | 104.6855  | 71.8079  |        71.9221         |
|           LayoutLMForMaskedLM           | 16  | 114.3026 | 117.0716  | 71.4101  |        70.9083         |
|            PLBartForCausalLM            |  8  | 112.9862 | 119.6693  | 71.1498  |        69.3729         |
|             OPTForCausalLM              |  2  | 172.5508 | 180.3912  | 70.1079  |        68.2055         |
|          DistilBertForMaskedLM          | 128 | 85.2524  |  89.1108  | 69.2931  |        68.1335         |
|           DebertaForMaskedLM            |  4  | 85.0467  | 112.7746  | 68.8985  |        80.9362         |
|           RobertaForCausalLM            | 16  | 116.8354 | 119.6262  | 68.8522  |        68.9651         |
|             BertForMaskedLM             | 16  | 111.7537 | 114.4018  | 68.6871  |        69.1265         |
|                 T5Small                 |  4  | 107.1798 | 123.9263  | 64.3937  |        60.2274         |
|       T5ForConditionalGeneration        |  4  | 107.2398 | 123.0375  | 64.2983  |        60.2604         |
|               DistillGPT2               | 16  | 106.9084 | 110.5231  | 63.9114  |        62.3487         |
|         MegatronBertForCausalLM         |  4  | 88.7654  |  96.6014  | 59.4302  |        57.9931         |
|           PegasusForCausalLM            | 32  | 74.8484  |  76.9859  | 59.3518  |        62.9176         |
|             XGLMForCausalLM             |  8  | 114.4047 | 158.9062  | 55.2578  |        98.0185         |
|    LayoutLMForSequenceClassification    | 16  | 99.3729  | 100.6541  | 54.8928  |        54.8886         |
|       ElectraForQuestionAnswering       | 64  | 116.5034 | 117.2662  | 54.3644  |         54.918         |
|       RobertaForQuestionAnswering       | 16  | 97.5222  |  98.7038  | 53.7905  |        54.3884         |
|        BertForQuestionAnswering         | 16  | 97.0953  |  98.3407  | 53.7898  |        54.2738         |
|           ElectraForCausalLM            | 32  | 91.5606  |  94.1099  | 48.9355  |        47.9112         |
|       BlenderbotSmallForCausalLM        | 64  | 64.4125  |  65.7095  |  47.927  |        48.3719         |
|       MT5ForConditionalGeneration       | 16  | 104.0472 | 113.2251  | 44.1673  |        50.2528         |
|      GPT2ForSequenceClassification      |  4  | 93.7099  |  96.0402  | 40.8128  |        40.2603         |
|         Speech2Text2ForCausalLM         | 256 | 55.1464  |  59.0891  | 36.3953  |        34.7045         |
|          BlenderbotForCausalLM          |  4  | 127.2504 | 155.8348  |   nan    |        99.0727         |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
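
The absolute latencies can be turned into relative numbers by dividing one column by another. The sketch below uses three rows copied from the table above and computes how much faster each inductor variant is than the `eager` column. This is only an illustration: the `eager` column here may not be the same baseline that the "Performance speedup" tables use, so the ratios need not match those tables. The `ratio` helper is a hypothetical name.

```python
# (eager_ms, inductor_ms, inductor_no_cudagraphs_ms), copied from the table above
latency_ms = {
    "AlbertForMaskedLM": (266.1151, 163.7746, 163.1434),
    "MobileBertForMaskedLM": (203.6006, 72.4355, 165.2378),
    "XGLMForCausalLM": (114.4047, 55.2578, 98.0185),
}


def ratio(baseline_ms, candidate_ms):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


for name, (eager, inductor, no_cudagraphs) in latency_ms.items():
    print(
        f"{name:<24} inductor {ratio(eager, inductor):.2f}x, "
        f"no_cudagraphs {ratio(eager, no_cudagraphs):.2f}x"
    )
```

For rows such as `MobileBertForMaskedLM` and `XGLMForCausalLM`, the gap between the two inductor columns shows how much of the gain in this run is tied to CUDA graphs.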

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9985 |   0.997   |  3.0058  |         2.9718         |
|        twins_pcpvt_base         | 64  | 0.9933 |  0.9085   |  2.1369  |         1.6798         |
|      xcit_large_24_p8_224       |  5  | 0.9913 |  0.8674   |  1.9776  |         1.5668         |
|         coat_lite_mini          | 128 | 0.9973 |  0.9953   |  1.938   |         1.9135         |
|          ghostnet_100           | 128 | 0.992  |  0.7659   |  1.8276  |         1.6016         |
|          gmlp_s16_224           | 128 | 0.9944 |  1.0823   |  1.7916  |         1.785          |
|          gmixer_24_224          | 128 | 0.9962 |  0.8887   |  1.7414  |         1.7291         |
|            lcnet_050            | 128 | 0.9391 |  0.7346   |  1.6838  |         1.4332         |
|           volo_d1_224           | 64  | 0.9944 |  0.9726   |  1.6831  |         1.6635         |
|         crossvit_9_240          | 128 | 0.9903 |  0.7825   |  1.6238  |         1.5982         |
|  swin_base_patch4_window7_224   | 64  | 0.9905 |  0.9539   |  1.6051  |         1.6007         |
|           convit_base           | 64  | 0.9981 |  0.9976   |  1.5503  |         1.5522         |
|             dla102              | 128 | 0.9957 |  0.8151   |  1.5251  |         1.5232         |
|       gluon_inception_v3        | 128 | 0.9964 |  0.8643   |  1.5089  |         1.4996         |
|          inception_v3           | 128 | 0.9963 |  0.8644   |  1.5075  |         1.4992         |
|        adv_inception_v3         | 128 | 0.9964 |  0.8603   |  1.5073  |         1.4972         |
|        sebotnet33ts_256         | 64  | 0.9578 |  0.7643   |  1.5015  |         1.5245         |
|            nfnet_l0             | 128 | 0.9892 |  0.8132   |  1.4873  |         1.4307         |
|          convnext_base          | 64  | 0.9836 |  0.9843   |  1.4871  |         1.4689         |
|           regnety_002           | 128 | 0.9484 |  0.7097   |  1.4741  |         1.2104         |
|           dm_nfnet_f0           | 128 | 0.9866 |  0.9849   |  1.4611  |         1.4114         |
|            pit_b_224            | 64  | 0.9946 |  0.9927   |  1.4302  |         1.4236         |
|       eca_botnext26ts_256       | 128 | 0.9726 |  0.7191   |  1.4245  |         1.4092         |
|      mobilenetv3_large_100      | 128 | 0.9493 |  0.7599   |  1.4212  |         1.3979         |
|           mnasnet_100           | 128 | 0.9483 |  0.7408   |  1.4197  |         1.4809         |
|           mobilevit_s           | 64  | 0.9615 |  0.7246   |  1.4156  |         1.4282         |
|           resnest101e           | 64  | 0.994  |  0.8682   |  1.4092  |         1.3437         |
|           selecsls42b           | 128 | 0.9981 |  0.8114   |  1.4064  |         1.4046         |
|          botnet26t_256          | 128 | 0.9735 |  0.8509   |  1.3888  |         1.4048         |
|        res2net50_14w_8s         | 128 | 0.9989 |   0.791   |  1.3765  |         1.3544         |
|         mobilenetv2_100         | 128 | 0.9495 |  0.7371   |  1.3746  |         1.432          |
|          jx_nest_base           | 32  | 0.9874 |  0.9839   |  1.3673  |         1.3599         |
|           res2next50            | 128 | 0.9992 |  0.8258   |  1.3672  |         1.3616         |
|          mixer_b16_224          | 128 | 0.9975 |  1.0184   |   1.36   |         1.3613         |
|       tf_efficientnet_b0        | 128 | 0.9589 |  0.6812   |  1.3533  |         1.3878         |
|            hrnet_w18            | 128 | 0.9926 |  0.6449   |   1.35   |         1.3393         |
|          spnasnet_100           | 128 | 0.9419 |  0.7391   |  1.3455  |         1.4099         |
|          cait_m36_384           |  4  | 0.9948 |  0.9929   |  1.3452  |         1.3459         |
|      beit_base_patch16_224      | 64  | 0.9967 |   0.966   |  1.3427  |         1.3419         |
|           fbnetc_100            | 128 | 0.9498 |  0.7392   |  1.3403  |         1.3925         |
|        ese_vovnet19b_dw         | 128 | 0.9565 |  0.8329   |  1.3366  |         1.3549         |
|         poolformer_m36          | 64  | 0.9866 |  0.9832   |  1.3278  |         1.3165         |
|            fbnetv3_b            | 128 | 0.9478 |  0.7689   |  1.2963  |         1.2825         |
|           rexnet_100            | 128 | 0.9513 |  0.7027   |  1.2857  |         1.3227         |
| deit_base_distilled_patch16_224 | 64  | 0.9965 |  0.9942   |  1.2524  |          0.0           |
|          resmlp_12_224          | 128 | 0.9937 |  0.8885   |  1.2498  |         1.2488         |
|      vit_base_patch16_224       | 64  | 0.9964 |  0.9937   |  1.2325  |         1.2326         |
|            tinynet_a            | 128 | 0.947  |  0.6792   |  1.2224  |         1.2479         |
|          cspdarknet53           | 64  | 0.9311 |  0.7855   |  1.2122  |         1.2427         |
|           tf_mixnet_l           | 128 | 0.9763 |  0.8268   |  1.1804  |         1.1879         |
|         visformer_small         | 128 | 0.9963 |  0.9447   |  1.1734  |         1.1661         |
|            mixnet_l             | 128 | 0.9761 |  0.8207   |  1.171   |         1.1773         |
|        res2net101_26w_4s        | 64  | 0.9994 |  0.7943   |  1.1614  |         1.087          |
|             dpn107              | 32  | 0.9312 |  0.8074   |  1.0935  |         1.1372         |
|        gluon_xception65         | 32  | 0.9923 |  0.8428   |  1.0711  |         1.0748         |
|            repvgg_a2            | 128 | 0.9368 |  0.7544   |  1.0677  |         1.0971         |
|     swsl_resnext101_32x16d      | 32  | 0.9979 |   0.84    |  1.0606  |         1.0258         |
|            gernet_l             | 128 | 0.9358 |  0.7927   |  1.0308  |         1.0611         |
|        convmixer_768_32         | 32  | 0.9987 |  0.9638   |  1.0024  |         1.0031         |
|          pnasnet5large          | 16  | 0.9859 |  0.9177   |  0.9095  |         0.9191         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
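
A per-model speedup table is often summarized with a geometric mean. The sketch below does this for a few `inductor_no_cudagraphs` values copied from the table above. It assumes that a `0.0` entry (as for `deit_base_distilled_patch16_224`) marks a run that produced no measurement and should be excluded. That interpretation is an assumption, not something the table states. `geomean` is a hypothetical helper.

```python
import math

# inductor_no_cudagraphs speedups copied from the table above
speedups = {
    "tnt_s_patch16_224": 2.9718,
    "vit_base_patch16_224": 1.2326,
    "pnasnet5large": 0.9191,
    "deit_base_distilled_patch16_224": 0.0,  # assumed to mean "no measurement"
}


def geomean(values):
    """Geometric mean over finite, strictly positive values; NaN if none remain."""
    valid = [v for v in values if math.isfinite(v) and v > 0]
    if not valid:
        return float("nan")
    return math.exp(sum(math.log(v) for v in valid) / len(valid))


summary = geomean(speedups.values())
print(f"geomean speedup over {len(speedups)} rows: {summary:.3f}")
```

Filtering before averaging matters here because a single `0.0` would otherwise force the geometric mean to zero (or raise an error when the logarithm is taken).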

Accuracy

+---------------------------------+----+-------+---------------+---------------+------------------------+
|              name               | bs | eager |   aot_eager   |   inductor    | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+---------------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |     pass      |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |     pass      |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |     pass      |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |     pass      |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |     pass      |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |     pass      |          pass          |
|           regnety_002           | 8  | pass  |     pass      |     pass      |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |     pass      |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |     pass      |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |     pass      |          pass          |
|           res2next50            | 8  | pass  |     pass      |     pass      |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |     pass      |          pass          |
|           resnest101e           | 8  | pass  |     pass      |     pass      |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |     pass      |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |     pass      |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |     pass      |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |     pass      |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |     pass      |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |     pass      |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |     pass      |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |     pass      |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |     pass      |          pass          |
|         visformer_small         | 8  | pass  |     pass      |     pass      |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |     pass      |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |     pass      |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |     pass      |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |     pass      |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |     pass      |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |     pass      |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |     pass      |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |     pass      |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |     pass      |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |     pass      |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |     pass      |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |     pass      |          pass          |
|           convit_base           | 8  | pass  |     pass      |     pass      |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |     pass      |          pass          |
|          convnext_base          | 8  | pass  |     pass      |     pass      |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |     pass      |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |     pass      |          pass          |
|             dla102              | 8  | pass  |     pass      |     pass      |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |     pass      |          pass          |
|             dpn107              | 8  | pass  |     pass      |     pass      |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |     pass      |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |     pass      |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |     pass      |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |     pass      |          pass          |
|            gernet_l             | 8  | pass  |     pass      |     pass      |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |     pass      |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |     pass      |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |     pass      |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |     pass      |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |     pass      |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |     pass      |          pass          |
|          inception_v3           | 8  | pass  |     pass      |     pass      |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |     pass      |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |     pass      |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |     pass      |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |  fail_to_run  |      fail_to_run       |
|        sebotnet33ts_256         | 8  | pass  |     pass      | fail_accuracy |          pass          |
+---------------------------------+----+-------+---------------+---------------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 5.6152  |  11.3032  | 280.4575 |        290.4619        |
|          ghostnet_100           | 128 | 7.4828  |  14.6976  | 231.6665 |        237.6595        |
|            hrnet_w18            | 128 | 9.5518  |  36.1766  | 229.157  |        229.0573        |
|            fbnetv3_b            | 128 | 8.3876  |  16.9606  | 164.734  |        161.2022        |
|      mobilenetv3_large_100      | 128 | 4.2311  |  8.4744   | 160.5862 |        154.5706        |
|           mobilevit_s           | 64  | 5.3334  |  11.6097  | 157.1846 |        157.2508        |
|           resnest101e           | 64  | 10.9981 |  24.6467  | 155.6736 |        158.2069        |
|       gluon_inception_v3        | 128 | 5.6181  |  12.5942  | 154.0854 |        150.6782        |
|            mixnet_l             | 128 | 8.5356  |  16.2982  | 153.9611 |        151.3775        |
|            tinynet_a            | 128 | 5.9143  |  12.1115  | 153.7764 |        157.1143        |
|           tf_mixnet_l           | 128 | 8.9917  |  16.6975  | 152.9577 |        155.4134        |
|          inception_v3           | 128 | 5.5698  |  12.6543  | 152.9547 |        152.9301        |
|        adv_inception_v3         | 128 | 5.6417  |  12.393   | 152.8151 |        152.7775        |
|          pnasnet5large          | 16  |  8.177  |  26.4463  | 151.9905 |        150.321         |
|       tf_efficientnet_b0        | 128 | 5.1143  |  10.3735  | 150.4692 |        152.681         |
|        res2net101_26w_4s        | 64  | 10.4349 |  24.9786  | 141.6157 |        140.684         |
|        twins_pcpvt_base         | 64  | 10.5227 |  23.341   | 139.696  |        140.1117        |
|          spnasnet_100           | 128 | 5.0219  |  9.3195   |  136.55  |        137.8384        |
|           fbnetc_100            | 128 | 5.0357  |  9.3876   | 135.3267 |        132.0306        |
|         mobilenetv2_100         | 128 | 3.9963  |  7.8785   | 124.479  |        125.2882        |
|      xcit_large_24_p8_224       |  5  | 12.5688 |  28.1726  | 121.4632 |        123.4044        |
|           mnasnet_100           | 128 | 4.0293  |  7.7389   | 117.999  |        115.8768        |
|        res2net50_14w_8s         | 128 | 8.8904  |  22.6359  | 115.4962 |        114.8824        |
|          cait_m36_384           |  4  | 13.7001 |  31.1917  | 106.2086 |        104.9699        |
|        sebotnet33ts_256         | 64  | 4.1884  |  8.9982   | 105.7799 |        104.5434        |
|  swin_base_patch4_window7_224   | 64  | 8.5131  |  19.2355  | 103.8326 |        101.332         |
|           regnety_002           | 128 | 4.9107  |  8.8521   | 103.2574 |        102.7341        |
|          cspdarknet53           | 64  | 6.0579  |  10.879   | 97.8097  |         95.669         |
|       eca_botnext26ts_256       | 128 | 3.1761  |  6.8165   | 96.8953  |         95.486         |
|            lcnet_050            | 128 | 2.5177  |  5.0655   | 95.9435  |        96.4555         |
|         poolformer_m36          | 64  | 7.5747  |  13.9154  | 94.7719  |        95.2403         |
|             dpn107              | 32  | 10.0964 |  19.2924  | 93.2665  |        90.5464         |
|             dla102              | 128 | 6.2108  |  14.0486  | 91.2362  |        90.9179         |
|        gluon_xception65         | 32  | 7.8134  |  16.7832  | 88.2068  |        86.7272         |
|           selecsls42b           | 128 | 2.4566  |  5.3642   | 87.7778  |         85.254         |
|          botnet26t_256          | 128 | 2.9542  |  5.8971   | 87.2804  |        88.1821         |
|         coat_lite_mini          | 128 | 3.2911  |  7.8433   | 86.1974  |        85.7796         |
|           res2next50            | 128 | 4.9888  |  12.1555  | 83.6023  |        81.7428         |
|         crossvit_9_240          | 128 | 5.8419  |  13.3165  | 83.1758  |        82.7479         |
|          jx_nest_base           | 32  | 6.6422  |  14.9115  | 79.7483  |        79.7375         |
|            gernet_l             | 128 | 4.9405  |  8.8594   |  77.482  |        78.7304         |
|        ese_vovnet19b_dw         | 128 | 2.5999  |  4.4908   | 75.7913  |        75.9008         |
|            nfnet_l0             | 128 | 5.2959  |  10.9913  |  75.634  |        73.5955         |
|           volo_d1_224           | 64  |  5.014  |  11.7132  | 70.0412  |        69.3864         |
|           dm_nfnet_f0           | 128 | 6.3169  |  11.6049  |  66.63   |        65.7598         |
|        tnt_s_patch16_224        | 128 | 6.4292  |  15.9828  | 64.5175  |        63.0427         |
|         visformer_small         | 128 | 2.6099  |  6.0311   | 63.6107  |        64.1551         |
|            repvgg_a2            | 128 | 4.7747  |  8.8333   | 59.3578  |        58.5428         |
|     swsl_resnext101_32x16d      | 32  | 6.3586  |  13.7211  | 56.7113  |        56.1975         |
|          gmlp_s16_224           | 128 | 5.6407  |  12.2056  | 56.0963  |        56.2116         |
|          convnext_base          | 64  | 6.8215  |  12.6299  |  54.203  |         54.808         |
|          gmixer_24_224          | 128 | 5.7325  |  12.814   | 46.7922  |        47.7031         |
|           convit_base           | 64  | 3.4502  |  8.5485   | 44.8174  |        45.5649         |
|            pit_b_224            | 64  | 3.5327  |  8.0601   | 42.7809  |        42.7487         |
|          resmlp_12_224          | 128 | 2.8031  |  5.5147   | 41.4394  |        39.0477         |
| deit_base_distilled_patch16_224 | 64  | 3.1459  |  6.9494   | 40.0535  |          nan           |
|      vit_base_patch16_224       | 64  | 3.0377  |  6.9941   | 39.7619  |        36.4559         |
|      beit_base_patch16_224      | 64  | 3.9831  |  8.5958   | 33.2805  |        33.3546         |
|        convmixer_768_32         | 32  | 1.7032  |  6.9195   | 32.5095  |        32.6941         |
|          mixer_b16_224          | 128 | 2.7098  |  5.9049   | 31.1904  |        31.1231         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.1858  |         1.2049         |
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.1712  |         1.2836         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1129  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.0948  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.0266  |         1.1179         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.013   |         1.0845         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0099  |         1.0351         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  0.9984  |         1.0721         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  0.9972  |         1.0876         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  0.9872  |         1.0728         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  0.9834  |         1.0506         |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9762  |         0.9854         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  0.9747  |         1.0457         |
|        twins_pcpvt_base         | 64  | 0.996  |  0.9232   |  0.9729  |         1.0539         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  0.9557  |         1.0057         |
|             dla102              | 128 | 0.9635 |  0.9151   |  0.9536  |         1.0325         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.951   |         0.9948         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9362  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9353  |          nan           |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  0.9348  |         1.0194         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  0.9346  |         1.0675         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9285  |         0.989          |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9228  |         0.9793         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9215  |         0.9952         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9131  |         0.9727         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  0.9097  |         1.0328         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9047  |         0.9903         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.8918  |          0.99          |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  0.891   |         1.1115         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.8833  |         0.9642         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.8831  |         0.9705         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.8807  |         0.977          |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.8786  |         0.9451         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.877   |         0.9361         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  0.8768  |         1.1871         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  0.8738  |         1.0072         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.8721  |         0.9732         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.8712  |         0.9607         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.871   |         0.9483         |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.8687  |         0.9902         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.8683  |         0.9403         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.866   |         0.9547         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8632  |         0.989          |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.8596  |         0.9535         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  0.8578  |         1.0242         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.8576  |         0.9664         |
|          convnext_base          | 64  | 1.001  |   0.924   |  0.8505  |         1.0338         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.8499  |         0.9706         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.8461  |         0.9786         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  0.8402  |         1.0202         |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.8239  |         0.9779         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.805   |         0.884          |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.7738  |         0.9611         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.7602  |         0.8966         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.7526  |         0.9898         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.7214  |         0.9045         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.6693  |         0.9604         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.8408 | 311.5469  | 299.2853 |        299.0773        |
|          pnasnet5large          | 16  | 198.9924 | 213.0501  | 216.2968 |        214.747         |
|            hrnet_w18            | 128 | 281.0274 | 432.2528  | 207.0261 |        209.1993        |
|           tf_mixnet_l           | 128 | 193.9071 |  229.064  | 160.3774 |        159.5393        |
|            mixnet_l             | 128 | 185.3007 |  220.524  | 154.5714 |        153.8335        |
|          cait_m36_384           |  4  | 168.0671 | 168.2617  | 123.9046 |        126.4956        |
|           resnest101e           | 64  | 164.3672 | 188.2158  | 115.9795 |        121.6936        |
|             dla102              | 128 | 172.5422 | 210.6101  | 112.5895 |        112.6824        |
|     swsl_resnext101_32x16d      | 32  | 118.4833 | 141.0868  |  111.78  |        115.2197        |
|         poolformer_m36          | 64  | 146.5672 | 147.3208  | 109.1316 |        109.8569        |
|        tnt_s_patch16_224        | 128 | 323.3604 | 323.8219  | 107.3867 |        108.6135        |
|       gluon_inception_v3        | 128 | 160.686  | 185.6529  | 106.2654 |        106.8362        |
|          inception_v3           | 128 | 160.5689 | 185.2325  | 106.186  |        106.7067        |
|        adv_inception_v3         | 128 | 160.791  | 185.9934  | 106.1341 |        106.9977        |
|           convit_base           | 64  | 163.0784 | 163.1448  | 105.1547 |        104.8331        |
|        res2net50_14w_8s         | 128 | 140.595  | 177.7787  | 102.0958 |        103.8942        |
|             dpn107              | 32  | 113.9933 | 131.1281  | 96.9057  |        93.3572         |
|        gluon_xception65         | 32  | 99.6492  | 117.1992  | 92.5538  |        91.9754         |
|           res2next50            | 128 | 125.9593 | 152.4191  | 92.1644  |        92.3582         |
|  swin_base_patch4_window7_224   | 64  | 147.7003 | 153.0989  | 91.0262  |        91.2498         |
|           dm_nfnet_f0           | 128 | 128.7794 |  128.897  | 86.8449  |        89.9589         |
|          mixer_b16_224          | 128 | 116.9027 | 114.0593  | 86.3205  |        85.3668         |
|        res2net101_26w_4s        | 64  | 98.5497  | 125.4996  | 85.7467  |        90.9616         |
|            fbnetv3_b            | 128 | 115.5215 |  142.27   | 84.3282  |        85.3069         |
|            pit_b_224            | 64  | 118.7454 | 118.7837  | 82.5135  |        82.8903         |
|          convnext_base          | 64  | 124.3203 | 123.8709  | 82.1647  |        83.1784         |
|         visformer_small         | 128 | 91.1765  |  96.1684  | 77.5161  |         77.846         |
|          gmlp_s16_224           | 128 | 137.5985 | 126.3942  | 76.5699  |        76.5131         |
|            nfnet_l0             | 128 | 113.0789 | 137.0204  | 75.3496  |        78.2048         |
|      beit_base_patch16_224      | 64  | 101.4123 | 104.5867  | 75.2898  |        75.3063         |
|       eca_botnext26ts_256       | 128 | 108.8952 | 147.1115  | 74.3014  |        75.0652         |
|          jx_nest_base           | 32  | 101.3037 | 101.8269  | 73.1383  |        73.5356         |
|          cspdarknet53           | 64  | 95.0659  | 112.7409  | 73.0176  |        71.2703         |
|           volo_d1_224           | 64  | 120.9774 | 123.8347  | 71.4217  |        72.2528         |
|          botnet26t_256          | 128 | 101.7069 | 116.5123  | 71.4075  |        70.5251         |
|            gernet_l             | 128 | 77.6493  |  91.7431  | 70.5595  |        68.5259         |
|      vit_base_patch16_224       | 64  | 86.7668  |  87.0268  | 70.4013  |        70.1838         |
|            repvgg_a2            | 128 | 77.4419  |  96.2444  | 68.0688  |        66.1781         |
| deit_base_distilled_patch16_224 | 64  | 85.0863  |  84.9596  | 67.5332  |          nan           |
|          gmixer_24_224          | 128 | 118.1297 | 132.1401  |  67.436  |        67.9527         |
|      xcit_large_24_p8_224       |  5  | 123.2511 | 144.4484  | 62.8431  |        77.8132         |
|        twins_pcpvt_base         | 64  | 118.5865 | 128.3091  |  60.645  |         67.845         |
|       tf_efficientnet_b0        | 128 | 85.0931  | 119.5361  | 60.1042  |        58.5914         |
|           rexnet_100            | 128 | 80.1309  |  108.29   | 59.2622  |        57.5477         |
|           fbnetc_100            | 128 | 82.8424  | 106.2779  | 58.7002  |        56.5437         |
|         coat_lite_mini          | 128 | 113.0634 | 113.1429  | 58.1899  |        58.8928         |
|           mobilevit_s           | 64  | 84.5599  |  112.501  | 57.4829  |        56.9674         |
|            tinynet_a            | 128 | 73.5961  | 102.3646  | 56.9206  |        55.7489         |
|        sebotnet33ts_256         | 64  | 80.3175  | 100.7125  | 51.2922  |        50.5474         |
|         crossvit_9_240          | 128 | 82.7489  | 104.3707  | 50.2437  |        51.0198         |
|          spnasnet_100           | 128 | 70.3937  |  89.8023  | 49.2491  |         47.061         |
|          ghostnet_100           | 128 | 90.7776  | 117.2955  | 49.1644  |        56.2198         |
|        ese_vovnet19b_dw         | 128 | 64.6661  |  74.303   | 46.3364  |        45.6744         |
|         mobilenetv2_100         | 128 | 65.3946  |  84.2477  |  45.186  |        43.3906         |
|           mnasnet_100           | 128 | 64.2493  |  82.4347  | 42.9251  |        41.1427         |
|           selecsls42b           | 128 | 60.0313  |  73.8337  | 42.5688  |        42.6868         |
|          resmlp_12_224          | 128 | 53.3437  |  59.8137  | 42.5044  |        42.5201         |
|      mobilenetv3_large_100      | 128 | 61.3052  |  76.7591  | 40.9231  |        41.6816         |
|           regnety_002           | 128 |  40.298  |  52.4767  | 26.6471  |        31.0344         |
|            lcnet_050            | 128 | 31.8302  |  40.7245  | 17.6885  |        20.8173         |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Performance graphs

[Figure: bench_logs/huggingface_amp.png]

[Figure: bench_logs/torchbench_amp.png]

[Figure: bench_logs/timm_models_amp.png]

Build Summary

Run name

day_083_24_03_23_performance_amp_938

Commit hashes

pytorch commit: c757647
pytorch commit date: 2023-03-25 01:36:30+00:00
torchbench commit: c2ef52a6f72829b77bbafbb7010bd16d8d15c916
torchbench commit date: 2023-03-24 13:58:11-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+gitc757647

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites (torchbench, huggingface, and timm) on A100 GPUs. Each experiment runs one iteration: a forward and backward pass for training, or a forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency, and memory footprint reduction, we exclude the models that fail accuracy checks.
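
As a rough illustration of the aggregation described above, here is a minimal sketch, not the dashboard's actual code. The record layout and the name `geomean_speedup` are hypothetical, and it assumes speedup is native (eager) latency divided by compiled latency, with models that fail accuracy dropped before aggregating. The latencies in the example are made up.

```python
import math


def geomean_speedup(records):
    """Geometric mean of per-model speedups over models that pass accuracy.

    Each record is a hypothetical dict with the keys:
    name, accuracy, eager_ms, compiled_ms.
    """
    # Assumption: speedup = native latency / compiled latency.
    speedups = [
        r["eager_ms"] / r["compiled_ms"]
        for r in records
        if r["accuracy"] == "pass"  # drop models that fail accuracy checks
    ]
    if not speedups:
        return float("nan")
    # The geometric mean is the exponential of the mean of the logs.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Illustrative, made-up measurements (not taken from the report).
records = [
    {"name": "model_a", "accuracy": "pass", "eager_ms": 100.0, "compiled_ms": 50.0},
    {"name": "model_b", "accuracy": "pass", "eager_ms": 40.0, "compiled_ms": 80.0},
    {"name": "model_c", "accuracy": "fail_accuracy", "eager_ms": 10.0, "compiled_ms": 1.0},
]
print(f"{geomean_speedup(records):.2f}x")
```

A geometric mean suits ratios because a 2x speedup and a 0.5x slowdown cancel out to 1x, whereas an arithmetic mean would report 1.25x.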

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 87%, 52/60 | 93%, 42/45  | 98%, 59/60  |
| inductor_no_cudagraphs | 87%, 52/60 | 98%, 44/45  | 98%, 59/60  |
+------------------------+------------+-------------+-------------+

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.58x    |    1.58x    |    1.40x    |
| inductor_no_cudagraphs |   1.27x    |    1.48x    |    1.38x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.81    |    7.63     |    5.92     |
|       aot_eager        |    9.34    |    16.21    |    13.23    |
|        inductor        |   61.14    |    58.77    |   106.00    |
| inductor_no_cudagraphs |   59.48    |    54.27    |   105.13    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    0.97x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.89x    |    0.88x    |
|        inductor        |   0.79x    |    0.89x    |    0.91x    |
| inductor_no_cudagraphs |   0.94x    |    1.03x    |    1.01x    |
+------------------------+------------+-------------+-------------+

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics of the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Previous report name: /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938
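
The pass-rate cells in these reports use the form `87%, 52/60`. The sketch below shows one way such cells could be parsed and compared between two reports. The helper names (`parse_passrate`, `format_passrate`, `passrate_delta`) are hypothetical and are not part of the dashboard tooling.

```python
def parse_passrate(cell):
    """Parse a cell such as '87%, 52/60' into (passed, total)."""
    _, fraction = cell.split(",")
    passed, total = fraction.strip().split("/")
    return int(passed), int(total)


def format_passrate(passed, total):
    """Render (passed, total) back into the 'NN%, passed/total' cell form."""
    return f"{100 * passed / total:.0f}%, {passed}/{total}"


def passrate_delta(prev_cell, cur_cell):
    """Change in the number of passing models between two reports."""
    prev_passed, _ = parse_passrate(prev_cell)
    cur_passed, _ = parse_passrate(cur_cell)
    return cur_passed - prev_passed


# Example using the 'NN%, passed/total' cell format from the report.
print(passrate_delta("97%, 58/60", "98%, 59/60"))
```

Comparing the raw pass counts avoids reading a change into two cells that only differ through percentage rounding.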

Passrate diff

+------------------------+-------------+------------+------------+
|        compiler        |    suite    | prev_value | cur_value  |
+------------------------+-------------+------------+------------+
|        inductor        | torchbench  | 87%, 52/60 | 87%, 52/60 |
|        inductor        | huggingface | 93%, 42/45 | 93%, 42/45 |
|        inductor        | timm_models | 97%, 58/60 | 98%, 59/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60 | 87%, 52/60 |
| inductor_no_cudagraphs | huggingface | 98%, 44/45 | 98%, 44/45 |
| inductor_no_cudagraphs | timm_models | 98%, 59/60 | 98%, 59/60 |
+------------------------+-------------+------------+------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.56x    |   1.58x   |
|        inductor        | huggingface |   1.59x    |   1.58x   |
|        inductor        | timm_models |   1.40x    |   1.40x   |
| inductor_no_cudagraphs | torchbench  |   1.28x    |   1.27x   |
| inductor_no_cudagraphs | huggingface |   1.48x    |   1.48x   |
| inductor_no_cudagraphs | timm_models |   1.38x    |   1.38x   |
+------------------------+-------------+------------+-----------+

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
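
The four criteria above can be sketched as a simple per-model check. This is an illustrative sketch only: the thresholds come from the list, but the result layout, the constant names, and the function `warnings_for` are hypothetical, and the example values are made up.

```python
# Thresholds taken from the flagging criteria listed above.
SPEEDUP_MIN = 0.95
COMPILE_LATENCY_MAX_SEC = 120.0
COMPRESSION_RATIO_MIN = 0.9


def warnings_for(result):
    """Return the warning categories triggered by one model/backend result.

    `result` is a hypothetical dict with the keys:
    accuracy, speedup, compilation_latency_sec, compression_ratio.
    """
    flags = []
    if result["accuracy"] != "pass":
        flags.append("accuracy")
    if result["speedup"] < SPEEDUP_MIN:
        # A speedup of 0.0 typically means the performance test itself failed.
        flags.append("speedup")
    if result["compilation_latency_sec"] > COMPILE_LATENCY_MAX_SEC:
        flags.append("compilation_latency")
    if result["compression_ratio"] < COMPRESSION_RATIO_MIN:
        flags.append("compression_ratio")
    return flags


# Illustrative, made-up result for a single model and backend.
example = {
    "accuracy": "pass",
    "speedup": 0.85,
    "compilation_latency_sec": 124.5,
    "compression_ratio": 0.95,
}
print(warnings_for(example))
```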

Accuracy warnings

+-------------+---------------------------------+------------------------+-----------------+
|    suite    |              name               | inductor_no_cudagraphs |    inductor     |
+-------------+---------------------------------+------------------------+-----------------+
| torchbench  |              moco               |      fail_to_run       |   fail_to_run   |
| torchbench  |       Background_Matting        |    eager_variation     | eager_variation |
| torchbench  |         vision_maskrcnn         |    eager_variation     |     0.0000      |
| torchbench  |            tacotron2            |         0.0000         |     0.0000      |
| torchbench  |               gat               |         0.0000         |     0.0000      |
| torchbench  |               gcn               |         0.0000         |     0.0000      |
| torchbench  |              llama              |         0.0000         |     0.0000      |
| torchbench  |              sage               |         0.0000         |     0.0000      |
| torchbench  |          torchrec_dlrm          |         0.0000         |     0.0000      |
| huggingface |  DebertaV2ForQuestionAnswering  |          pass          |   fail_to_run   |
| huggingface |   AlbertForQuestionAnswering    |     fail_accuracy      |  fail_accuracy  |
| timm_models | deit_base_distilled_patch16_224 |          pass          |   fail_to_run   |
+-------------+---------------------------------+------------------------+-----------------+

Performance speedup warnings

+-------------+---------------------------------+------------------------+----------+
|    suite    |              name               | inductor_no_cudagraphs | inductor |
+-------------+---------------------------------+------------------------+----------+
| torchbench  |            resnet18             |         0.9443         |  1.5755  |
| torchbench  |               drq               |         0.9065         |  1.4893  |
| torchbench  |              dcgan              |         0.8249         |  1.4349  |
| torchbench  |          lennard_jones          |         0.8836         |  1.3074  |
| torchbench  |        soft_actor_critic        |         0.8245         |  1.1951  |
| torchbench  |           timm_vovnet           |         0.9208         |  0.9299  |
| torchbench  |     nvidia_deeprecommender      |         1.0188         |  0.873   |
| torchbench  |  timm_vision_transformer_large  |          0.0           |   0.0    |
| torchbench  |              moco               |          0.0           |   0.0    |
| torchbench  |               gat               |          0.0           |   0.0    |
| torchbench  |               gcn               |          0.0           |   0.0    |
| torchbench  |              sage               |          0.0           |   0.0    |
| torchbench  |            tacotron2            |          0.0           |   0.0    |
| torchbench  |          torchrec_dlrm          |          0.0           |   0.0    |
| huggingface |   DebertaForQuestionAnswering   |         0.9164         |  1.0247  |
| huggingface |       DebertaForMaskedLM        |          0.76          |  0.903   |
| huggingface |      DebertaV2ForMaskedLM       |         0.6187         |  0.8475  |
| huggingface |  DebertaV2ForQuestionAnswering  |         0.6271         |  0.7808  |
| huggingface |      BlenderbotForCausalLM      |         1.101          |   0.0    |
| timm_models |          pnasnet5large          |         0.9204         |  0.9101  |
| timm_models | deit_base_distilled_patch16_224 |          0.0           |   0.0    |
+-------------+---------------------------------+------------------------+----------+

Compilation latency (sec) warnings

+-------------+--------------------------------+------------------------+----------+
|    suite    |              name              | inductor_no_cudagraphs | inductor |
+-------------+--------------------------------+------------------------+----------+
| torchbench  |        phlippe_densenet        |        166.042         | 165.025  |
| torchbench  |          hf_T5_large           |        155.443         | 158.3562 |
| torchbench  |         hf_Longformer          |        112.0315        | 143.6472 |
| torchbench  |       timm_efficientnet        |        139.8204        | 142.4904 |
| torchbench  |           hf_BigBird           |        114.5905        | 135.3274 |
| torchbench  |          densenet121           |        128.8748        | 130.2685 |
| torchbench  |       mobilenet_v3_large       |        134.2476        | 128.9372 |
| torchbench  |          mobilenet_v2          |        127.5357        | 122.6148 |
| huggingface |     AllenaiLongformerBase      |        111.0905        | 142.2876 |
| huggingface |     MobileBertForMaskedLM      |        133.427         | 130.8579 |
| huggingface |  MT5ForConditionalGeneration   |        125.9944        | 125.2247 |
| huggingface | MobileBertForQuestionAnswering |        127.0823        | 124.7928 |
| huggingface |      DebertaV2ForMaskedLM      |        61.0243         | 124.4777 |
| huggingface | DebertaV2ForQuestionAnswering  |        58.5909         | 123.6107 |
| timm_models |           rexnet_100           |        289.3288        | 279.2158 |
| timm_models |           hrnet_w18            |        227.7826        | 234.5558 |
| timm_models |          ghostnet_100          |         237.8          | 234.0273 |
| timm_models |           fbnetv3_b            |        167.8318        | 164.4178 |
| timm_models |          mobilevit_s           |        152.8017        | 160.2726 |
| timm_models |           tinynet_a            |        158.6931        | 158.1586 |
| timm_models |            mixnet_l            |        152.0009        | 157.4954 |
| timm_models |          resnest101e           |        155.6461        | 157.4631 |
| timm_models |       gluon_inception_v3       |        153.1525        | 157.2453 |
| timm_models |          inception_v3          |        154.5637        | 156.3229 |
| timm_models |          tf_mixnet_l           |        156.0582        | 155.7678 |
| timm_models |        adv_inception_v3        |        153.7598        | 155.0713 |
| timm_models |     mobilenetv3_large_100      |        151.4939        | 153.7169 |
| timm_models |       tf_efficientnet_b0       |        146.3716        | 152.9053 |
| timm_models |         pnasnet5large          |        150.2401        | 152.7828 |
| timm_models |       res2net101_26w_4s        |        140.4152        | 142.6738 |
| timm_models |        twins_pcpvt_base        |        139.4327        | 138.6304 |
| timm_models |           fbnetc_100           |        133.3645        | 137.668  |
| timm_models |          spnasnet_100          |        133.1085        | 133.0916 |
| timm_models |        mobilenetv2_100         |        125.3272        | 130.9183 |
| timm_models |      xcit_large_24_p8_224      |        118.7053        | 121.5096 |
| timm_models |          mnasnet_100           |        120.0813        | 118.447  |
+-------------+--------------------------------+------------------------+----------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+------------------------+----------+
|    suite    |                  name                   | inductor_no_cudagraphs | inductor |
+-------------+-----------------------------------------+------------------------+----------+
| torchbench  |              hf_GPT2_large              |         1.1284         |  0.8906  |
| torchbench  |                 yolov3                  |         1.0375         |   0.87   |
| torchbench  |           speech_transformer            |         0.8682         |  0.8651  |
| torchbench  |           shufflenet_v2_x1_0            |         0.9653         |  0.8636  |
| torchbench  |              timm_resnest               |         0.9658         |  0.8604  |
| torchbench  |         timm_vision_transformer         |         0.8835         |  0.8593  |
| torchbench  |               timm_regnet               |         0.9536         |  0.8484  |
| torchbench  |           Background_Matting            |         1.0412         |  0.8484  |
| torchbench  |                resnet152                |         0.9404         |  0.8479  |
| torchbench  |              hf_DistilBert              |         0.9479         |  0.8476  |
| torchbench  |               hf_T5_large               |         1.168          |  0.8201  |
| torchbench  |              pytorch_unet               |         0.9308         |  0.8134  |
| torchbench  |            phlippe_densenet             |         0.8659         |  0.8058  |
| torchbench  |           mobilenet_v3_large            |         0.7768         |  0.7858  |
| torchbench  |                  dcgan                  |         0.9645         |  0.7821  |
| torchbench  |                resnet50                 |         0.8853         |  0.7813  |
| torchbench  |                 demucs                  |         0.9662         |  0.7733  |
| torchbench  |              squeezenet1_1              |         0.9087         |  0.773   |
| torchbench  |             pytorch_stargan             |         0.8893         |  0.7715  |
| torchbench  |                 hf_Bart                 |         0.9285         |  0.7535  |
| torchbench  |               timm_vovnet               |         0.8869         |  0.7529  |
| torchbench  |               mnasnet1_0                |         0.7749         |  0.7448  |
| torchbench  |             pytorch_struct              |         0.7358         |  0.7274  |
| torchbench  |                  vgg16                  |         0.9805         |  0.7227  |
| torchbench  |                 alexnet                 |         0.9385         |  0.7088  |
| torchbench  |               densenet121               |         0.8017         |  0.7061  |
| torchbench  |               hf_BigBird                |         1.1068         |  0.6971  |
| torchbench  |             resnext50_32x4d             |         0.7736         |  0.666   |
| torchbench  |         nvidia_deeprecommender          |         0.8931         |  0.6585  |
| torchbench  |                   drq                   |         0.9573         |  0.6379  |
| torchbench  |            soft_actor_critic            |         0.9973         |  0.6066  |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |         0.6172         |  0.6065  |
| torchbench  |             LearningToPaint             |         0.7458         |  0.5925  |
| torchbench  |                resnet18                 |         0.6127         |  0.5423  |
| torchbench  |              lennard_jones              |         0.9997         |  0.5317  |
| torchbench  |               hf_Reformer               |         0.8022         |  0.4538  |
| torchbench  |              hf_Longformer              |         0.8947         |  0.417   |
| torchbench  |          functorch_dp_cifar10           |         0.4424         |  0.3991  |
| torchbench  |             phlippe_resnet              |         0.3395         |  0.3169  |
| huggingface |            PLBartForCausalLM            |         0.9249         |  0.8907  |
| huggingface |     PegasusForConditionalGeneration     |         1.0074         |  0.8901  |
| huggingface |           ElectraForCausalLM            |         0.8941         |  0.889   |
| huggingface |          DistilBertForMaskedLM          |         0.9624         |  0.8849  |
| huggingface | BlenderbotSmallForConditionalGeneration |         0.9803         |  0.8729  |
| huggingface |      MBartForConditionalGeneration      |         1.0307         |  0.8672  |
| huggingface |            TrOCRForCausalLM             |         0.9075         |  0.8619  |
| huggingface |            MBartForCausalLM             |         0.9507         |  0.8491  |
| huggingface |      BartForConditionalGeneration       |         1.0139         |  0.8456  |
| huggingface |         MegatronBertForCausalLM         |         1.0962         |  0.845   |
| huggingface |             BartForCausalLM             |         0.943          |  0.8301  |
| huggingface |       BlenderbotSmallForCausalLM        |         0.8318         |  0.8065  |
| huggingface |           PegasusForCausalLM            |         0.9252         |  0.7952  |
| huggingface |         Speech2Text2ForCausalLM         |         0.808          |  0.7566  |
| huggingface |          MobileBertForMaskedLM          |         1.016          |  0.7473  |
| huggingface |             XGLMForCausalLM             |         0.9287         |  0.6744  |
| huggingface |     MobileBertForQuestionAnswering      |         0.8392         |  0.6569  |
| huggingface |     M2M100ForConditionalGeneration      |         0.8978         |  0.6058  |
| huggingface |           DebertaForMaskedLM            |         0.9978         |  0.5501  |
| huggingface |          DebertaV2ForMaskedLM           |         0.9665         |  0.5197  |
| huggingface |      DebertaV2ForQuestionAnswering      |         0.9801         |  0.487   |
| huggingface |          AllenaiLongformerBase          |         0.8742         |  0.4688  |
| huggingface |       DebertaForQuestionAnswering       |         1.1527         |  0.4601  |
| timm_models |                hrnet_w18                |          0.99          |  0.8918  |
| timm_models |            sebotnet33ts_256             |         1.1115         |  0.891   |
| timm_models |           gluon_inception_v3            |         1.0171         |  0.8904  |
| timm_models |            adv_inception_v3             |         1.0171         |  0.8904  |
| timm_models |              inception_v3               |         1.0171         |  0.8904  |
| timm_models |                 dpn107                  |         0.9642         |  0.8833  |
| timm_models |            gluon_xception65             |         0.9705         |  0.8831  |
| timm_models |              ghostnet_100               |         0.977          |  0.8807  |
| timm_models |              spnasnet_100               |         0.9451         |  0.8786  |
| timm_models |          mobilenetv3_large_100          |         0.9361         |  0.877   |
| timm_models |             poolformer_m36              |         1.1871         |  0.8768  |
| timm_models |           eca_botnext26ts_256           |         1.0072         |  0.8738  |
| timm_models |          xcit_large_24_p8_224           |         0.9732         |  0.8721  |
| timm_models |            res2net50_14w_8s             |         0.9607         |  0.8712  |
| timm_models |            res2net101_26w_4s            |         0.9483         |  0.871   |
| timm_models |                mixnet_l                 |         0.9902         |  0.8687  |
| timm_models |               mnasnet_100               |         0.9403         |  0.8683  |
| timm_models |               res2next50                |         0.9547         |  0.866   |
| timm_models |              cait_m36_384               |         0.989          |  0.8632  |
| timm_models |               fbnetc_100                |         0.9535         |  0.8596  |
| timm_models |                pit_b_224                |         1.0242         |  0.8578  |
| timm_models |               selecsls42b               |         0.9664         |  0.8576  |
| timm_models |              convnext_base              |         1.0338         |  0.8505  |
| timm_models |                gernet_l                 |         0.9706         |  0.8499  |
| timm_models |         swsl_resnext101_32x16d          |         0.9786         |  0.8461  |
| timm_models |             coat_lite_mini              |         1.0202         |  0.8402  |
| timm_models |              botnet26t_256              |         0.9779         |  0.8239  |
| timm_models |                lcnet_050                |         0.884          |  0.805   |
| timm_models |                repvgg_a2                |         0.9611         |  0.7738  |
| timm_models |               regnety_002               |         0.8966         |  0.7602  |
| timm_models |             crossvit_9_240              |         0.9898         |  0.7526  |
| timm_models |      swin_base_patch4_window7_224       |         0.9045         |  0.7214  |
| timm_models |              jx_nest_base               |         0.9604         |  0.6693  |
+-------------+-----------------------------------------+------------------------+----------+

Metrics over time

[Figure: bench_logs/memory_over_time.png]

[Figure: bench_logs/passrate_over_time.png]

[Figure: bench_logs/geomean_over_time.png]

[Figure: bench_logs/comp_time_over_time.png]

Recent Regressions

For each relevant compiler, we compare the two most recent reports in which that compiler actually ran, and list models that were not flagged in the previous report but are flagged as problematic in the current one (according to the 'Warnings' section).
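The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual script: the function name `find_regressions`, the dictionary layout, and both threshold constants are assumptions made for the example. The sample values are taken from the regression tables in this report.

```python
from typing import Callable, Dict, List, Tuple

# Assumed warning thresholds (hypothetical values chosen for illustration;
# this report does not state the thresholds it uses).
SPEEDUP_THRESHOLD = 0.95         # flag when speedup falls below this
COMPILE_LATENCY_THRESHOLD = 120  # flag when compile time (sec) exceeds this


def find_regressions(
    prev: Dict[str, float],
    cur: Dict[str, float],
    is_flagged: Callable[[float], bool],
) -> List[Tuple[str, float, float]]:
    """Return (name, prev_value, cur_value) for every model that is
    flagged in the current report but was not flagged in the previous one."""
    regressions = []
    for name, cur_value in cur.items():
        if name not in prev:
            # No earlier measurement, so it cannot count as "previously unflagged".
            continue
        prev_value = prev[name]
        if is_flagged(cur_value) and not is_flagged(prev_value):
            regressions.append((name, prev_value, cur_value))
    return regressions


# Values copied from the regression tables in this report.
speedup_regressions = find_regressions(
    prev={"drq": 1.0504},
    cur={"drq": 0.9065},
    is_flagged=lambda v: v < SPEEDUP_THRESHOLD,
)
latency_regressions = find_regressions(
    prev={"mnasnet_100": 115.8768},
    cur={"mnasnet_100": 120.0813},
    is_flagged=lambda v: v > COMPILE_LATENCY_THRESHOLD,
)

print(speedup_regressions)
print(latency_regressions)
```

Passing the flagging rule as a callable keeps one comparison routine usable for metrics where lower is worse (speedup, memory compression ratio) and metrics where higher is worse (compilation latency).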

Regressions for torchbench

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938

Performance speedup regressions

+------------------------+------+-------------+------------+
|        compiler        | name | prev_status | cur_status |
+------------------------+------+-------------+------------+
| inductor_no_cudagraphs | drq  |   1.0504    |   0.9065   |
+------------------------+------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938

No regressions found.

Regressions for timm_models

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938

Performance speedup regressions

+----------+---------------------------------+-------------+------------+
| compiler |              name               | prev_status | cur_status |
+----------+---------------------------------+-------------+------------+
| inductor | deit_base_distilled_patch16_224 |   1.2524    |    0.0     |
+----------+---------------------------------+-------------+------------+

Compilation latency (sec) regressions

+------------------------+-------------+-------------+------------+
|        compiler        |    name     | prev_status | cur_status |
+------------------------+-------------+-------------+------------+
| inductor_no_cudagraphs | mnasnet_100 |  115.8768   |  120.0813  |
+------------------------+-------------+-------------+------------+

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9674 |   0.916   |  3.5837  |         1.3556         |
|           BERT_pytorch            |  16  | 0.9945 |  0.8006   |  3.0413  |         2.0941         |
|            hf_BigBird             |  2   | 0.9507 |   0.776   |  2.8293  |         1.6156         |
|            densenet121            |  4   | 0.989  |  0.7147   |  2.8107  |         1.0387         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9707 |  0.8945   |  2.6239  |         1.7593         |
|            hf_T5_large            |  2   | 0.9756 |  0.7993   |  2.3969  |         1.8481         |
|             hf_Albert             |  8   | 0.9954 |  0.9595   |  2.319   |         2.2813         |
|               dlrm                | 1024 | 0.9409 |  0.8474   |  2.0981  |         1.1805         |
|         phlippe_densenet          | 128  | 0.9844 |  0.7695   |  2.0642  |         1.0146         |
|        mobilenet_v3_large         |  32  | 0.995  |  0.7915   |  2.0478  |         1.197          |
|              hf_Bart              |  4   | 0.9657 |  0.7641   |  1.9718  |         1.3997         |
|               hf_T5               |  8   | 0.985  |  0.8494   |  1.8981  |         1.9397         |
|           squeezenet1_1           |  32  | 0.9802 |  0.9338   |  1.8838  |         1.2095         |
|          phlippe_resnet           | 128  | 0.9835 |  0.7596   |  1.8328  |         0.9702         |
|              hf_Bert              |  4   | 0.9983 |  0.8379   |  1.8016  |         1.5827         |
|              hf_GPT2              |  4   | 0.9961 |  0.9584   |  1.738   |         1.7714         |
|          resnext50_32x4d          |  8   | 0.9845 |  0.7168   |  1.699   |         0.9901         |
|      timm_vision_transformer      |  32  | 0.9871 |  0.8561   |  1.6798  |         1.3843         |
|           hf_GPT2_large           |  4   | 0.9826 |  0.9718   |  1.6558  |         1.7151         |
|            mnasnet1_0             |  32  | 0.9886 |  0.7295   |  1.6448  |         1.0253         |
|        speech_transformer         |  32  | 0.9791 |  0.8146   |  1.6198  |         1.6081         |
|        shufflenet_v2_x1_0         | 128  | 0.9951 |  0.7531   |  1.6175  |         1.181          |
|           hf_Bert_large           |  4   | 0.997  |  0.8728   |  1.5775  |         1.5505         |
|             resnet18              |  16  | 0.9868 |  0.7629   |  1.5755  |         0.9443         |
|           timm_resnest            |  32  | 0.9923 |   0.85    |  1.5597  |         1.5045         |
|           fastNLP_Bert            |  6   | 0.9813 |  0.8581   |  1.5269  |         1.5028         |
|           mobilenet_v2            |  96  | 0.9973 |  0.7778   |  1.5057  |         1.509          |
|            timm_nfnet             | 128  | 0.9869 |  0.9855   |  1.5032  |         1.4528         |
|          pytorch_struct           | 200  | 0.9181 |  0.7561   |  1.4914  |         1.1253         |
|                drq                |  1   | 0.969  |  0.7422   |  1.4893  |         0.9065         |
|           hf_DistilBert           |  8   | 0.981  |  0.9573   |  1.4778  |         1.4437         |
|           hf_Longformer           |  2   | 0.8266 |  0.5625   |  1.4771  |         1.279          |
| attention_is_all_you_need_pytorch | 256  | 0.9867 |  0.9119   |  1.4441  |         1.4341         |
|               dcgan               |  32  | 0.8563 |  0.6908   |  1.4349  |         0.8249         |
|         timm_efficientnet         |  32  | 0.9373 |  0.6217   |  1.4288  |         1.0676         |
|           pytorch_unet            |  1   | 0.9968 |  0.2049   |  1.376   |         1.3688         |
|          pytorch_stargan          |  16  | 0.9933 |  0.7797   |  1.3535  |         1.309          |
|           lennard_jones           | 1000 | 0.8267 |  0.7349   |  1.3074  |         0.8836         |
|          LearningToPaint          |  96  | 0.9876 |  0.7842   |  1.3067  |         1.0486         |
|               vgg16               |  64  | 0.9993 |  0.9984   |  1.2404  |         1.2523         |
|            Super_SloMo            |  6   | 0.9974 |  0.1789   |  1.2316  |         1.233          |
|             resnet152             |  32  | 0.9953 |  0.7643   |  1.2187  |         1.0272         |
|        Background_Matting         |  4   | 0.9993 |  0.1368   |  1.212   |         1.2082         |
|             resnet50              |  32  | 0.9946 |  0.7733   |  1.2062  |         1.0551         |
|         soft_actor_critic         | 256  | 0.8603 |  0.6423   |  1.1951  |         0.8245         |
|              yolov3               |  16  | 0.9964 |  0.8059   |  1.1887  |         1.1898         |
|            hf_Reformer            |  4   | 0.9859 |  0.9674   |  1.1413  |         1.0573         |
|              alexnet              | 128  | 0.9991 |  0.9975   |  1.0866  |         1.135          |
|              demucs               |  4   | 1.0005 |  1.0018   |  1.0349  |         1.0361         |
|            timm_regnet            |  32  | 0.9217 |  0.7707   |  1.003   |         0.971          |
|            tts_angular            |  64  | 0.9395 |  0.8988   |  0.9725  |         0.9658         |
|            timm_vovnet            |  32  | 0.8495 |  0.7166   |  0.9299  |         0.9208         |
|      nvidia_deeprecommender       | 256  | 0.9986 |  0.9984   |  0.873   |         1.0188         |
|   timm_vision_transformer_large   |  32  | 0.9984 |    0.0    |   0.0    |          0.0           |
|               moco                |  32  | 0.9781 |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  |      0.0000      |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|         phlippe_densenet          | 128  | 3.2309  |  6.8436   | 165.025  |        166.042         |
|            hf_T5_large            |  2   | 27.555  |  55.6246  | 158.3562 |        155.443         |
|           hf_Longformer           |  2   | 11.5453 |  31.274   | 143.6472 |        112.0315        |
|         timm_efficientnet         |  32  | 4.9137  |  10.0138  | 142.4904 |        139.8204        |
|            hf_BigBird             |  2   | 12.8388 |  37.272   | 135.3274 |        114.5905        |
|            densenet121            |  4   | 7.6017  |  18.1644  | 130.2685 |        128.8748        |
|        mobilenet_v3_large         |  32  | 3.5282  |  7.6858   | 128.9372 |        134.2476        |
|           mobilenet_v2            |  96  | 3.1438  |  7.0586   | 122.6148 |        127.5357        |
|              yolov3               |  16  | 4.8781  |  10.7063  | 113.6191 |        114.0908        |
|            mnasnet1_0             |  32  | 3.1682  |  6.7362   | 105.1362 |        103.3029        |
|           hf_GPT2_large           |  4   | 14.8955 |   30.01   |  97.888  |        95.0648         |
|           timm_resnest            |  32  | 1.7871  |  3.8777   | 97.3082  |        99.6925         |
|             resnet152             |  32  |  9.069  |  20.1029  | 97.0561  |         95.85          |
|        shufflenet_v2_x1_0         | 128  |  3.466  |   7.694   | 78.5211  |        78.7053         |
|        speech_transformer         |  32  |  5.993  |  13.8826  | 75.5639  |        74.5487         |
| attention_is_all_you_need_pytorch | 256  |  4.341  |  10.8017  | 70.6708  |        70.4082         |
|            timm_nfnet             | 128  | 5.7034  |  11.0578  | 67.9839  |        67.7538         |
|            timm_regnet            |  32  | 6.5819  |  12.1186  | 67.7127  |         64.627         |
|        Background_Matting         |  4   | 3.2453  |  11.1679  | 65.7479  |        66.1687         |
|           BERT_pytorch            |  16  | 4.9757  |  11.6497  | 64.9536  |        64.0819         |
|             resnet50              |  32  | 3.1938  |   6.991   | 62.2559  |         61.052         |
|            timm_vovnet            |  32  | 3.6056  |  6.3199   | 60.8187  |        58.0203         |
|              hf_Bart              |  4   | 10.392  |  17.7953  | 58.1298  |         54.619         |
|           hf_Bert_large           |  4   | 10.4453 |  21.2737  | 58.0216  |        56.1467         |
|           pytorch_unet            |  1   |  1.531  |  4.4232   | 56.9688  |        57.3066         |
|       functorch_dp_cifar10        |  64  | 1.2054  |  2.3884   | 54.6547  |        53.5097         |
|          resnext50_32x4d          |  8   | 3.2141  |   6.89    | 51.4808  |        50.2567         |
|      timm_vision_transformer      |  32  | 3.3178  |  7.2038   | 50.4742  |        46.1693         |
|               hf_T5               |  8   | 5.6928  |  12.8263  | 46.2372  |        45.2037         |
|           fastNLP_Bert            |  6   | 5.1242  |  11.1844  | 45.8216  |        46.2459         |
|            hf_Reformer            |  4   | 4.1904  |  5.9401   | 44.0064  |        39.6378         |
|          LearningToPaint          |  96  | 1.4126  |  2.8393   | 43.7279  |        42.5718         |
|          pytorch_stargan          |  16  | 1.2181  |  3.2291   | 42.7598  |        42.7804         |
|             resnet18              |  16  | 1.3453  |  2.8618   | 42.7427  |        41.3239         |
|            Super_SloMo            |  6   | 2.7799  |  9.6651   | 39.1692  |        38.6622         |
|              hf_GPT2              |  4   | 4.7151  |  9.5953   | 38.7199  |        37.3409         |
|             hf_Albert             |  8   | 2.5531  |  8.0772   | 36.5586  |        36.8048         |
|              hf_Bert              |  4   | 5.0194  |   10.4    | 36.0778  |        33.4796         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2201  |  2.9868   | 33.7147  |        34.9925         |
|          phlippe_resnet           | 128  | 1.3498  |  2.8181   | 31.7642  |        30.6149         |
|              demucs               |  4   | 1.4437  |  2.1599   | 29.1304  |        28.6437         |
|           hf_DistilBert           |  8   | 2.3995  |  5.3127   | 28.2575  |        28.9882         |
|           squeezenet1_1           |  32  | 1.0532  |  1.7502   | 24.2753  |        23.3035         |
|          pytorch_struct           | 200  | 0.7457  |  1.3266   | 17.6253  |        19.3689         |
|              alexnet              | 128  | 0.4872  |  0.7797   | 15.0902  |         16.015         |
|               vgg16               |  64  | 0.6331  |  1.1077   | 14.6906  |        14.2736         |
|                drq                |  1   | 0.6711  |  1.0024   |  9.5325  |         9.4663         |
|      nvidia_deeprecommender       | 256  | 0.4799  |   0.758   |  9.3985  |         8.8883         |
|               dlrm                | 1024 | 0.3735  |  0.7731   |  7.5028  |         6.484          |
|               dcgan               |  32  | 0.4553  |  0.7054   |  7.0634  |         6.5808         |
|         soft_actor_critic         | 256  | 0.4293  |  0.6048   |  6.875   |         6.5741         |
|           lennard_jones           | 1000 |  0.394  |  0.6334   |  6.7126  |         5.9904         |
|            tts_angular            |  64  | 0.4428  |  0.5109   |  5.7559  |         5.094          |
|               moco                |  32  | 27.7989 |    nan    |   nan    |          nan           |
|   timm_vision_transformer_large   |  32  | 9.3339  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.2082  |         1.2078         |
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0863  |         1.2037         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.0496  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.0163  |         1.1719         |
|           mobilenet_v2            |  96  | 0.9868 |   0.765   |  1.0105  |         1.1025         |
|            tts_angular            |  64  | 0.9957 |  0.9957   |  0.9852  |         0.9957         |
|            timm_nfnet             | 128  | 0.9068 |  0.8753   |  0.9693  |         1.0708         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9689  |         1.1266         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |  0.952   |         1.0009         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         1.1717         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  0.9425  |         1.026          |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.9402  |         1.0725         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9319  |         1.0718         |
|         timm_efficientnet         |  32  | 0.9847 |  0.7652   |  0.9293  |         1.0056         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  0.8906  |         1.1284         |
|              yolov3               |  16  | 0.983  |  0.8252   |   0.87   |         1.0375         |
|        speech_transformer         |  32  | 0.9915 |   0.901   |  0.8651  |         0.8682         |
|        shufflenet_v2_x1_0         | 128  | 0.9549 |  0.8383   |  0.8636  |         0.9653         |
|           timm_resnest            |  32  | 0.9885 |  0.8969   |  0.8604  |         0.9658         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8593  |         0.8835         |
|            timm_regnet            |  32  | 0.9908 |  0.8529   |  0.8484  |         0.9536         |
|        Background_Matting         |  4   | 1.0125 |  0.6486   |  0.8484  |         1.0412         |
|             resnet152             |  32  | 0.994  |  0.8953   |  0.8479  |         0.9404         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8476  |         0.9479         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         1.168          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.8134  |         0.9308         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8058  |         0.8659         |
|        mobilenet_v3_large         |  32  | 0.977  |  0.8728   |  0.7858  |         0.7768         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.7821  |         0.9645         |
|             resnet50              |  32  | 0.9925 |  0.8618   |  0.7813  |         0.8853         |
|              demucs               |  4   | 0.966  |  0.9664   |  0.7733  |         0.9662         |
|           squeezenet1_1           |  32  | 0.9683 |  0.9353   |  0.773   |         0.9087         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.8893         |
|              hf_Bart              |  4   | 0.9084 |   0.843   |  0.7535  |         0.9285         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7529  |         0.8869         |
|            mnasnet1_0             |  32  | 0.9795 |  0.8618   |  0.7448  |         0.7749         |
|          pytorch_struct           | 200  | 0.9992 |  0.5168   |  0.7274  |         0.7358         |
|               vgg16               |  64  | 0.9919 |  0.7243   |  0.7227  |         0.9805         |
|              alexnet              | 128  | 0.9455 |   0.793   |  0.7088  |         0.9385         |
|            densenet121            |  4   | 0.9944 |  0.9802   |  0.7061  |         0.8017         |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |  0.6971  |         1.1068         |
|          resnext50_32x4d          |  8   | 0.9934 |  0.8416   |  0.666   |         0.7736         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.8931         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.6379  |         0.9573         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  0.6066  |         0.9973         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9965 |  0.8568   |  0.6065  |         0.6172         |
|          LearningToPaint          |  96  | 0.9192 |   0.711   |  0.5925  |         0.7458         |
|             resnet18              |  16  | 0.9751 |  0.7804   |  0.5423  |         0.6127         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.5317  |         0.9997         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.4538  |         0.8022         |
|           hf_Longformer           |  2   | 0.8567 |  0.8296   |  0.417   |         0.8947         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3169  |         0.3395         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |          nan           |
|               moco                |  32  | 0.9901 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_GPT2_large           |  4   | 212.9144 | 214.8802  | 126.3363 |        122.0918        |
|        Background_Matting         |  4   | 126.0011 | 918.5538  | 103.8673 |        104.0337        |
|            hf_T5_large            |  2   | 228.8741 | 272.2865  | 102.0349 |        119.318         |
|               hf_T5               |  8   | 181.8784 | 210.9917  | 94.8335  |        92.6048         |
|           hf_Longformer           |  2   | 138.6011 | 202.2282  | 83.1057  |        96.5048         |
|            hf_BigBird             |  2   | 206.0881 | 249.9544  | 79.0046  |        121.5804        |
|            timm_nfnet             | 128  | 119.6071 | 119.4475  | 78.2414  |        81.4862         |
|            hf_Reformer            |  4   | 82.2334  |  83.7658  | 70.9394  |        76.6314         |
|            Super_SloMo            |  6   | 79.7574  | 443.4604  | 64.6672  |         64.405         |
|              yolov3               |  16  | 68.9437  |  84.922   | 57.7312  |        57.6727         |
|            timm_regnet            |  32  | 60.5594  |  72.6148  | 56.0164  |        57.6903         |
|               vgg16               |  64  | 66.3694  |  66.3113  | 53.5333  |        52.9084         |
|             resnet152             |  32  | 64.6095  |  83.3379  | 52.8418  |        64.4307         |
|           hf_Bert_large           |  4   | 82.7358  |  93.4215  | 52.0923  |        52.8848         |
|              demucs               |  4   | 53.7629  |  53.6625  | 52.0267  |        51.7124         |
|        speech_transformer         |  32  | 60.8453  |  86.903   | 38.2337  |         38.524         |
| attention_is_all_you_need_pytorch | 256  | 54.1561  |  59.0605  | 37.2899  |         37.545         |
|              hf_Bart              |  4   | 60.2651  |  77.6782  | 35.4901  |        40.8509         |
|           fastNLP_Bert            |  6   | 52.7146  |  60.8788  | 33.7639  |        35.6082         |
|           mobilenet_v2            |  96  | 47.2223  |  60.3534  | 31.2721  |        31.2135         |
|             hf_Albert             |  8   | 69.8687  |  72.3417  | 29.8965  |        30.4021         |
|           pytorch_unet            |  1   | 40.0422  | 194.4006  | 28.9641  |        29.0822         |
|              hf_GPT2              |  4   | 50.0373  |  50.3299  | 27.8616  |        27.5158         |
|            timm_vovnet            |  32  | 28.8846  |  34.8687  | 26.2316  |        26.7478         |
|              hf_Bert              |  4   | 40.4764  |  46.9967  | 22.7979  |        25.3773         |
|         timm_efficientnet         |  32  | 33.3471  |   50.47   | 22.2003  |         29.913         |
|             resnet50              |  32  | 26.5637  |  34.0731  | 22.0977  |        25.2727         |
|           hf_DistilBert           |  8   | 32.0633  |  32.8441  | 22.0437  |        21.7726         |
|        shufflenet_v2_x1_0         | 128  | 30.8642  |  40.5303  |  18.966  |        25.9005         |
|            densenet121            |  4   | 54.7122  |  75.2731  | 18.8995  |        51.5488         |
|      timm_vision_transformer      |  32  | 29.1869  |  33.2364  | 18.4072  |        20.4181         |
|           BERT_pytorch            |  16  | 57.0193  |  67.2582  | 17.7659  |        25.3631         |
|           timm_resnest            |  32  | 24.2775  |  28.3252  | 15.4056  |        16.0497         |
|            mnasnet1_0             |  32  |  22.51   |  30.1134  | 14.1713  |        22.7173         |
|        mobilenet_v3_large         |  32  | 27.1547  |  35.2245  | 13.9408  |        22.0284         |
|      nvidia_deeprecommender       | 256  |  10.248  |  10.2509  |  11.722  |        10.0474         |
|          pytorch_stargan          |  16  | 14.9695  |  19.4924  | 11.6535  |        12.1043         |
|          resnext50_32x4d          |  8   | 20.6449  |  27.8524  | 11.6074  |        19.9544         |
|         phlippe_densenet          | 128  | 23.4262  |  29.346   | 11.4625  |        22.6408         |
|              alexnet              | 128  |  9.8299  |   9.856   |  9.0439  |         8.6595         |
|          LearningToPaint          |  96  | 11.4776  |  14.0478  |  8.6126  |        10.7015         |
|            tts_angular            |  64  |  6.5983  |  6.8985   |  6.5056  |         6.3867         |
|             resnet18              |  16  |  9.3397  |  12.0564  |  6.2123  |        10.2667         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 14.4352  |  15.5699  |  5.8079  |         7.5903         |
|           squeezenet1_1           |  32  | 10.5767  |  11.3462  |  5.4488  |         9.6059         |
|          phlippe_resnet           | 128  |  9.0682  |  11.6893  |  4.9826  |        10.2396         |
|          pytorch_struct           | 200  |  5.0317  |  5.9711   |  3.2165  |         4.7336         |
|       functorch_dp_cifar10        |  64  | 10.5693  |  10.9148  |  2.8792  |         7.4659         |
|                drq                |  1   |  3.3221  |  4.2998   |  2.1952  |         4.1535         |
|               dlrm                | 1024 |  4.3937  |  4.8327   |  2.1352  |         3.4918         |
|           lennard_jones           | 1000 |  1.7893  |  2.1297   |  1.9981  |         1.7797         |
|               dcgan               |  32  |  2.4067  |   3.047   |  1.4634  |         2.4884         |
|         soft_actor_critic         | 256  |  1.925   |  2.4359   |  1.3507  |         1.9012         |
|   timm_vision_transformer_large   |  32  | 464.6477 |    nan    |   nan    |          nan           |
|               moco                |  32  | 51.5626  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|             OPTForCausalLM              |  2  | 0.9898 |   0.93    |  2.4331  |         2.4913         |
|          MobileBertForMaskedLM          | 64  | 0.9545 |  0.8212   |  2.3314  |         1.0689         |
|      GPT2ForSequenceClassification      |  4  | 0.9776 |  0.9513   |  2.2393  |         2.2698         |
|       ElectraForQuestionAnswering       | 64  | 0.9867 |  0.9791   |  2.1134  |         2.0863         |
|       MT5ForConditionalGeneration       | 16  | 0.9905 |  0.8446   |  2.0855  |         1.8524         |
|     MobileBertForQuestionAnswering      | 128 | 0.947  |  0.8063   |  2.0839  |         1.0668         |
|             XGLMForCausalLM             |  8  | 0.9357 |  0.7335   |  2.0178  |         1.1997         |
|            XLNetLMHeadModel             |  8  | 0.9957 |  0.9657   |  1.812   |         1.8121         |
|           ElectraForCausalLM            | 32  | 0.9818 |  0.9349   |  1.7997  |         1.8331         |
|    LayoutLMForSequenceClassification    | 16  | 0.9845 |  0.9711   |  1.7932  |         1.779          |
|        BertForQuestionAnswering         | 16  | 0.9843 |  0.9697   |  1.766   |         1.7517         |
|       RobertaForQuestionAnswering       | 16  | 0.9844 |  0.9698   |  1.7613  |         1.7581         |
|           RobertaForCausalLM            | 16  | 0.9869 |  0.9626   |  1.6677  |         1.6656         |
|       AlbertForQuestionAnswering        |  4  | 0.9998 |  0.8857   |  1.6538  |         1.6463         |
|               DistillGPT2               | 16  | 0.9879 |  0.9559   |  1.6506  |         1.6933         |
|            AlbertForMaskedLM            |  4  | 0.9999 |  0.8851   |  1.645   |         1.6387         |
|     M2M100ForConditionalGeneration      | 16  | 1.0232 |   0.805   |  1.6445  |         1.3689         |
|       T5ForConditionalGeneration        |  4  | 0.9791 |  0.8492   |  1.6276  |         1.7225         |
|            PLBartForCausalLM            |  8  | 0.9894 |   0.961   |  1.6244  |         1.6696         |
|                 T5Small                 |  4  | 0.979  |  0.8471   |  1.623   |         1.7319         |
|     PLBartForConditionalGeneration      |  4  | 0.9892 |  0.9495   |  1.6164  |         1.6346         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9809 |  0.9611   |  1.6042  |         1.6269         |
|             BertForMaskedLM             | 16  | 0.9862 |  0.9608   |  1.5977  |         1.5867         |
|          AllenaiLongformerBase          |  4  | 0.8845 |  0.6243   |  1.5867  |         1.4985         |
|           LayoutLMForMaskedLM           | 16  | 0.9864 |  0.9623   |  1.5739  |         1.5994         |
|                CamemBert                | 16  | 0.987  |   0.963   |  1.5438  |         1.5326         |
|            MBartForCausalLM             |  4  | 0.9843 |  0.9604   |  1.4893  |         1.5258         |
|            YituTechConvBert             | 16  | 0.9858 |  0.9558   |  1.4891  |         1.4928         |
|             BartForCausalLM             |  4  | 0.985  |   0.964   |  1.4876  |         1.5389         |
|         Speech2Text2ForCausalLM         | 256 | 0.9751 |  0.9276   |  1.4716  |         1.5416         |
|      BartForConditionalGeneration       |  2  | 0.9977 |  0.9429   |  1.4551  |         1.447          |
|         MegatronBertForCausalLM         |  4  | 0.984  |    0.9    |  1.4513  |         1.5014         |
|     DistilBertForQuestionAnswering      | 256 | 0.9939 |   0.987   |  1.4407  |         1.4405         |
|      MBartForConditionalGeneration      |  2  | 0.9971 |   0.943   |  1.4382  |         1.5188         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9964 |  0.9073   |  1.3842  |         1.4247         |
|     PegasusForConditionalGeneration     | 32  | 0.9969 |  0.9164   |  1.3403  |         1.2744         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9581 |  0.8902   |  1.2779  |         1.1741         |
|           PegasusForCausalLM            | 32  | 0.9489 |    0.9    |  1.2414  |         1.1734         |
|            TrOCRForCausalLM             | 32  |  0.99  |  0.9619   |  1.2412  |         1.2852         |
|          DistilBertForMaskedLM          | 128 | 0.9926 |  0.9505   |  1.2189  |         1.245          |
|       DebertaForQuestionAnswering       |  8  | 0.8002 |   0.698   |  1.0247  |         0.9164         |
|           DebertaForMaskedLM            |  4  | 0.7205 |  0.5759   |  0.903   |          0.76          |
|          DebertaV2ForMaskedLM           |  1  | 0.6912 |  0.5143   |  0.8475  |         0.6187         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.699  |  0.5196   |  0.7808  |         0.6271         |
|          BlenderbotForCausalLM          |  4  | 0.8917 |  0.7422   |   0.0    |         1.101          |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
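
For anyone post-processing these tables, here is a minimal sketch of reading the grid layout and summarizing one column. It assumes the table has been pasted as plain text in the format shown above. `parse_grid_table` and `geomean` are hypothetical helper names, not part of the benchmark scripts. Treating `nan` and `0.0` entries as failed runs to be skipped is also an assumption, based on the BlenderbotForCausalLM row.

```python
import math


def parse_grid_table(text):
    """Parse an ASCII grid table into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first |...| line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geomean(values):
    """Geometric mean, skipping nan and non-positive entries (assumed failures)."""
    vals = [v for v in values if not math.isnan(v) and v > 0]
    if not vals:
        return float("nan")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


# Three rows copied from the speedup table above.
sample = """
+-----------------------+----+--------+-----------+----------+------------------------+
|         name          | bs | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------+----+--------+-----------+----------+------------------------+
|    OPTForCausalLM     | 2  | 0.9898 |   0.93    |  2.4331  |         2.4913         |
| DebertaV2ForMaskedLM  | 1  | 0.6912 |  0.5143   |  0.8475  |         0.6187         |
| BlenderbotForCausalLM | 4  | 0.8917 |  0.7422   |   0.0    |         1.101          |
+-----------------------+----+--------+-----------+----------+------------------------+
"""

rows = parse_grid_table(sample)
inductor_geomean = geomean(float(r["inductor"]) for r in rows)
print(f"{len(rows)} rows, inductor geomean speedup: {inductor_geomean:.4f}")
```

Because every cell is kept as a string until a column is selected, the same parser also works for the Accuracy table, where the cells hold status labels such as `pass` or `fail_accuracy` rather than numbers.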

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          AllenaiLongformerBase          |  4  | 11.6747 |  31.8976  | 142.2876 |        111.0905        |
|          MobileBertForMaskedLM          | 64  | 17.2166 |  40.8803  | 130.8579 |        133.427         |
|       MT5ForConditionalGeneration       | 16  | 8.0924  |  18.8248  | 125.2247 |        125.9944        |
|     MobileBertForQuestionAnswering      | 128 | 17.1125 |  40.838   | 124.7928 |        127.0823        |
|          DebertaV2ForMaskedLM           |  1  | 15.5953 |  26.8595  | 124.4777 |        61.0243         |
|      DebertaV2ForQuestionAnswering      |  2  | 15.3175 |  26.6758  | 123.6107 |        58.5909         |
|     M2M100ForConditionalGeneration      | 16  | 11.9448 |  25.8755  |  103.92  |        100.6376        |
|            XLNetLMHeadModel             |  8  | 10.6587 |  27.8154  | 83.1612  |        82.9817         |
|           DebertaForMaskedLM            |  4  | 7.3773  |  13.849   | 77.6941  |        49.0968         |
|       DebertaForQuestionAnswering       |  8  | 7.1818  |  13.1771  | 72.5836  |        45.9379         |
|             XGLMForCausalLM             |  8  | 9.7092  |  20.9609  | 71.6496  |        64.7148         |
|            YituTechConvBert             | 16  | 11.1686 |  19.8487  | 71.0063  |        68.7459         |
|      MBartForConditionalGeneration      |  2  | 11.9533 |  26.3952  | 70.4773  |        70.1045         |
|     PegasusForConditionalGeneration     | 32  |  5.216  |  19.4463  | 67.0182  |        64.1267         |
|      BartForConditionalGeneration       |  2  | 11.7353 |  26.2579  | 65.6862  |        66.1212         |
|           ElectraForCausalLM            | 32  | 7.5984  |  13.6823  | 63.3152  |        57.5933         |
|    MegatronBertForQuestionAnswering     |  8  | 10.2872 |  21.6482  |  58.587  |        59.6034         |
|         MegatronBertForCausalLM         |  4  | 10.3353 |  23.1101  | 58.2176  |        59.2643         |
|     PLBartForConditionalGeneration      |  4  | 9.3184  |  16.7971  | 55.2095  |        52.7063         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.7706  |  17.2075  | 49.7484  |        49.1177         |
|       T5ForConditionalGeneration        |  4  | 5.8092  |  13.0519  | 46.0203  |        44.8232         |
|             BartForCausalLM             |  4  | 6.3803  |  12.1769  |  45.824  |        40.5387         |
|                 T5Small                 |  4  | 5.8531  |  12.8348  | 45.2878  |        44.3991         |
|            MBartForCausalLM             |  4  | 6.4335  |  12.1656  | 44.9941  |        40.4298         |
|    LayoutLMForSequenceClassification    | 16  | 5.5388  |  11.3274  | 42.9081  |        42.6139         |
|           PegasusForCausalLM            | 32  |  5.887  |  11.4219  | 42.6191  |        38.5709         |
|            TrOCRForCausalLM             | 32  | 6.4066  |  11.7246  | 42.3401  |         38.565         |
|       ElectraForQuestionAnswering       | 64  | 5.2643  |  10.971   | 41.2626  |        38.2059         |
|             OPTForCausalLM              |  2  | 5.3906  |  11.0349  | 39.0351  |        37.2155         |
|           LayoutLMForMaskedLM           | 16  | 5.6688  |  11.4303  | 37.9502  |        37.3139         |
|        BertForQuestionAnswering         | 16  | 5.1882  |  10.7404  | 35.8089  |        33.7608         |
|       BlenderbotSmallForCausalLM        | 64  | 4.8382  |  8.2925   | 35.3464  |        33.3611         |
|     DistilBertForQuestionAnswering      | 256 | 2.5194  |  5.5016   | 34.0836  |        32.5173         |
|             BertForMaskedLM             | 16  | 5.2171  |  10.6245  | 33.9699  |        35.8007         |
|            AlbertForMaskedLM            |  4  | 2.4014  |  8.1221   | 33.7256  |         34.703         |
|      GPT2ForSequenceClassification      |  4  | 4.8275  |  9.8963   | 33.2584  |         29.887         |
|          DistilBertForMaskedLM          | 128 | 2.5077  |  5.4703   | 32.4074  |        30.4016         |
|           RobertaForCausalLM            | 16  | 5.5103  |  10.8004  |  32.202  |        32.3275         |
|                CamemBert                | 16  | 5.2596  |  10.7921  | 32.0523  |        32.4626         |
|         Speech2Text2ForCausalLM         | 256 | 3.4459  |  6.1983   | 31.9137  |        29.4705         |
|       RobertaForQuestionAnswering       | 16  | 5.4103  |  10.7604  | 31.1678  |        31.0377         |
|            PLBartForCausalLM            |  8  | 3.7501  |  6.6699   | 30.8224  |        30.9451         |
|       AlbertForQuestionAnswering        |  4  | 2.4327  |  8.1563   | 30.0816  |        31.6141         |
|               DistillGPT2               | 16  | 2.5467  |  5.0484   | 27.2617  |        26.8862         |
|          BlenderbotForCausalLM          |  4  | 11.6707 |  22.2125  |   nan    |        63.8955         |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1387  |         1.195          |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1139  |         1.2307         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0865  |         1.1724         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0818  |         1.1729         |
|             OPTForCausalLM              |  2  | 0.9682 |  0.9252   |  1.062   |         1.1099         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0583  |         1.1368         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0541  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.0539  |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0511  |         1.0491         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.1479         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0078  |         1.0518         |
|            YituTechConvBert             | 16  | 0.953  |  0.8749   |  0.9793  |         0.9905         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9734  |         1.3147         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         1.0642         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9574  |         1.268          |
|     PLBartForConditionalGeneration      |  4  |  0.93  |  0.8787   |  0.9563  |         0.9847         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         1.1152         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9089  |         1.0019         |
|            PLBartForCausalLM            |  8  | 0.9237 |  0.8182   |  0.8907  |         0.9249         |
|     PegasusForConditionalGeneration     | 32  | 0.945  |  0.8966   |  0.8901  |         1.0074         |
|           ElectraForCausalLM            | 32  | 0.9161 |   0.786   |  0.889   |         0.8941         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8849  |         0.9624         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8729  |         0.9803         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.8672  |         1.0307         |
|            TrOCRForCausalLM             | 32  |  0.92  |   0.829   |  0.8619  |         0.9075         |
|            MBartForCausalLM             |  4  | 0.951  |  0.8924   |  0.8491  |         0.9507         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         1.0139         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         1.0962         |
|             BartForCausalLM             |  4  | 0.951  |  0.8923   |  0.8301  |         0.943          |
|       BlenderbotSmallForCausalLM        | 64  | 0.8906 |  0.7493   |  0.8065  |         0.8318         |
|           PegasusForCausalLM            | 32  | 0.9257 |  0.8421   |  0.7952  |         0.9252         |
|         Speech2Text2ForCausalLM         | 256 | 0.8865 |  0.7573   |  0.7566  |         0.808          |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7473  |         1.016          |
|             XGLMForCausalLM             |  8  | 0.9431 |  0.8612   |  0.6744  |         0.9287         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6569  |         0.8392         |
|     M2M100ForConditionalGeneration      | 16  | 0.955  |  0.8772   |  0.6058  |         0.8978         |
|           DebertaForMaskedLM            |  4  | 0.9316 |  0.9156   |  0.5501  |         0.9978         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.5197  |         0.9665         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9763 |  0.9763   |  0.487   |         0.9801         |
|          AllenaiLongformerBase          |  4  | 0.8568 |  0.7887   |  0.4688  |         0.8742         |
|       DebertaForQuestionAnswering       |  8  | 0.9525 |  1.0537   |  0.4601  |         1.1527         |
|          BlenderbotForCausalLM          |  4  | 0.9932 |  0.9937   |   nan    |         0.9941         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 266.2496 | 300.9057  | 161.9709 |        162.612         |
|       AlbertForQuestionAnswering        |  4  | 264.1094 | 298.4606  | 159.7968 |        160.5892        |
|            XLNetLMHeadModel             |  8  | 280.4674 | 289.2073  |  153.35  |        153.4832        |
|      DebertaV2ForQuestionAnswering      |  2  | 151.9736 | 200.6198  | 136.0912 |        167.926         |
|          DebertaV2ForMaskedLM           |  1  | 149.7741 | 198.5937  | 125.1963 |        165.7273        |
|          AllenaiLongformerBase          |  4  | 205.0013 | 291.3629  | 114.6671 |        120.7557        |
|     PegasusForConditionalGeneration     | 32  | 158.7442 | 153.7066  | 111.8572 |        115.9786        |
|            TrOCRForCausalLM             | 32  | 138.9186 | 142.6848  | 111.3102 |        107.4753        |
|      MBartForConditionalGeneration      |  2  | 150.4806 | 157.1575  | 95.3542  |        94.2738         |
|      BartForConditionalGeneration       |  2  | 151.4788 | 149.4591  | 94.6392  |        99.1291         |
|    MegatronBertForQuestionAnswering     |  8  | 144.8941 | 147.6544  | 88.5402  |        87.4725         |
|            YituTechConvBert             | 16  | 128.054  | 131.0605  | 84.0386  |        83.8937         |
| BlenderbotSmallForConditionalGeneration | 64  | 123.6485 | 124.5591  | 81.1954  |        85.9262         |
|     MobileBertForQuestionAnswering      | 128 | 177.2369 | 219.3812  | 80.7308  |        162.5293        |
|                CamemBert                | 16  | 120.0042 | 122.9204  | 76.7266  |        77.1949         |
|             BartForCausalLM             |  4  | 115.9931 |  117.381  | 76.4478  |        74.3157         |
|            MBartForCausalLM             |  4  | 116.3355 | 119.2509  | 76.2776  |        75.0468         |
|       DebertaForQuestionAnswering       |  8  | 94.8957  | 108.6113  | 74.1736  |        82.5463         |
|     PLBartForConditionalGeneration      |  4  | 119.0924 | 122.5947  | 73.6253  |        72.8636         |
|     M2M100ForConditionalGeneration      | 16  | 146.8311 | 142.1294  | 73.3374  |        98.3083         |
|          MobileBertForMaskedLM          | 64  | 211.0618 | 217.2681  | 72.0692  |        159.8619        |
|     DistilBertForQuestionAnswering      | 256 | 103.7444 | 104.6045  | 71.7539  |         71.653         |
|           LayoutLMForMaskedLM           | 16  | 114.0971 | 117.1138  |  71.612  |        70.4501         |
|            PLBartForCausalLM            |  8  | 118.2022 | 117.2029  | 71.4088  |        69.4157         |
|             OPTForCausalLM              |  2  | 170.1909 | 180.2379  | 70.1588  |        68.6186         |
|          DistilBertForMaskedLM          | 128 | 85.2491  |  89.0351  | 69.4716  |        67.8708         |
|           RobertaForCausalLM            | 16  | 116.9747 | 119.3777  | 69.0313  |        69.1381         |
|             BertForMaskedLM             | 16  | 111.5916 | 114.2919  | 68.7551  |        69.4087         |
|           DebertaForMaskedLM            |  4  | 85.6529  | 121.5818  | 68.1289  |        80.0392         |
|                 T5Small                 |  4  | 107.124  | 123.2636  | 64.4779  |        60.2077         |
|       T5ForConditionalGeneration        |  4  | 107.0397 | 122.9398  | 64.4461  |        60.4573         |
|               DistillGPT2               | 16  | 107.1274 | 110.5877  | 64.1178  |        62.4732         |
|           PegasusForCausalLM            | 32  | 78.5461  |  76.876   | 59.7705  |        59.4215         |
|         MegatronBertForCausalLM         |  4  | 88.5781  | 105.3016  | 59.3704  |        58.3757         |
|             XGLMForCausalLM             |  8  | 125.6739 | 119.8127  | 55.0785  |        96.0988         |
|    LayoutLMForSequenceClassification    | 16  | 99.1984  | 100.6883  | 54.5616  |         54.979         |
|       RobertaForQuestionAnswering       | 16  | 97.2026  |  98.565   | 54.2094  |        54.3591         |
|       ElectraForQuestionAnswering       | 64  | 116.2502 | 118.4474  | 54.2018  |        54.9257         |
|        BertForQuestionAnswering         | 16  | 96.7604  |  98.0442  | 53.9676  |        54.3071         |
|           ElectraForCausalLM            | 32  | 89.7512  |  94.0761  | 48.9867  |        48.0381         |
|       BlenderbotSmallForCausalLM        | 64  | 68.7866  |  64.9484  |  48.076  |        52.4086         |
|       MT5ForConditionalGeneration       | 16  | 93.9451  | 109.6471  | 44.0535  |        50.1119         |
|      GPT2ForSequenceClassification      |  4  | 93.7641  |  96.0935  |  40.815  |        40.2193         |
|         Speech2Text2ForCausalLM         | 256 | 55.3825  |  56.059   |  36.434  |        34.8948         |
|          BlenderbotForCausalLM          |  4  | 121.0251 | 144.2292  |   nan    |        106.8335        |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
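
To cross-check the latency numbers, the tables can be parsed and two columns compared directly. The following is a minimal sketch, not the dashboard's own script: `parse_table` and `latency_ratio` are hypothetical helper names, and the two rows are copied from the table above. It computes the ratio between the `eager` column and another backend column. The speedup tables may use a different baseline, so this ratio need not match them.

```python
import math

# Header plus two rows copied from the "Absolute latency (ms)" table.
TABLE = """\
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 266.2496 | 300.9057  | 161.9709 |        162.612         |
|          BlenderbotForCausalLM          |  4  | 121.0251 | 144.2292  |   nan    |        106.8335        |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
"""


def parse_table(text):
    """Parse pipe-delimited rows into dicts; the first data row is the header."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def latency_ratio(row, backend, baseline="eager"):
    """Baseline latency divided by backend latency; NaN if the backend has no timing."""
    return float(row[baseline]) / float(row[backend])


rows = parse_table(TABLE)
for row in rows:
    ratio = latency_ratio(row, "inductor")
    label = "n/a" if math.isnan(ratio) else f"{ratio:.2f}x"
    print(row["name"], label)
```

Rows whose backend column is `nan` propagate as NaN rather than raising, so failed runs stay visible in the output.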

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9978 |  0.9973   |  3.0143  |         2.9775         |
|         coat_lite_mini          | 128 | 0.9972 |  0.9957   |  1.939   |         1.9101         |
|      xcit_large_24_p8_224       |  5  | 0.989  |  0.8631   |  1.9361  |         1.5725         |
|        twins_pcpvt_base         | 64  | 0.9976 |  0.9151   |  1.9193  |         1.6824         |
|          ghostnet_100           | 128 | 0.992  |  0.7645   |  1.8271  |         1.6057         |
|          gmlp_s16_224           | 128 | 0.9946 |  1.0828   |  1.798   |         1.7822         |
|          gmixer_24_224          | 128 | 0.9948 |  0.8891   |  1.7406  |         1.7281         |
|           volo_d1_224           | 64  | 0.9939 |  0.9731   |  1.6849  |         1.6641         |
|            lcnet_050            | 128 | 0.9405 |  0.7344   |  1.6847  |         1.4285         |
|         crossvit_9_240          | 128 | 0.9909 |  0.7828   |  1.6212  |         1.5999         |
|  swin_base_patch4_window7_224   | 64  | 0.9905 |  0.9428   |  1.6091  |          1.6           |
|           convit_base           | 64  | 0.9983 |  0.9974   |  1.5517  |         1.5524         |
|             dla102              | 128 | 0.9958 |  0.8153   |  1.5263  |         1.5225         |
|       gluon_inception_v3        | 128 | 0.9964 |   0.865   |  1.5098  |         1.4968         |
|          inception_v3           | 128 | 0.9963 |  0.8645   |  1.5079  |         1.4952         |
|        adv_inception_v3         | 128 | 0.9967 |  0.8612   |  1.5075  |         1.4999         |
|        sebotnet33ts_256         | 64  | 0.957  |  0.7647   |  1.4987  |         1.5266         |
|          convnext_base          | 64  | 0.9841 |   0.984   |  1.485   |         1.4703         |
|            nfnet_l0             | 128 | 0.9896 |  0.8137   |  1.4809  |         1.4304         |
|           dm_nfnet_f0           | 128 | 0.9865 |  0.9851   |  1.4548  |         1.4111         |
|            pit_b_224            | 64  | 0.9948 |  0.9924   |  1.4303  |         1.4241         |
|       eca_botnext26ts_256       | 128 | 0.973  |  0.7192   |  1.4242  |         1.411          |
|           mnasnet_100           | 128 | 0.9479 |  0.7397   |  1.4218  |         1.4855         |
|           mobilevit_s           | 64  | 0.9619 |  0.7314   |  1.4207  |         1.4347         |
|      mobilenetv3_large_100      | 128 | 0.9502 |  0.7603   |  1.419   |         1.4073         |
|           resnest101e           | 64  | 0.995  |  0.8655   |  1.4093  |         1.3419         |
|           selecsls42b           | 128 | 0.9984 |   0.812   |  1.407   |         1.4073         |
|           regnety_002           | 128 | 0.9477 |  0.7089   |  1.4009  |         1.212          |
|          botnet26t_256          | 128 | 0.9734 |  0.8503   |  1.3906  |         1.406          |
|        res2net50_14w_8s         | 128 | 0.9987 |   0.791   |  1.3798  |         1.3568         |
|         mobilenetv2_100         | 128 | 0.9495 |   0.737   |  1.3782  |         1.4337         |
|          jx_nest_base           | 32  | 0.987  |  0.9844   |  1.3691  |         1.3607         |
|           res2next50            | 128 | 0.9986 |  0.8252   |  1.3683  |         1.362          |
|          cait_m36_384           |  4  | 0.9947 |  0.9933   |  1.3641  |         1.3413         |
|          mixer_b16_224          | 128 | 0.9971 |  1.0184   |  1.3626  |         1.3594         |
|            hrnet_w18            | 128 | 0.9926 |  0.6439   |  1.3591  |         1.3421         |
|       tf_efficientnet_b0        | 128 | 0.9598 |  0.6812   |  1.357   |         1.3877         |
|          spnasnet_100           | 128 | 0.9416 |  0.7383   |  1.3438  |         1.4116         |
|      beit_base_patch16_224      | 64  | 0.9969 |  0.9599   |  1.3428  |         1.3427         |
|           fbnetc_100            | 128 | 0.9498 |  0.7393   |  1.3403  |         1.3937         |
|        ese_vovnet19b_dw         | 128 | 0.958  |  0.8325   |  1.3366  |         1.3579         |
|         poolformer_m36          | 64  | 0.9866 |  0.9832   |  1.3261  |         1.3181         |
|            fbnetv3_b            | 128 | 0.9497 |  0.7693   |  1.2934  |         1.3185         |
|           rexnet_100            | 128 | 0.9512 |  0.7028   |  1.2855  |         1.3248         |
|          resmlp_12_224          | 128 | 0.9931 |  0.8885   |  1.2515  |         1.2494         |
|      vit_base_patch16_224       | 64  | 0.9964 |  0.9938   |  1.2325  |         1.2327         |
|            tinynet_a            | 128 | 0.9472 |  0.6784   |  1.2241  |         1.2588         |
|          cspdarknet53           | 64  | 0.9333 |  0.7843   |  1.207   |         1.2457         |
|           tf_mixnet_l           | 128 | 0.9765 |  0.8266   |  1.182   |         1.1887         |
|         visformer_small         | 128 | 0.996  |  0.9449   |  1.174   |         1.166          |
|            mixnet_l             | 128 | 0.9766 |  0.8214   |  1.1725  |         1.179          |
|        res2net101_26w_4s        | 64  | 1.0002 |  0.7953   |  1.148   |         1.0902         |
|             dpn107              | 32  | 0.9311 |  0.8069   |  1.0945  |         1.1389         |
|        gluon_xception65         | 32  | 0.9922 |  0.8425   |  1.0711  |         1.0746         |
|            repvgg_a2            | 128 | 0.9347 |  0.7551   |  1.0632  |         1.0952         |
|     swsl_resnext101_32x16d      | 32  | 0.9976 |  0.8397   |  1.0612  |         1.0255         |
|            gernet_l             | 128 | 0.9362 |  0.7925   |  1.0322  |         1.0621         |
|        convmixer_768_32         | 32  | 0.9985 |  0.9637   |  1.0022  |         1.003          |
|          pnasnet5large          | 16  | 0.9862 |  0.9123   |  0.9101  |         0.9204         |
| deit_base_distilled_patch16_224 | 64  | 0.9963 |  0.9938   |   0.0    |          0.0           |
+---------------------------------+-----+--------+-----------+----------+------------------------+
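
A per-backend summary of a speedup column is often reported as a geometric mean. The sketch below shows one way to compute it, assuming that a `0.0` entry marks a run that produced no timing and should be excluded. That reading is an assumption based on the `deit_base_distilled_patch16_224` row, not something this table states. `geomean` and `summarize` are hypothetical helper names, and the values are a small sample of the `inductor` column above.

```python
import math

# (name, inductor speedup) pairs copied from the table above.
SPEEDUPS = [
    ("tnt_s_patch16_224", 3.0143),
    ("coat_lite_mini", 1.939),
    ("pnasnet5large", 0.9101),
    ("deit_base_distilled_patch16_224", 0.0),  # assumed: no timing produced
]


def geomean(values):
    """Geometric mean of positive values, computed in log space."""
    values = list(values)
    if not values:
        raise ValueError("geomean of an empty sequence")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(pairs):
    """Return (geomean over runs with a positive speedup, names of excluded models)."""
    ok = [s for _, s in pairs if s > 0]
    excluded = [name for name, s in pairs if not (s > 0)]  # also catches NaN
    return geomean(ok), excluded


mean, excluded = summarize(SPEEDUPS)
print(f"geomean speedup: {mean:.4f}, excluded: {excluded}")
```

Excluding failed runs makes the summary look better than the backend's overall behavior, so the excluded list is returned alongside the mean.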

Accuracy

+---------------------------------+----+-------+---------------+-------------+------------------------+
|              name               | bs | eager |   aot_eager   |  inductor   | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+-------------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |    pass     |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |    pass     |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |    pass     |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |    pass     |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |    pass     |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |    pass     |          pass          |
|           regnety_002           | 8  | pass  |     pass      |    pass     |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |    pass     |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |    pass     |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |    pass     |          pass          |
|           res2next50            | 8  | pass  |     pass      |    pass     |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |    pass     |          pass          |
|           resnest101e           | 8  | pass  |     pass      |    pass     |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |    pass     |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |    pass     |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |    pass     |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |    pass     |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |    pass     |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |    pass     |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |    pass     |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |    pass     |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |    pass     |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |    pass     |          pass          |
|         visformer_small         | 8  | pass  |     pass      |    pass     |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |    pass     |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |    pass     |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |    pass     |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |    pass     |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |    pass     |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |    pass     |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |    pass     |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |    pass     |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |    pass     |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |    pass     |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |    pass     |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |    pass     |          pass          |
|           convit_base           | 8  | pass  |     pass      |    pass     |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |    pass     |          pass          |
|          convnext_base          | 8  | pass  |     pass      |    pass     |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |    pass     |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |    pass     |          pass          |
|             dla102              | 8  | pass  |     pass      |    pass     |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |    pass     |          pass          |
|             dpn107              | 8  | pass  |     pass      |    pass     |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |    pass     |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |    pass     |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |    pass     |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |    pass     |          pass          |
|            gernet_l             | 8  | pass  |     pass      |    pass     |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |    pass     |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |    pass     |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |    pass     |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |    pass     |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |    pass     |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |    pass     |          pass          |
|          inception_v3           | 8  | pass  |     pass      |    pass     |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |    pass     |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |    pass     |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |    pass     |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      | fail_to_run |          pass          |
+---------------------------------+----+-------+---------------+-------------+------------------------+
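
The accuracy table can be condensed into a count of each status per backend. The following is a minimal sketch with a hypothetical `tally` helper. The three rows are copied from the table above, and the status strings (`pass`, `fail_accuracy`, `fail_to_run`) are the ones that appear there.

```python
from collections import Counter

BACKENDS = ["eager", "aot_eager", "inductor", "inductor_no_cudagraphs"]

# (name, statuses in BACKENDS order) rows copied from the accuracy table.
ROWS = [
    ("adv_inception_v3", ["pass", "pass", "pass", "pass"]),
    ("lcnet_050", ["pass", "fail_accuracy", "pass", "pass"]),
    ("deit_base_distilled_patch16_224", ["pass", "pass", "fail_to_run", "pass"]),
]


def tally(rows, backends=BACKENDS):
    """Count how often each status occurs for each backend."""
    counts = {backend: Counter() for backend in backends}
    for _, statuses in rows:
        for backend, status in zip(backends, statuses):
            counts[backend][status] += 1
    return counts


for backend, counter in tally(ROWS).items():
    print(backend, dict(counter))
```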

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 5.6819  |  11.1937  | 279.2158 |        289.3288        |
|            hrnet_w18            | 128 | 9.6398  |  36.3476  | 234.5558 |        227.7826        |
|          ghostnet_100           | 128 | 7.5214  |  15.0263  | 234.0273 |         237.8          |
|            fbnetv3_b            | 128 | 8.3414  |  17.1366  | 164.4178 |        167.8318        |
|           mobilevit_s           | 64  | 5.4605  |  11.4386  | 160.2726 |        152.8017        |
|            tinynet_a            | 128 |  6.02   |  12.2853  | 158.1586 |        158.6931        |
|            mixnet_l             | 128 | 8.6127  |  16.0797  | 157.4954 |        152.0009        |
|           resnest101e           | 64  | 11.2483 |  24.4613  | 157.4631 |        155.6461        |
|       gluon_inception_v3        | 128 | 5.6822  |  12.5861  | 157.2453 |        153.1525        |
|          inception_v3           | 128 | 5.5852  |  12.5865  | 156.3229 |        154.5637        |
|           tf_mixnet_l           | 128 | 9.1931  |  16.9678  | 155.7678 |        156.0582        |
|        adv_inception_v3         | 128 | 5.5904  |  12.7158  | 155.0713 |        153.7598        |
|      mobilenetv3_large_100      | 128 | 4.2175  |  8.3927   | 153.7169 |        151.4939        |
|       tf_efficientnet_b0        | 128 | 5.1736  |  10.4317  | 152.9053 |        146.3716        |
|          pnasnet5large          | 16  | 8.2744  |  26.1652  | 152.7828 |        150.2401        |
|        res2net101_26w_4s        | 64  | 10.7069 |  24.8524  | 142.6738 |        140.4152        |
|        twins_pcpvt_base         | 64  | 10.7288 |  23.7205  | 138.6304 |        139.4327        |
|           fbnetc_100            | 128 | 5.0529  |  9.6525   | 137.668  |        133.3645        |
|          spnasnet_100           | 128 | 5.0048  |  9.4918   | 133.0916 |        133.1085        |
|         mobilenetv2_100         | 128 | 4.1918  |  7.8764   | 130.9183 |        125.3272        |
|      xcit_large_24_p8_224       |  5  | 12.7607 |  28.5241  | 121.5096 |        118.7053        |
|           mnasnet_100           | 128 | 4.0205  |  7.6179   | 118.447  |        120.0813        |
|        res2net50_14w_8s         | 128 | 9.1123  |  22.6881  | 114.0407 |        113.1868        |
|          cait_m36_384           |  4  | 13.6487 |  31.2258  | 106.7914 |        102.7942        |
|        sebotnet33ts_256         | 64  | 4.2145  |  8.9101   | 103.3964 |        104.9672        |
|           regnety_002           | 128 | 4.9146  |  8.8179   | 102.9245 |        100.7885        |
|  swin_base_patch4_window7_224   | 64  | 8.7229  |  19.6874  | 100.1217 |        99.0548         |
|            lcnet_050            | 128 | 2.5308  |  5.0427   | 95.4799  |        97.4385         |
|         poolformer_m36          | 64  | 7.7293  |  13.8872  | 94.4204  |        92.9924         |
|          cspdarknet53           | 64  | 5.7373  |  11.0526  | 93.3868  |        94.0839         |
|       eca_botnext26ts_256       | 128 | 3.0875  |   6.869   | 93.3012  |        94.1856         |
|             dla102              | 128 | 6.2025  |  14.322   | 92.9196  |        91.8124         |
|             dpn107              | 32  | 9.8701  |  19.8194  | 92.4363  |        92.6135         |
|          botnet26t_256          | 128 | 2.9093  |  6.0593   | 89.3439  |         88.182         |
|        gluon_xception65         | 32  | 7.7845  |  16.8421  | 88.8289  |        86.9279         |
|           selecsls42b           | 128 | 2.4858  |  5.7378   |  87.246  |        90.4643         |
|         coat_lite_mini          | 128 | 3.2756  |  7.9476   | 86.8864  |        86.1974         |
|           res2next50            | 128 | 5.1278  |  12.1776  | 85.5826  |        80.3849         |
|         crossvit_9_240          | 128 | 5.6901  |  13.5852  | 83.4474  |        81.7638         |
|            gernet_l             | 128 | 4.9481  |  8.9388   | 79.7938  |        76.9299         |
|          jx_nest_base           | 32  | 6.7045  |  15.0619  | 77.9722  |        76.7433         |
|        ese_vovnet19b_dw         | 128 |  2.57   |  4.6143   | 75.8287  |        76.2068         |
|            nfnet_l0             | 128 | 5.3173  |  10.9258  | 73.5205  |        73.1569         |
|           dm_nfnet_f0           | 128 | 6.2326  |  11.5179  | 69.6421  |        66.3511         |
|           volo_d1_224           | 64  | 5.1374  |  11.8983  | 68.3324  |        67.4394         |
|         visformer_small         | 128 |  2.615  |  6.0806   | 63.1924  |        64.4177         |
|        tnt_s_patch16_224        | 128 | 6.5642  |  16.1299  | 62.0113  |        61.9781         |
|            repvgg_a2            | 128 | 4.9085  |  8.7957   | 56.9201  |        58.3754         |
|     swsl_resnext101_32x16d      | 32  | 6.2435  |  13.6778  |  55.897  |        54.1133         |
|          convnext_base          | 64  | 6.7656  |  12.8282  | 54.7614  |        53.6518         |
|          gmlp_s16_224           | 128 | 5.5666  |  12.042   | 53.2846  |        56.5677         |
|           convit_base           | 64  | 3.5023  |   8.754   | 45.9383  |        45.2225         |
|          gmixer_24_224          | 128 | 5.7946  |  12.8609  | 44.7255  |        46.0785         |
|            pit_b_224            | 64  | 3.4442  |  8.1542   | 41.6145  |        41.5264         |
|          resmlp_12_224          | 128 | 2.8228  |  5.4886   | 37.0791  |        37.1216         |
|      vit_base_patch16_224       | 64  | 3.0941  |   7.07    | 35.6546  |        35.6531         |
|        convmixer_768_32         | 32  |  1.689  |   6.973   | 33.1709  |         32.704         |
|      beit_base_patch16_224      | 64  | 3.8686  |   8.917   | 32.4298  |        30.6071         |
|          mixer_b16_224          | 128 | 2.6826  |  6.4046   | 29.3199  |        31.8185         |
| deit_base_distilled_patch16_224 | 64  | 3.1332  |  7.2477   |   nan    |          nan           |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.1858  |         1.2049         |
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.1712  |         1.2836         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1129  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.0948  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.0266  |         1.1179         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.013   |         1.0845         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0099  |         1.0351         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  0.9984  |         1.0721         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  0.9972  |         1.0876         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  0.9872  |         1.0728         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  0.9834  |         1.0506         |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9762  |         0.9854         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  0.9747  |         1.0457         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  0.9729  |         1.0539         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  0.9557  |         1.0057         |
|             dla102              | 128 | 0.9634 |  0.9151   |  0.9536  |         1.0326         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.951   |         0.9948         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9362  |         0.955          |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  0.9348  |         1.0194         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  0.9346  |         1.0675         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9285  |         0.989          |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9228  |         0.9793         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9215  |         0.9952         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9131  |         0.9727         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  0.9097  |         1.0328         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9047  |         0.9903         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.8918  |          0.99          |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  0.891   |         1.1115         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.8833  |         0.9642         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.8831  |         0.9705         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.8807  |         0.977          |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.8786  |         0.9451         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.877   |         0.9361         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  0.8768  |         1.1871         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  0.8738  |         1.0072         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.8721  |         0.9732         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.8712  |         0.9607         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.871   |         0.9483         |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.8687  |         0.9902         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.8683  |         0.9403         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.866   |         0.9547         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8632  |         0.989          |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.8596  |         0.9535         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  0.8578  |         1.0242         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.8576  |         0.9664         |
|          convnext_base          | 64  | 1.001  |   0.924   |  0.8505  |         1.0338         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.8499  |         0.9706         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.8461  |         0.9786         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  0.8402  |         1.0202         |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.8239  |         0.9779         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.805   |         0.884          |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.7738  |         0.9611         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.7602  |         0.8966         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.7526  |         0.9898         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.7214  |         0.9045         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.6693  |         0.9604         |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |   nan    |          nan           |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.6542 | 312.8372  | 300.1999 |        299.9559        |
|          pnasnet5large          | 16  | 199.2254 | 215.2063  | 217.2185 |        214.375         |
|            hrnet_w18            | 128 | 282.478  | 435.3452  | 205.4132 |        208.7185        |
|           tf_mixnet_l           | 128 | 194.1509 |  229.976  | 160.2855 |        159.6253        |
|            mixnet_l             | 128 | 186.2533 | 220.9174  | 154.9247 |        153.9066        |
|          cait_m36_384           |  4  | 168.1047 |  168.461  | 124.4205 |        124.7795        |
|           resnest101e           | 64  | 164.8616 | 189.9781  | 116.6019 |        122.2554        |
|             dla102              | 128 | 172.7723 | 210.7856  | 112.6868 |        113.0065        |
|     swsl_resnext101_32x16d      | 32  | 118.9903 | 141.4344  | 111.9392 |        115.5469        |
|         poolformer_m36          | 64  | 146.9646 | 147.2525  | 109.1632 |        109.8657        |
|        tnt_s_patch16_224        | 128 | 324.4139 | 324.5301  | 107.2041 |        108.7492        |
|        adv_inception_v3         | 128 | 161.0882 | 186.1847  | 106.4349 |        106.7209        |
|          inception_v3           | 128 | 161.0846 |  185.741  | 106.3707 |        107.5576        |
|       gluon_inception_v3        | 128 | 161.0837 | 185.8354  | 106.1842 |        107.542         |
|           convit_base           | 64  | 163.3428 | 163.3812  | 104.9701 |        104.9901        |
|        res2net50_14w_8s         | 128 | 141.6199 | 178.2295  | 102.0999 |        103.5102        |
|             dpn107              | 32  | 114.0702 | 131.8251  | 96.9794  |        93.3418         |
|        gluon_xception65         | 32  | 99.9288  | 117.4911  | 92.6173  |        92.4511         |
|           res2next50            | 128 | 126.3586 | 152.9867  | 92.1871  |        92.4065         |
|  swin_base_patch4_window7_224   | 64  | 147.7978 | 155.6171  | 90.7572  |        91.3657         |
|           dm_nfnet_f0           | 128 | 128.9952 | 129.0018  | 87.0219  |        90.0246         |
|          mixer_b16_224          | 128 | 116.6626 | 114.1739  | 85.3624  |        85.6183         |
|        res2net101_26w_4s        | 64  | 100.4634 | 125.4673  | 85.2397  |        89.4652         |
|            fbnetv3_b            | 128 | 115.6167 | 142.7623  | 84.8691  |         83.079         |
|            pit_b_224            | 64  | 118.8202 | 118.9807  | 82.5286  |        82.8783         |
|          convnext_base          | 64  | 124.5038 | 124.3977  | 82.2985  |        83.2135         |
|         visformer_small         | 128 | 91.3151  |  96.3839  | 77.5939  |        77.9498         |
|          gmlp_s16_224           | 128 | 137.8354 | 126.3885  | 76.3276  |         77.043         |
|      beit_base_patch16_224      | 64  | 101.4893 | 105.5884  | 75.5544  |        75.4112         |
|            nfnet_l0             | 128 | 113.2962 | 137.4679  |  75.499  |        78.0737         |
|       eca_botnext26ts_256       | 128 | 108.8663 | 147.5417  | 74.4831  |        75.0015         |
|          cspdarknet53           | 64  | 94.9226  |  113.082  | 73.6117  |        71.0547         |
|          jx_nest_base           | 32  | 102.0238 | 101.8371  |  73.277  |        73.5848         |
|           volo_d1_224           | 64  | 121.209  | 123.7604  | 71.5549  |        72.3924         |
|          botnet26t_256          | 128 | 101.956  | 116.8264  | 71.4205  |        70.6233         |
|            gernet_l             | 128 | 77.8053  |  91.9884  | 70.6557  |        68.5972         |
|      vit_base_patch16_224       | 64  | 86.9874  |  86.9976  | 70.3507  |        70.3116         |
|            repvgg_a2            | 128 | 77.8863  |  96.2781  | 68.4589  |        66.3626         |
|          gmixer_24_224          | 128 | 118.1754 | 132.1528  | 67.6391  |        68.1056         |
|      xcit_large_24_p8_224       |  5  | 125.3098 | 147.4448  | 63.0432  |        76.3894         |
|        twins_pcpvt_base         | 64  | 120.8395 | 144.0909  | 60.2931  |        67.9085         |
|       tf_efficientnet_b0        | 128 | 85.1751  | 119.8793  | 60.0848  |        58.7878         |
|           rexnet_100            | 128 | 80.1843  | 108.6552  |  59.43   |        57.5409         |
|           fbnetc_100            | 128 |  83.044  | 106.5567  |  58.816  |        56.5356         |
|         coat_lite_mini          | 128 | 113.2491 | 113.5626  | 58.1967  |        59.1546         |
|           mobilevit_s           | 64  | 84.9137  | 111.5181  | 57.2279  |        56.6976         |
|            tinynet_a            | 128 | 73.8301  | 102.9644  | 56.9059  |        55.4074         |
|        sebotnet33ts_256         | 64  | 80.5514  | 100.8522  |  51.378  |        50.5384         |
|         crossvit_9_240          | 128 | 82.4244  | 104.5286  | 50.6118  |         50.999         |
|          spnasnet_100           | 128 | 70.5698  |  89.9397  | 49.4069  |        47.0599         |
|          ghostnet_100           | 128 | 90.8128  | 118.0742  | 49.2974  |        56.1752         |
|        ese_vovnet19b_dw         | 128 | 64.6945  |  74.563   | 46.3896  |        45.5653         |
|         mobilenetv2_100         | 128 | 65.7615  |  84.5929  | 45.1514  |         43.411         |
|           mnasnet_100           | 128 | 64.4165  |  82.4905  | 42.9174  |        41.1452         |
|           selecsls42b           | 128 |  60.193  |  74.0393  | 42.6209  |        42.6554         |
|          resmlp_12_224          | 128 | 53.4468  |  59.9051  | 42.4123  |        42.5615         |
|      mobilenetv3_large_100      | 128 | 61.4819  |  76.651   | 41.0291  |        41.4206         |
|           regnety_002           | 128 | 39.6717  |  53.7141  | 26.8825  |        31.0595         |
|            lcnet_050            | 128 | 31.8638  |  40.8418  | 17.7366  |         20.93          |
| deit_base_distilled_patch16_224 | 64  | 84.8434  |  85.1391  |   nan    |          nan           |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Performance graphs

bench_logs/timm_models_amp.png :

bench_logs/torchbench_amp.png :

bench_logs/huggingface_amp.png :

Build Summary

Run name

day_084_25_03_23_performance_amp_808

Commit hashes

pytorch commit: dc45ad7
pytorch commit date: 2023-03-26 00:38:50+00:00
torchbench commit: c2ef52a6f72829b77bbafbb7010bd16d8d15c916
torchbench commit date: 2023-03-24 13:58:11-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+gitdc45ad7

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

When measuring performance, compilation latency, and memory footprint reduction, we exclude models that fail accuracy checks.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 87%, 52/60 | 93%, 42/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 98%, 44/45  | 98%, 59/60  |
+------------------------+------------+-------------+-------------+
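
The "NN%, passed/total" cells in the pass rate table can be reproduced with a small helper. This is a minimal sketch, not the dashboard's own code: `format_passrate` is a hypothetical name, and rounding to the nearest whole percent is an assumption that happens to match the cells shown above.

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a pass rate in the 'NN%, passed/total' style used in the table."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Assumption: percentages are rounded to the nearest whole number.
    return f"{100 * passed / total:.0f}%, {passed}/{total}"


print(format_passrate(52, 60))  # 87%, 52/60
print(format_passrate(44, 45))  # 98%, 44/45
```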

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.58x    |    1.61x    |    1.40x    |
| inductor_no_cudagraphs |   1.27x    |    1.48x    |    1.38x    |
+------------------------+------------+-------------+-------------+
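
As a rough illustration of how a geometric mean speedup is obtained, the sketch below normalizes each model's latency against a baseline and then aggregates the per-model ratios. The function names and the latencies are made up for illustration and are not taken from the benchmark scripts or from this report. The sketch also assumes that failed runs (reported as 0.0 speedup) have already been filtered out, since a geometric mean is undefined for non-positive values.

```python
import math
from typing import Iterable


def speedup(baseline_ms: float, compiled_ms: float) -> float:
    """Speedup of a compiled run, normalized against the baseline latency."""
    return baseline_ms / compiled_ms


def geomean(values: Iterable[float]) -> float:
    """Geometric mean, computed in log space for numerical stability."""
    vals = list(values)
    if not vals or any(v <= 0 for v in vals):
        raise ValueError("geomean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


# Illustrative (baseline_ms, compiled_ms) pairs, NOT measurements from this report.
latencies = {
    "model_a": (100.0, 50.0),
    "model_b": (80.0, 80.0),
    "model_c": (90.0, 45.0),
}
speedups = [speedup(base, comp) for base, comp in latencies.values()]
print(f"{geomean(speedups):.2f}x")  # 1.59x
```

The geometric mean is the usual choice for averaging ratios because a 2x speedup and a 0.5x slowdown cancel out to 1.0x, which an arithmetic mean would not give.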

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.81    |    7.63     |    5.90     |
|       aot_eager        |    9.41    |    16.21    |    13.00    |
|        inductor        |   60.72    |    59.24    |   105.17    |
| inductor_no_cudagraphs |   59.54    |    54.32    |   105.27    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    0.97x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.89x    |    0.88x    |
|        inductor        |   0.79x    |    0.89x    |    0.91x    |
| inductor_no_cudagraphs |   0.94x    |    1.03x    |    1.01x    |
+------------------------+------------+-------------+-------------+
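
The report does not spell out how the compression ratio is computed. One reading consistent with "higher is better" is baseline peak memory divided by compiled peak memory, so that values below 1.0 mean the compiled run used more memory. The sketch below assumes that definition; `compression_ratio` is a hypothetical helper and the byte counts are illustrative.

```python
def compression_ratio(baseline_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Assumed definition: baseline peak memory / compiled peak memory."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return baseline_peak_bytes / compiled_peak_bytes


# Illustrative peaks, NOT measurements from this report.
baseline_peak = 10.0 * 1024**3   # 10 GiB in the baseline run
compiled_peak = 12.5 * 1024**3   # 12.5 GiB in the compiled run
print(f"{compression_ratio(baseline_peak, compiled_peak):.2f}x")  # 0.80x
```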

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics of the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Previous report name: /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Passrate diff

+------------------------+-------------+------------+-------------+
|        compiler        |    suite    | prev_value |  cur_value  |
+------------------------+-------------+------------+-------------+
|        inductor        | torchbench  | 87%, 52/60 | 87%, 52/60  |
|        inductor        | huggingface | 93%, 42/45 | 93%, 42/45  |
|        inductor        | timm_models | 98%, 59/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60 | 87%, 52/60  |
| inductor_no_cudagraphs | huggingface | 98%, 44/45 | 98%, 44/45  |
| inductor_no_cudagraphs | timm_models | 98%, 59/60 | 98%, 59/60  |
+------------------------+-------------+------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.58x    |   1.58x   |
|        inductor        | huggingface |   1.58x    |   1.61x   |
|        inductor        | timm_models |   1.40x    |   1.40x   |
| inductor_no_cudagraphs | torchbench  |   1.27x    |   1.27x   |
| inductor_no_cudagraphs | huggingface |   1.48x    |   1.48x   |
| inductor_no_cudagraphs | timm_models |   1.38x    |   1.38x   |
+------------------------+-------------+------------+-----------+
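
A comparison like the one in the diff tables above can be sketched as a join of two reports on (compiler, suite). The dictionary layout and the `diff_rows` helper are assumptions made for illustration; the values are the inductor rows of the geometric mean speedup diff table.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (compiler, suite)

# Values copied from the inductor rows of the geometric mean speedup diff table.
prev_report: Dict[Key, float] = {
    ("inductor", "torchbench"): 1.58,
    ("inductor", "huggingface"): 1.58,
    ("inductor", "timm_models"): 1.40,
}
cur_report: Dict[Key, float] = {
    ("inductor", "torchbench"): 1.58,
    ("inductor", "huggingface"): 1.61,
    ("inductor", "timm_models"): 1.40,
}


def diff_rows(prev: Dict[Key, float], cur: Dict[Key, float]) -> List[tuple]:
    """Return (compiler, suite, prev_value, cur_value, delta) for shared keys."""
    rows = []
    for key in sorted(prev.keys() & cur.keys()):
        rows.append((key[0], key[1], prev[key], cur[key], cur[key] - prev[key]))
    return rows


rows = diff_rows(prev_report, cur_report)
# Keep only the entries whose value actually moved between the two reports.
changed = [row for row in rows if abs(row[4]) > 1e-9]
for compiler, suite, prev_value, cur_value, delta in changed:
    print(f"{compiler} {suite}: {prev_value:.2f}x -> {cur_value:.2f}x ({delta:+.2f})")
```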

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
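
The four rules above can be expressed as a small check per model. This is a sketch under stated assumptions: the `ModelResult` record and its field names are hypothetical, any accuracy status other than "pass" is treated as a failure, and the example values are made up rather than taken from the tables.

```python
from dataclasses import dataclass
from typing import List

# Thresholds as stated in the list above.
SPEEDUP_MIN = 0.95
COMPILE_LATENCY_MAX_S = 120.0
COMPRESSION_RATIO_MIN = 0.9


@dataclass
class ModelResult:
    """Hypothetical per-model record; field names are illustrative."""
    name: str
    accuracy: str               # e.g. "pass", "fail_accuracy", "fail_to_run"
    speedup: float              # 0.0 typically signifies a failed performance test
    compilation_latency_s: float
    compression_ratio: float


def warnings_for(result: ModelResult) -> List[str]:
    """Return the names of the warning tables this model would appear in."""
    flags = []
    if result.accuracy != "pass":  # assumption: anything but "pass" is a failure
        flags.append("accuracy")
    if result.speedup < SPEEDUP_MIN:
        flags.append("speedup")
    if result.compilation_latency_s > COMPILE_LATENCY_MAX_S:
        flags.append("compilation_latency")
    if result.compression_ratio < COMPRESSION_RATIO_MIN:
        flags.append("compression_ratio")
    return flags


# Made-up values for illustration only.
example = ModelResult("example_model", "pass", 0.91, 151.2, 1.05)
print(warnings_for(example))  # ['speedup', 'compilation_latency']
```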

Accuracy warnings

+-------------+---------------------------------+-----------------+------------------------+
|    suite    |              name               |    inductor     | inductor_no_cudagraphs |
+-------------+---------------------------------+-----------------+------------------------+
| torchbench  |              moco               |   fail_to_run   |      fail_to_run       |
| torchbench  |       Background_Matting        | eager_variation |    eager_variation     |
| torchbench  |         vision_maskrcnn         | eager_variation |    eager_variation     |
| torchbench  |            tacotron2            |     0.0000      |         0.0000         |
| torchbench  |               gat               |     0.0000      |         0.0000         |
| torchbench  |               gcn               |     0.0000      |         0.0000         |
| torchbench  |              llama              |     0.0000      |         0.0000         |
| torchbench  |              sage               |     0.0000      |         0.0000         |
| torchbench  |          torchrec_dlrm          |     0.0000      |         0.0000         |
| huggingface |  DebertaV2ForQuestionAnswering  |   fail_to_run   |          pass          |
| huggingface |   AlbertForQuestionAnswering    |  fail_accuracy  |     fail_accuracy      |
| timm_models | deit_base_distilled_patch16_224 |      pass       |      fail_to_run       |
+-------------+---------------------------------+-----------------+------------------------+

Performance speedup warnings

+-------------+-------------------------------+----------+------------------------+
|    suite    |             name              | inductor | inductor_no_cudagraphs |
+-------------+-------------------------------+----------+------------------------+
| torchbench  |        phlippe_resnet         |  1.8176  |         0.9359         |
| torchbench  |           resnet18            |  1.5937  |         0.9172         |
| torchbench  |             dcgan             |  1.4552  |         0.8388         |
| torchbench  |         lennard_jones         |  1.3765  |         0.8932         |
| torchbench  |       soft_actor_critic       |  1.1908  |         0.8116         |
| torchbench  |          timm_vovnet          |  0.9385  |         0.9248         |
| torchbench  |    nvidia_deeprecommender     |  0.872   |         1.0185         |
| torchbench  | timm_vision_transformer_large |   0.0    |          0.0           |
| torchbench  |             moco              |   0.0    |          0.0           |
| torchbench  |              gat              |   0.0    |          0.0           |
| torchbench  |              gcn              |   0.0    |          0.0           |
| torchbench  |             sage              |   0.0    |          0.0           |
| torchbench  |           tacotron2           |   0.0    |          0.0           |
| torchbench  |         torchrec_dlrm         |   0.0    |          0.0           |
| huggingface |  DebertaForQuestionAnswering  |  1.0116  |         0.9216         |
| huggingface |      DebertaForMaskedLM       |  0.9404  |         0.7972         |
| huggingface | DebertaV2ForQuestionAnswering |  0.8924  |         0.6276         |
| huggingface |     DebertaV2ForMaskedLM      |  0.8454  |         0.6207         |
| huggingface |     BlenderbotForCausalLM     |   0.0    |         1.0871         |
| timm_models |         pnasnet5large         |  0.9125  |         0.912          |
+-------------+-------------------------------+----------+------------------------+

Compilation latency (sec) warnings

+-------------+--------------------------------+----------+------------------------+
|    suite    |              name              | inductor | inductor_no_cudagraphs |
+-------------+--------------------------------+----------+------------------------+
| torchbench  |        phlippe_densenet        | 163.2689 |        164.9042        |
| torchbench  |          hf_T5_large           | 157.4037 |        157.6508        |
| torchbench  |       timm_efficientnet        | 141.5218 |        138.7371        |
| torchbench  |         hf_Longformer          | 140.6027 |        108.1974        |
| torchbench  |           hf_BigBird           | 137.0276 |        115.1761        |
| torchbench  |       mobilenet_v3_large       | 131.4049 |        132.7944        |
| torchbench  |          densenet121           | 129.0691 |        126.7264        |
| torchbench  |          mobilenet_v2          | 121.9696 |        126.7158        |
| huggingface |     AllenaiLongformerBase      | 143.411  |        107.8736        |
| huggingface |     MobileBertForMaskedLM      | 133.4461 |        129.675         |
| huggingface | MobileBertForQuestionAnswering | 127.2698 |         125.82         |
| huggingface |      DebertaV2ForMaskedLM      | 126.6708 |        60.7339         |
| huggingface |  MT5ForConditionalGeneration   | 126.1527 |        125.3058        |
| huggingface | DebertaV2ForQuestionAnswering  | 124.8574 |        58.5608         |
| timm_models |           rexnet_100           | 288.7461 |        283.0597        |
| timm_models |          ghostnet_100          | 238.3372 |        236.8188        |
| timm_models |           hrnet_w18            | 228.0449 |        230.6652        |
| timm_models |           fbnetv3_b            | 168.1243 |        167.7835        |
| timm_models |     mobilenetv3_large_100      | 157.9431 |        157.3151        |
| timm_models |           tinynet_a            | 157.5898 |        156.0813        |
| timm_models |          tf_mixnet_l           | 155.888  |        150.0548        |
| timm_models |          resnest101e           | 155.071  |        152.5135        |
| timm_models |          mobilevit_s           | 154.3054 |        152.3807        |
| timm_models |        adv_inception_v3        | 153.7477 |        154.5061        |
| timm_models |       gluon_inception_v3       | 153.6197 |        147.0342        |
| timm_models |       tf_efficientnet_b0       | 153.018  |        150.6381        |
| timm_models |          inception_v3          | 151.8774 |        154.8908        |
| timm_models |         pnasnet5large          | 151.159  |        147.4219        |
| timm_models |            mixnet_l            | 151.0383 |        152.4029        |
| timm_models |        twins_pcpvt_base        | 140.563  |        140.4003        |
| timm_models |       res2net101_26w_4s        | 140.5159 |        137.4283        |
| timm_models |          spnasnet_100          | 137.6578 |        134.1523        |
| timm_models |           fbnetc_100           | 135.7825 |        135.3708        |
| timm_models |        mobilenetv2_100         | 127.1539 |        127.6348        |
| timm_models |          mnasnet_100           | 123.0879 |        117.5419        |
| timm_models |      xcit_large_24_p8_224      | 121.0067 |        118.8733        |
+-------------+--------------------------------+----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+----------+------------------------+
|    suite    |                  name                   | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------------+----------+------------------------+
| torchbench  |              hf_GPT2_large              |  0.8906  |         1.1284         |
| torchbench  |                 yolov3                  |  0.8712  |         1.0114         |
| torchbench  |           speech_transformer            |  0.8651  |         0.869          |
| torchbench  |           shufflenet_v2_x1_0            |  0.8615  |         0.9647         |
| torchbench  |         timm_vision_transformer         |  0.8593  |         0.8835         |
| torchbench  |               timm_regnet               |  0.8505  |         0.9525         |
| torchbench  |                resnet152                |  0.8486  |         0.9407         |
| torchbench  |           Background_Matting            |  0.8484  |         1.0409         |
| torchbench  |              hf_DistilBert              |  0.8476  |         0.9479         |
| torchbench  |              timm_resnest               |  0.8469  |         0.967          |
| torchbench  |               hf_T5_large               |  0.8201  |         1.168          |
| torchbench  |              pytorch_unet               |  0.8134  |         0.9308         |
| torchbench  |            phlippe_densenet             |  0.8058  |         0.8659         |
| torchbench  |                resnet50                 |  0.7824  |         0.8835         |
| torchbench  |                  dcgan                  |  0.7821  |         0.9645         |
| torchbench  |                 demucs                  |  0.7733  |         0.9662         |
| torchbench  |              squeezenet1_1              |  0.773   |         0.9087         |
| torchbench  |             pytorch_stargan             |  0.7715  |         0.8893         |
| torchbench  |                 hf_Bart                 |  0.7535  |         0.9285         |
| torchbench  |               timm_vovnet               |  0.7529  |         0.8869         |
| torchbench  |               mnasnet1_0                |  0.7418  |         0.8038         |
| torchbench  |             pytorch_struct              |  0.7274  |         0.7358         |
| torchbench  |                  vgg16                  |  0.7227  |         0.9805         |
| torchbench  |                 alexnet                 |  0.7088  |         0.9385         |
| torchbench  |               densenet121               |  0.7085  |         0.8034         |
| torchbench  |           mobilenet_v3_large            |  0.6987  |         0.8078         |
| torchbench  |               hf_BigBird                |  0.6971  |         1.0994         |
| torchbench  |             resnext50_32x4d             |  0.6671  |         0.7713         |
| torchbench  |         nvidia_deeprecommender          |  0.6585  |         0.8931         |
| torchbench  |                   drq                   |  0.6379  |         0.9573         |
| torchbench  |            soft_actor_critic            |  0.6066  |         0.9973         |
| torchbench  |             LearningToPaint             |  0.5925  |         0.7463         |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |  0.5904  |         0.6172         |
| torchbench  |                resnet18                 |  0.5423  |         0.6127         |
| torchbench  |              lennard_jones              |  0.5317  |         0.9997         |
| torchbench  |               hf_Reformer               |  0.4538  |         0.8022         |
| torchbench  |              hf_Longformer              |  0.417   |         0.8947         |
| torchbench  |          functorch_dp_cifar10           |  0.3991  |         0.4424         |
| torchbench  |             phlippe_resnet              |  0.3169  |         0.3395         |
| huggingface |            PLBartForCausalLM            |  0.8907  |         0.9249         |
| huggingface |     PegasusForConditionalGeneration     |  0.8901  |         1.0074         |
| huggingface |           ElectraForCausalLM            |  0.889   |         0.8941         |
| huggingface |          DistilBertForMaskedLM          |  0.8849  |         0.9624         |
| huggingface | BlenderbotSmallForConditionalGeneration |  0.8729  |         0.9803         |
| huggingface |      MBartForConditionalGeneration      |  0.8672  |         1.0307         |
| huggingface |            TrOCRForCausalLM             |  0.8619  |         0.9075         |
| huggingface |            MBartForCausalLM             |  0.8491  |         0.9507         |
| huggingface |      BartForConditionalGeneration       |  0.8456  |         1.0139         |
| huggingface |         MegatronBertForCausalLM         |  0.845   |         1.0962         |
| huggingface |             BartForCausalLM             |  0.8301  |         0.943          |
| huggingface |       BlenderbotSmallForCausalLM        |  0.8065  |         0.8318         |
| huggingface |           PegasusForCausalLM            |  0.7952  |         0.9252         |
| huggingface |         Speech2Text2ForCausalLM         |  0.7566  |         0.808          |
| huggingface |          MobileBertForMaskedLM          |  0.7473  |         1.016          |
| huggingface |             XGLMForCausalLM             |  0.6744  |         0.9287         |
| huggingface |     MobileBertForQuestionAnswering      |  0.6569  |         0.8392         |
| huggingface |     M2M100ForConditionalGeneration      |  0.6058  |         0.8978         |
| huggingface |           DebertaForMaskedLM            |  0.5501  |         0.9978         |
| huggingface |          DebertaV2ForMaskedLM           |  0.5197  |         0.9665         |
| huggingface |      DebertaV2ForQuestionAnswering      |  0.487   |         0.9801         |
| huggingface |          AllenaiLongformerBase          |  0.4688  |         0.8742         |
| huggingface |       DebertaForQuestionAnswering       |  0.4601  |         1.1527         |
| timm_models |                hrnet_w18                |  0.8918  |          0.99          |
| timm_models |            sebotnet33ts_256             |  0.891   |         1.1115         |
| timm_models |              inception_v3               |  0.8904  |         1.0171         |
| timm_models |           gluon_inception_v3            |  0.8904  |         1.0171         |
| timm_models |            adv_inception_v3             |  0.8904  |         1.0171         |
| timm_models |                 dpn107                  |  0.8833  |         0.9642         |
| timm_models |            gluon_xception65             |  0.8831  |         0.9705         |
| timm_models |              ghostnet_100               |  0.8807  |         0.977          |
| timm_models |              spnasnet_100               |  0.8786  |         0.9451         |
| timm_models |          mobilenetv3_large_100          |  0.877   |         0.9361         |
| timm_models |             poolformer_m36              |  0.8768  |         1.1871         |
| timm_models |           eca_botnext26ts_256           |  0.8738  |         1.0072         |
| timm_models |          xcit_large_24_p8_224           |  0.8721  |         0.9732         |
| timm_models |            res2net50_14w_8s             |  0.8712  |         0.9607         |
| timm_models |            res2net101_26w_4s            |  0.871   |         0.9483         |
| timm_models |                mixnet_l                 |  0.8687  |         0.9902         |
| timm_models |               mnasnet_100               |  0.8683  |         0.9403         |
| timm_models |               res2next50                |  0.866   |         0.9547         |
| timm_models |              cait_m36_384               |  0.8632  |         0.989          |
| timm_models |               fbnetc_100                |  0.8596  |         0.9535         |
| timm_models |                pit_b_224                |  0.8578  |         1.0242         |
| timm_models |               selecsls42b               |  0.8576  |         0.9664         |
| timm_models |              convnext_base              |  0.8505  |         1.0338         |
| timm_models |                gernet_l                 |  0.8499  |         0.9706         |
| timm_models |         swsl_resnext101_32x16d          |  0.8461  |         0.9786         |
| timm_models |             coat_lite_mini              |  0.8402  |         1.0202         |
| timm_models |              botnet26t_256              |  0.8239  |         0.9779         |
| timm_models |                lcnet_050                |  0.805   |         0.884          |
| timm_models |                repvgg_a2                |  0.7738  |         0.9611         |
| timm_models |               regnety_002               |  0.7602  |         0.8966         |
| timm_models |             crossvit_9_240              |  0.7526  |         0.9898         |
| timm_models |      swin_base_patch4_window7_224       |  0.7214  |         0.9045         |
| timm_models |              jx_nest_base               |  0.6693  |         0.9604         |
+-------------+-----------------------------------------+----------+------------------------+

Metrics over time

bench_logs/memory_over_time.png :

bench_logs/passrate_over_time.png :

bench_logs/geomean_over_time.png :

bench_logs/comp_time_over_time.png :

Recent Regressions

For each relevant compiler, we compare the two most recent reports (that actually ran the compiler) to find models that were not flagged previously but are now flagged as problematic (according to the 'Warnings' section).
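
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual code: the function name `find_regressions` and the two threshold constants are assumptions. The thresholds were chosen only because they are consistent with the rows flagged in this report; the real criteria are whatever the 'Warnings' section defines.

```python
from typing import Callable, Dict, List, Tuple

# Assumed thresholds (hypothetical; the real ones come from the 'Warnings' section).
SPEEDUP_FLOOR = 0.95                 # a speedup below this counts as flagged
COMPILE_LATENCY_CEILING_SEC = 120.0  # a compile time above this counts as flagged


def find_regressions(
    prev: Dict[str, float],
    cur: Dict[str, float],
    is_flagged: Callable[[float], bool],
) -> List[Tuple[str, float, float]]:
    """Return (name, prev_value, cur_value) for models that were not flagged
    in the previous report but are flagged in the current one."""
    regressions = []
    for name in sorted(cur):
        if name not in prev:
            continue  # nothing to compare against
        if not is_flagged(prev[name]) and is_flagged(cur[name]):
            regressions.append((name, prev[name], cur[name]))
    return regressions


# Values for phlippe_resnet and mnasnet_100 are taken from the tables in this report.
# "already_slow_model" is a made-up entry showing that a model flagged in both
# reports is not reported again.
speedup_prev = {"phlippe_resnet": 0.9702, "already_slow_model": 0.90}
speedup_cur = {"phlippe_resnet": 0.9359, "already_slow_model": 0.89}
latency_prev = {"mnasnet_100": 118.447}
latency_cur = {"mnasnet_100": 123.0879}

speedup_regressions = find_regressions(
    speedup_prev, speedup_cur, lambda s: s < SPEEDUP_FLOOR
)
latency_regressions = find_regressions(
    latency_prev, latency_cur, lambda t: t > COMPILE_LATENCY_CEILING_SEC
)
print(speedup_regressions)
print(latency_regressions)
```

Passing the flagging rule as a predicate keeps the same comparison usable for speedup, compilation latency, and any other metric with its own threshold.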

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Performance speedup regressions

+------------------------+----------------+-------------+------------+
|        compiler        |      name      | prev_status | cur_status |
+------------------------+----------------+-------------+------------+
| inductor_no_cudagraphs | phlippe_resnet |   0.9702    |   0.9359   |
+------------------------+----------------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

No regressions found.

Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Accuracy regressions

+------------------------+---------------------------------+-------------+-------------+
|        compiler        |              name               | prev_status | cur_status  |
+------------------------+---------------------------------+-------------+-------------+
| inductor_no_cudagraphs | deit_base_distilled_patch16_224 |    pass     | fail_to_run |
+------------------------+---------------------------------+-------------+-------------+

Compilation latency (sec) regressions

+----------+-------------+-------------+------------+
| compiler |    name     | prev_status | cur_status |
+----------+-------------+-------------+------------+
| inductor | mnasnet_100 |   118.447   |  123.0879  |
+----------+-------------+-------------+------------+

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9643 |  0.9158   |  3.6049  |         1.3556         |
|           BERT_pytorch            |  16  | 0.9935 |  0.8029   |  2.9774  |         2.1045         |
|            densenet121            |  4   | 0.9881 |  0.7232   |  2.806   |         1.0052         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9733 |   0.915   |  2.6025  |         1.7537         |
|            hf_BigBird             |  2   | 0.9505 |  0.7672   |  2.5383  |         1.611          |
|             hf_Albert             |  8   | 0.9958 |  0.9536   |  2.2891  |         2.2504         |
|            hf_T5_large            |  2   | 0.9743 |  0.8115   |  2.2339  |         1.8619         |
|               dlrm                | 1024 | 0.9428 |  0.8338   |  2.2191  |         1.1541         |
|        mobilenet_v3_large         |  32  | 0.9952 |  0.7829   |  2.1048  |         1.1789         |
|         phlippe_densenet          | 128  | 0.985  |  0.7728   |  2.0558  |          1.02          |
|           squeezenet1_1           |  32  | 0.9897 |  0.9346   |  2.0081  |         1.2981         |
|               hf_T5               |  8   | 0.9851 |  0.8487   |  1.8934  |         1.9417         |
|              hf_GPT2              |  4   | 0.9923 |  0.9093   |  1.8764  |         1.7649         |
|          phlippe_resnet           | 128  | 0.9825 |  0.7622   |  1.8176  |         0.9359         |
|              hf_Bert              |  4   | 0.9948 |  0.8383   |  1.7789  |         1.5758         |
|          resnext50_32x4d          |  8   | 0.9825 |  0.7156   |  1.7015  |         0.9773         |
|            mnasnet1_0             |  32  | 0.9932 |  0.7374   |  1.6667  |         1.078          |
|              hf_Bart              |  4   | 0.968  |  0.7717   |  1.6664  |         1.3385         |
|           hf_GPT2_large           |  4   | 0.9829 |  0.9718   |  1.6545  |         1.7159         |
|        speech_transformer         |  32  | 0.9799 |  0.7893   |  1.6135  |         1.5649         |
|        shufflenet_v2_x1_0         | 128  | 0.9951 |  0.7523   |  1.6055  |         1.1793         |
|                drq                |  1   | 0.9664 |   0.749   |  1.5947  |         1.0486         |
|             resnet18              |  16  | 0.9852 |  0.7681   |  1.5937  |         0.9172         |
|           hf_Bert_large           |  4   | 0.9967 |  0.8797   |  1.5775  |         1.5594         |
|           fastNLP_Bert            |  6   | 0.9908 |  0.8544   |  1.5608  |         1.4898         |
|           timm_resnest            |  32  | 0.9929 |  0.8483   |  1.5545  |         1.4955         |
|      timm_vision_transformer      |  32  | 0.9828 |  0.8532   |  1.5522  |         1.4061         |
| attention_is_all_you_need_pytorch | 256  | 0.9887 |  0.8418   |  1.5423  |         1.4348         |
|            timm_nfnet             | 128  | 0.9869 |  0.9846   |  1.5136  |         1.449          |
|           mobilenet_v2            |  96  | 0.9972 |   0.777   |  1.5093  |         1.4874         |
|               dcgan               |  32  | 0.8601 |  0.6897   |  1.4552  |         0.8388         |
|           hf_DistilBert           |  8   | 0.9809 |  0.9362   |  1.4428  |         1.4747         |
|           hf_Longformer           |  2   | 0.8258 |  0.5545   |  1.4298  |         1.2505         |
|         timm_efficientnet         |  32  | 0.9371 |  0.6242   |  1.4283  |         1.0685         |
|          pytorch_struct           | 200  | 0.9211 |  0.7641   |  1.4197  |         1.0939         |
|           lennard_jones           | 1000 | 0.8398 |  0.7405   |  1.3765  |         0.8932         |
|           pytorch_unet            |  1   | 0.9963 |  0.2049   |  1.3728  |         1.3691         |
|          LearningToPaint          |  96  | 0.9909 |  0.7841   |  1.3646  |         1.068          |
|          pytorch_stargan          |  16  | 0.9913 |  0.8132   |  1.2555  |         1.2448         |
|               vgg16               |  64  | 0.9994 |  0.9987   |  1.2398  |         1.2519         |
|            Super_SloMo            |  6   | 0.9976 |  0.1792   |  1.2314  |         1.2326         |
|        Background_Matting         |  4   | 0.9994 |   0.137   |  1.2114  |         1.2076         |
|             resnet50              |  32  | 0.9981 |  0.7738   |  1.196   |         1.0598         |
|             resnet152             |  32  | 0.9958 |   0.753   |  1.1914  |         1.0239         |
|         soft_actor_critic         | 256  | 0.8629 |  0.6286   |  1.1908  |         0.8116         |
|              yolov3               |  16  | 0.9961 |  0.8067   |  1.1884  |         1.1898         |
|            hf_Reformer            |  4   | 0.9867 |  0.9649   |  1.1425  |         1.0665         |
|              alexnet              | 128  | 0.9987 |  0.9972   |  1.087   |         1.1349         |
|              demucs               |  4   | 0.9988 |  1.0035   |  1.0369  |         1.0363         |
|            timm_regnet            |  32  | 0.9155 |  0.7687   |  0.9939  |         1.0126         |
|            tts_angular            |  64  | 0.9209 |  0.8908   |  0.9573  |         0.9562         |
|            timm_vovnet            |  32  | 0.8497 |  0.7067   |  0.9385  |         0.9248         |
|      nvidia_deeprecommender       | 256  | 0.999  |  0.9981   |  0.872   |         1.0185         |
|   timm_vision_transformer_large   |  32  | 0.998  |    0.0    |   0.0    |          0.0           |
|               moco                |  32  | 0.9803 |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+
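
For readers who want to post-process a table like the one above, here is a small sketch that parses the ASCII layout and computes a geometric mean over one column. The helper names `parse_table` and `geomean` are hypothetical, and skipping non-positive entries (the 0.0 rows for models that did not run) is an assumption; it may not match how the dashboard computes the number behind its geomean plot.

```python
import math
from typing import Dict, List

# Three rows copied from the speedup table above, for illustration.
TABLE = """\
+----------------------+----+--------+-----------+----------+------------------------+
|         name         | bs | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+----------------------+----+--------+-----------+----------+------------------------+
| functorch_dp_cifar10 | 64 | 0.9643 |  0.9158   |  3.6049  |         1.3556         |
|     tts_angular      | 64 | 0.9209 |  0.8908   |  0.9573  |         0.9562         |
|         gat          | 0  |  0.0   |    0.0    |   0.0    |          0.0           |
+----------------------+----+--------+-----------+----------+------------------------+
"""


def parse_table(text: str) -> List[Dict[str, str]]:
    """Parse a '+---+' bordered table into a list of {column: cell} dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the '+---+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean(values: List[float]) -> float:
    """Geometric mean over strictly positive values (assumption: zeros are skipped)."""
    positive = [v for v in values if v > 0]
    if not positive:
        return float("nan")
    return math.exp(sum(math.log(v) for v in positive) / len(positive))


rows = parse_table(TABLE)
inductor_speedups = [float(r["inductor"]) for r in rows]
print(geomean(inductor_speedups))
```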

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|         phlippe_densenet          | 128  | 3.1868  |  6.9475   | 163.2689 |        164.9042        |
|            hf_T5_large            |  2   | 27.5618 |  55.4875  | 157.4037 |        157.6508        |
|         timm_efficientnet         |  32  | 4.9256  |  10.1348  | 141.5218 |        138.7371        |
|           hf_Longformer           |  2   | 11.5255 |  31.2769  | 140.6027 |        108.1974        |
|            hf_BigBird             |  2   | 12.8893 |  38.0693  | 137.0276 |        115.1761        |
|        mobilenet_v3_large         |  32  | 3.4736  |  7.5726   | 131.4049 |        132.7944        |
|            densenet121            |  4   |  7.639  |  17.8983  | 129.0691 |        126.7264        |
|           mobilenet_v2            |  96  | 3.1223  |  6.9967   | 121.9696 |        126.7158        |
|              yolov3               |  16  | 4.9065  |  10.7775  | 114.728  |        114.5451        |
|            mnasnet1_0             |  32  | 3.1678  |  6.7394   | 107.9633 |        106.1536        |
|             resnet152             |  32  | 9.0487  |  20.3645  |  99.615  |        96.6824         |
|           hf_GPT2_large           |  4   | 14.7812 |  30.0606  | 96.3907  |        96.7287         |
|           timm_resnest            |  32  | 1.8005  |  4.1016   |  89.547  |         98.632         |
|        shufflenet_v2_x1_0         | 128  | 3.4684  |   7.758   | 79.0727  |        77.3621         |
| attention_is_all_you_need_pytorch | 256  | 4.4554  |  11.0279  | 70.7905  |         70.22          |
|        speech_transformer         |  32  | 6.0662  |  13.6806  | 70.7576  |        72.9882         |
|            timm_regnet            |  32  |  6.645  |  12.3528  | 69.6059  |         66.661         |
|            timm_nfnet             | 128  | 5.7289  |  11.2372  |  68.606  |        68.3467         |
|        Background_Matting         |  4   | 2.9983  |  11.5713  | 67.6472  |        64.8312         |
|           BERT_pytorch            |  16  |  4.874  |  11.5101  |  64.948  |        64.7095         |
|             resnet50              |  32  | 3.1838  |  7.0498   | 61.7297  |        62.8278         |
|            timm_vovnet            |  32  | 3.5911  |  6.3974   | 59.2739  |        59.5677         |
|           pytorch_unet            |  1   |  1.516  |  4.4082   | 57.9019  |        57.0841         |
|              hf_Bart              |  4   | 10.5305 |  18.1122  | 57.2809  |         55.797         |
|           hf_Bert_large           |  4   | 10.3358 |  21.454   | 56.6343  |        56.1302         |
|       functorch_dp_cifar10        |  64  |  1.211  |  2.3923   | 52.5607  |        54.4021         |
|          resnext50_32x4d          |  8   | 3.2386  |  7.0209   | 51.4256  |        50.5324         |
|      timm_vision_transformer      |  32  |  3.322  |  7.3593   | 48.3302  |        46.8172         |
|               hf_T5               |  8   | 5.7278  |  13.3625  | 46.4112  |        45.9002         |
|          pytorch_stargan          |  16  | 1.1906  |  3.1629   | 45.5018  |        45.6083         |
|           fastNLP_Bert            |  6   | 5.2477  |  11.298   | 45.4537  |        45.3085         |
|             resnet18              |  16  | 1.3444  |  2.8701   | 42.2242  |        43.2187         |
|          LearningToPaint          |  96  | 1.4105  |  2.8434   | 41.9691  |        43.5743         |
|            Super_SloMo            |  6   | 2.7377  |  9.7036   | 41.0038  |        38.9114         |
|            hf_Reformer            |  4   | 4.1451  |  6.0242   | 40.1121  |        35.9262         |
|              hf_GPT2              |  4   | 4.6246  |  9.6261   | 38.8012  |        38.3177         |
|             hf_Albert             |  8   |  2.531  |  8.0724   | 35.0957  |        36.1336         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.212  |  2.9721   | 33.7261  |        34.2491         |
|              hf_Bert              |  4   | 5.0865  |  10.5892  | 33.4623  |        33.6168         |
|          phlippe_resnet           | 128  | 1.3368  |  2.8292   | 31.4432  |        32.0987         |
|           hf_DistilBert           |  8   | 2.4223  |  5.5367   | 29.6942  |        27.7697         |
|              demucs               |  4   | 1.4386  |  2.1722   | 28.8172  |        28.4259         |
|           squeezenet1_1           |  32  | 1.0533  |  1.7652   | 23.1686  |        22.0059         |
|          pytorch_struct           | 200  |  0.75   |  1.3326   | 19.0121  |        17.9441         |
|               vgg16               |  64  | 0.6265  |   1.126   | 15.6882  |        14.8882         |
|              alexnet              | 128  |  0.483  |  0.7756   | 14.3255  |        13.4945         |
|                drq                |  1   | 0.6601  |  1.0201   |  9.8311  |         9.3241         |
|      nvidia_deeprecommender       | 256  | 0.4799  |  0.7561   |  9.4967  |         9.1107         |
|         soft_actor_critic         | 256  | 0.4229  |  0.6083   |  7.4131  |         7.1518         |
|               dlrm                | 1024 | 0.3738  |   0.784   |  7.4019  |         7.2395         |
|               dcgan               |  32  | 0.4328  |  0.7019   |  6.6347  |         7.6406         |
|            tts_angular            |  64  | 0.4442  |  0.5216   |  5.8859  |         5.7635         |
|           lennard_jones           | 1000 | 0.3926  |  0.6219   |  5.2797  |         5.3479         |
|               moco                |  32  | 27.5915 |    nan    |   nan    |          nan           |
|   timm_vision_transformer_large   |  32  | 9.2938  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.2082  |         1.208          |
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0863  |         1.2037         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.0496  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.0163  |         1.1719         |
|           mobilenet_v2            |  96  | 0.9869 |  0.7649   |  1.0097  |         1.1019         |
|            tts_angular            |  64  | 0.9957 |  0.9957   |  0.9852  |         0.9957         |
|            timm_nfnet             | 128  | 0.9071 |  0.8747   |  0.9693  |         1.0734         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9689  |         1.1266         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |  0.952   |         1.0009         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         1.1717         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  0.9425  |         1.026          |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.9402  |         1.0725         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9319  |         1.0718         |
|         timm_efficientnet         |  32  | 0.9862 |  0.7658   |  0.9287  |         1.0058         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  0.8906  |         1.1284         |
|              yolov3               |  16  | 0.9837 |   0.846   |  0.8712  |         1.0114         |
|        speech_transformer         |  32  | 0.9915 |   0.901   |  0.8651  |         0.869          |
|        shufflenet_v2_x1_0         | 128  | 0.9549 |  0.8423   |  0.8615  |         0.9647         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8593  |         0.8835         |
|            timm_regnet            |  32  | 0.9903 |  0.8512   |  0.8505  |         0.9525         |
|             resnet152             |  32  | 0.9948 |  0.8929   |  0.8486  |         0.9407         |
|        Background_Matting         |  4   | 1.0125 |  0.6486   |  0.8484  |         1.0409         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8476  |         0.9479         |
|           timm_resnest            |  32  | 0.9888 |  0.8935   |  0.8469  |         0.967          |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         1.168          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.8134  |         0.9308         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8058  |         0.8659         |
|             resnet50              |  32  | 0.9932 |  0.8634   |  0.7824  |         0.8835         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.7821  |         0.9645         |
|              demucs               |  4   | 0.9663 |  0.9664   |  0.7733  |         0.9662         |
|           squeezenet1_1           |  32  | 0.9674 |  0.9291   |  0.773   |         0.9087         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.8893         |
|              hf_Bart              |  4   | 0.9084 |   0.843   |  0.7535  |         0.9285         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7529  |         0.8869         |
|            mnasnet1_0             |  32  | 0.9757 |  0.8641   |  0.7418  |         0.8038         |
|          pytorch_struct           | 200  | 0.9992 |  0.5168   |  0.7274  |         0.7358         |
|               vgg16               |  64  | 0.9919 |  0.7243   |  0.7227  |         0.9805         |
|              alexnet              | 128  | 0.9455 |   0.793   |  0.7088  |         0.9385         |
|            densenet121            |  4   | 0.9944 |  0.9783   |  0.7085  |         0.8034         |
|        mobilenet_v3_large         |  32  | 0.9778 |  0.8395   |  0.6987  |         0.8078         |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |  0.6971  |         1.0994         |
|          resnext50_32x4d          |  8   | 0.9967 |  0.8434   |  0.6671  |         0.7713         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.8931         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.6379  |         0.9573         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  0.6066  |         0.9973         |
|          LearningToPaint          |  96  | 0.9192 |  0.7116   |  0.5925  |         0.7463         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9966 |  0.8594   |  0.5904  |         0.6172         |
|             resnet18              |  16  | 0.9751 |  0.7996   |  0.5423  |         0.6127         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.5317  |         0.9997         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.4538  |         0.8022         |
|           hf_Longformer           |  2   | 0.8567 |  0.8296   |  0.417   |         0.8947         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3169  |         0.3395         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |          nan           |
|               moco                |  32  | 0.9979 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_GPT2_large           |  4   | 212.7296 | 215.4044  | 126.1871 |        121.788         |
|        Background_Matting         |  4   | 125.918  | 920.0697  | 103.7275 |        104.3597        |
|            hf_T5_large            |  2   | 230.1605 | 275.1893  | 101.6018 |        118.8865        |
|               hf_T5               |  8   | 181.9032 | 211.1589  | 94.7053  |        92.4509         |
|           hf_Longformer           |  2   | 138.2097 |  203.281  | 79.3119  |        90.8209         |
|            timm_nfnet             | 128  | 120.4148 | 119.9473  | 78.2426  |        81.7661         |
|            hf_BigBird             |  2   | 205.154  | 244.3997  | 77.7147  |        117.4356        |
|            hf_Reformer            |  4   | 82.0007  |  83.918   | 70.8207  |        75.9167         |
|            Super_SloMo            |  6   | 79.9461  |  443.516  | 64.6541  |        64.5001         |
|              yolov3               |  16  |  68.786  |  84.8398  | 57.8493  |        57.6083         |
|            timm_regnet            |  32  | 61.1992  |  72.7539  |  56.812  |        57.1343         |
|               vgg16               |  64  | 66.4019  |  66.409   | 53.5779  |        52.9954         |
|             resnet152             |  32  | 64.2393  |  84.7187  | 52.4225  |        60.9595         |
|           hf_Bert_large           |  4   | 83.0404  |  93.1456  | 52.2919  |        53.1372         |
|              demucs               |  4   | 53.6044  |  53.7455  | 51.7394  |        51.7905         |
|        speech_transformer         |  32  | 64.1509  |  71.0338  | 40.8009  |        36.8213         |
| attention_is_all_you_need_pytorch | 256  |  55.351  |  67.1481  | 37.2593  |        37.6301         |
|           fastNLP_Bert            |  6   | 54.2729  |  62.538   | 36.0813  |        34.7287         |
|              hf_Bart              |  4   |  64.971  |  98.673   | 34.7839  |         56.986         |
|           mobilenet_v2            |  96  | 47.1041  |  60.5217  | 31.1621  |         31.633         |
|             hf_Albert             |  8   | 68.8106  |   71.55   | 29.8545  |        30.3165         |
|           pytorch_unet            |  1   | 40.0006  | 194.2562  |  28.991  |        29.0747         |
|              hf_GPT2              |  4   | 49.0972  |  53.3534  | 27.8531  |        27.8392         |
|            timm_vovnet            |  32  | 29.1264  |  34.9631  | 26.3227  |         26.845         |
|              hf_Bert              |  4   | 40.5217  |  48.7778  | 22.8006  |        26.0872         |
|         timm_efficientnet         |  32  | 34.2296  |  51.2253  | 22.4669  |        30.1284         |
|           hf_DistilBert           |  8   | 32.1306  |  35.2059  | 22.0797  |        21.9127         |
|             resnet50              |  32  | 26.4647  |  34.0544  |  22.075  |        24.9267         |
|            densenet121            |  4   | 55.2594  |  72.2055  | 19.0237  |        56.9019         |
|        shufflenet_v2_x1_0         | 128  | 31.1039  |  40.5685  | 18.8152  |        25.8637         |
|      timm_vision_transformer      |  32  | 29.1715  |  33.7833  | 18.2935  |        22.7707         |
|           BERT_pytorch            |  16  | 53.0138  |  76.4871  | 17.7735  |        25.4831         |
|           timm_resnest            |  32  | 24.2962  |  28.3478  | 15.4718  |        16.1434         |
|        mobilenet_v3_large         |  32  | 27.1052  |  36.3077  | 13.4957  |        22.6579         |
|            mnasnet1_0             |  32  | 25.0017  |  31.3303  | 13.1874  |        20.4363         |
|      nvidia_deeprecommender       | 256  | 10.2402  |  10.2487  | 11.7194  |        10.0406         |
|          pytorch_stargan          |  16  | 14.9226  |  17.8528  | 11.6024  |        11.8858         |
|          resnext50_32x4d          |  8   |  20.476  |  28.4316  | 11.5946  |        20.6354         |
|         phlippe_densenet          | 128  | 23.2894  |  29.3346  | 11.4218  |        22.7813         |
|              alexnet              | 128  |  9.8401  |  9.8664   |  9.0439  |         8.6697         |
|          LearningToPaint          |  96  | 11.4907  |  14.1889  |  8.6154  |        10.4889         |
|            tts_angular            |  64  |  6.7034  |  6.9756   |  6.5947  |         6.5482         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 14.3259  |  16.423   |  5.7805  |         7.9796         |
|             resnet18              |  16  |  9.2875  |  12.1155  |  5.6653  |        11.7184         |
|           squeezenet1_1           |  32  | 11.9831  |  11.1178  |  5.4812  |         8.0041         |
|          phlippe_resnet           | 128  |  8.9777  |  11.7176  |  4.9613  |        10.6225         |
|          pytorch_struct           | 200  |  4.9915  |  6.0311   |  3.3161  |         4.268          |
|       functorch_dp_cifar10        |  64  | 10.3856  |  11.1501  |  2.8877  |         7.5692         |
|                drq                |  1   |  3.3932  |  4.4346   |  2.2426  |         3.4235         |
|               dlrm                | 1024 |  4.3604  |  5.5696   |  2.1408  |         4.1734         |
|               dcgan               |  32  |  2.3769  |  3.0927   |  1.4783  |         2.5649         |
|         soft_actor_critic         | 256  |  1.7222  |  2.7109   |  1.4237  |         1.936          |
|           lennard_jones           | 1000 |  1.7752  |  2.3812   |  1.1121  |         1.7265         |
|   timm_vision_transformer_large   |  32  | 465.1005 |    nan    |   nan    |          nan           |
|               moco                |  32  | 51.5426  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+
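
When reading the absolute latency table above, dividing the eager column by a backend column gives a rough per-model speedup. The sketch below is only an illustration. The helper name `latency_ratio` is hypothetical and the three rows are copied from the table. The result is not guaranteed to match the separate speedup table, which may be computed against a different baseline measurement.

```python
import math

# (name, eager_ms, inductor_ms) copied from the absolute latency table above.
ROWS = [
    ("hf_GPT2_large", 212.7296, 126.1871),
    ("nvidia_deeprecommender", 10.2402, 11.7194),
    ("moco", 51.5426, float("nan")),  # no number was reported for this backend
]


def latency_ratio(baseline_ms, candidate_ms):
    """Return baseline/candidate, or nan when either value is unusable."""
    if math.isnan(baseline_ms) or math.isnan(candidate_ms) or candidate_ms <= 0:
        return float("nan")
    return baseline_ms / candidate_ms


ratios = {name: latency_ratio(eager, inductor) for name, eager, inductor in ROWS}

for name, ratio in ratios.items():
    label = "n/a" if math.isnan(ratio) else f"{ratio:.2f}x"
    print(f"{name:<24} {label}")
```

A ratio above 1 means the backend was faster than eager for that model, and a ratio below 1 means it was slower.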

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 0.9537 |  0.8043   |  2.9181  |          1.07          |
|       MT5ForConditionalGeneration       | 16  | 0.9876 |  0.8371   |  2.4045  |         1.834          |
|             OPTForCausalLM              |  2  | 0.9852 |  0.9257   |  2.3949  |         2.4822         |
|      GPT2ForSequenceClassification      |  4  | 0.9769 |  0.9517   |  2.2393  |         2.2709         |
|     MobileBertForQuestionAnswering      | 128 | 0.9532 |   0.795   |  2.1659  |         1.0726         |
|             XGLMForCausalLM             |  8  | 0.9313 |  0.7358   |  2.1222  |         1.1948         |
|       ElectraForQuestionAnswering       | 64  | 0.987  |  0.9773   |  2.1152  |         2.0869         |
|     M2M100ForConditionalGeneration      | 16  | 1.0194 |  0.8055   |  1.955   |         1.3572         |
|            XLNetLMHeadModel             |  8  | 0.9948 |  0.9656   |  1.8109  |         1.8138         |
|           ElectraForCausalLM            | 32  | 0.9825 |  0.9352   |  1.8007  |         1.8342         |
|    LayoutLMForSequenceClassification    | 16  | 0.9847 |  0.9707   |  1.7848  |         1.7639         |
|       RobertaForQuestionAnswering       | 16  | 0.9844 |   0.97    |  1.7772  |         1.7465         |
|        BertForQuestionAnswering         | 16  | 0.9849 |  0.9692   |  1.7676  |         1.7528         |
|           RobertaForCausalLM            | 16  | 0.987  |  0.9618   |  1.6701  |         1.665          |
|     PLBartForConditionalGeneration      |  4  | 0.9949 |  0.9504   |  1.6562  |         1.6401         |
|       AlbertForQuestionAnswering        |  4  | 0.9998 |  0.8855   |  1.6544  |         1.6489         |
|               DistillGPT2               | 16  | 0.9878 |  0.9554   |  1.6504  |         1.6915         |
|            AlbertForMaskedLM            |  4  | 0.9998 |  0.8851   |  1.6448  |         1.6393         |
|                 T5Small                 |  4  | 0.9786 |  0.8496   |  1.6288  |         1.7516         |
|       T5ForConditionalGeneration        |  4  | 0.976  |  0.8494   |  1.6264  |         1.7353         |
|            PLBartForCausalLM            |  8  | 0.9872 |  0.9655   |  1.6121  |         1.6825         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9812 |  0.9603   |  1.6031  |         1.6287         |
|             BertForMaskedLM             | 16  | 0.9866 |  0.9608   |  1.5996  |         1.5888         |
|          AllenaiLongformerBase          |  4  | 0.8848 |   0.625   |  1.593   |         1.4926         |
|           LayoutLMForMaskedLM           | 16  | 0.9864 |  0.9614   |  1.5689  |         1.6098         |
|                CamemBert                | 16  | 0.9876 |  0.9634   |  1.5449  |         1.5332         |
|      MBartForConditionalGeneration      |  2  | 0.9993 |  0.9707   |  1.5126  |         1.4621         |
|      BartForConditionalGeneration       |  2  | 0.9955 |  0.9545   |  1.4975  |         1.4793         |
|            YituTechConvBert             | 16  | 0.9861 |  0.9553   |  1.4926  |         1.4927         |
|             BartForCausalLM             |  4  | 0.9899 |  0.9641   |  1.4925  |         1.5386         |
|            MBartForCausalLM             |  4  | 0.9887 |  0.9634   |  1.4899  |         1.5361         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9959 |  0.9195   |  1.4891  |         1.4138         |
|         Speech2Text2ForCausalLM         | 256 | 0.9757 |  0.9284   |  1.4762  |         1.4978         |
|         MegatronBertForCausalLM         |  4  | 0.9886 |  0.9023   |  1.4656  |         1.4969         |
|     DistilBertForQuestionAnswering      | 256 | 0.9936 |  0.9853   |  1.4414  |         1.4416         |
|     PegasusForConditionalGeneration     | 32  | 0.9996 |  0.9276   |  1.3521  |         1.2828         |
|            TrOCRForCausalLM             | 32  | 0.9893 |   0.962   |  1.2409  |         1.2876         |
|          DistilBertForMaskedLM          | 128 | 0.9925 |  0.9499   |  1.2253  |         1.2453         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9577 |  0.8854   |  1.2166  |         1.2003         |
|           PegasusForCausalLM            | 32  | 0.9501 |  0.8979   |  1.1667  |         1.1839         |
|       DebertaForQuestionAnswering       |  8  | 0.7629 |  0.6791   |  1.0116  |         0.9216         |
|           DebertaForMaskedLM            |  4  | 0.7457 |  0.5502   |  0.9404  |         0.7972         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.7019 |  0.5202   |  0.8924  |         0.6276         |
|          DebertaV2ForMaskedLM           |  1  | 0.7194 |  0.5203   |  0.8454  |         0.6207         |
|          BlenderbotForCausalLM          |  4  | 0.9279 |  0.7235   |   0.0    |         1.0871         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
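
To summarize a speedup table like the one above, one option is to parse the ASCII rows and take a geometric mean per backend. This is a minimal sketch, not the dashboard's own aggregation. The function names `parse_table` and `geomean` are hypothetical, and skipping `0.0` and `nan` entries as failed runs is an assumption.

```python
import math

# Three rows copied from the speedup table above (column padding trimmed).
SAMPLE = """\
+-----------------------+----+--------+-----------+----------+------------------------+
| name                  | bs | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------+----+--------+-----------+----------+------------------------+
| MobileBertForMaskedLM | 64 | 0.9537 | 0.8043    | 2.9181   | 1.07                   |
| OPTForCausalLM        | 2  | 0.9852 | 0.9257    | 2.3949   | 2.4822                 |
| BlenderbotForCausalLM | 4  | 0.9279 | 0.7235    | 0.0      | 1.0871                 |
+-----------------------+----+--------+-----------+----------+------------------------+
"""


def parse_table(text):
    """Turn a '|'-delimited ASCII table into a list of dicts keyed by header."""
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+---+' border lines and blanks
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geomean(values):
    """Geometric mean over positive, non-nan values (assumed to be valid runs)."""
    valid = [v for v in values if v > 0]  # nan > 0 is False, so nan is dropped
    if not valid:
        return float("nan")
    return math.exp(sum(math.log(v) for v in valid) / len(valid))


rows = parse_table(SAMPLE)
inductor_geomean = geomean([float(r["inductor"]) for r in rows])
print(f"inductor geomean over {len(rows)} sample rows: {inductor_geomean:.4f}")
```

A geometric mean is used here because the values are ratios, so one very large speedup does not dominate the summary the way it would in an arithmetic mean.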

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
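
The accuracy table above can be condensed into a per-backend tally. The following is a small sketch under stated assumptions. The rows are copied from the table, the helper names `tally` and `pass_rate` are hypothetical, and counting `pass_due_to_skip` as a pass is a choice made here rather than something the dashboard states.

```python
from collections import Counter

BACKENDS = ("eager", "aot_eager", "inductor", "inductor_no_cudagraphs")

# (name, status per backend) copied from the accuracy table above.
ROWS = [
    ("BlenderbotForCausalLM", "pass_due_to_skip", "pass_due_to_skip",
     "pass_due_to_skip", "pass_due_to_skip"),
    ("MT5ForConditionalGeneration", "pass", "pass", "pass", "pass"),
    ("DebertaV2ForQuestionAnswering", "pass", "pass", "fail_to_run", "pass"),
    ("AlbertForQuestionAnswering", "pass", "pass", "fail_accuracy", "fail_accuracy"),
]


def tally(rows):
    """Count how often each status appears for each backend."""
    counts = {backend: Counter() for backend in BACKENDS}
    for _name, *statuses in rows:
        for backend, status in zip(BACKENDS, statuses):
            counts[backend][status] += 1
    return counts


def pass_rate(counter):
    """Fraction of statuses starting with 'pass' (assumption: skips count as passes)."""
    total = sum(counter.values())
    passed = sum(n for status, n in counter.items() if status.startswith("pass"))
    return passed / total if total else float("nan")


counts = tally(ROWS)
for backend in BACKENDS:
    print(f"{backend:<24} {pass_rate(counts[backend]):.0%} {dict(counts[backend])}")
```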

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          AllenaiLongformerBase          |  4  | 11.5775 |  32.1483  | 143.411  |        107.8736        |
|          MobileBertForMaskedLM          | 64  | 17.7897 |  40.7099  | 133.4461 |        129.675         |
|     MobileBertForQuestionAnswering      | 128 | 17.5898 |  40.8442  | 127.2698 |         125.82         |
|          DebertaV2ForMaskedLM           |  1  | 15.4099 |  26.8723  | 126.6708 |        60.7339         |
|       MT5ForConditionalGeneration       | 16  | 8.4237  |  18.5833  | 126.1527 |        125.3058        |
|      DebertaV2ForQuestionAnswering      |  2  | 15.1995 |  26.892   | 124.8574 |        58.5608         |
|     M2M100ForConditionalGeneration      | 16  | 12.0101 |  25.8271  | 106.3723 |        97.6783         |
|            XLNetLMHeadModel             |  8  | 10.3488 |  27.5655  | 84.7826  |        86.2942         |
|           DebertaForMaskedLM            |  4  | 7.3017  |  13.5669  | 75.8825  |        49.3896         |
|       DebertaForQuestionAnswering       |  8  | 7.4609  |  13.7346  | 74.9498  |        45.4418         |
|            YituTechConvBert             | 16  | 10.7406 |  19.6784  | 73.4469  |        70.7538         |
|             XGLMForCausalLM             |  8  | 9.5986  |  21.2365  | 72.2505  |        64.7467         |
|      MBartForConditionalGeneration      |  2  | 11.6916 |  26.3987  | 70.4711  |        69.2326         |
|     PegasusForConditionalGeneration     | 32  | 5.1801  |  19.3634  | 68.6156  |        63.7727         |
|      BartForConditionalGeneration       |  2  | 11.8547 |  26.2312  | 65.6556  |         64.908         |
|           ElectraForCausalLM            | 32  | 7.5504  |  13.9049  | 62.0314  |        62.4792         |
|    MegatronBertForQuestionAnswering     |  8  | 10.4973 |  21.6507  | 58.0305  |        56.3825         |
|         MegatronBertForCausalLM         |  4  | 10.5798 |  21.8752  | 57.9416  |         59.218         |
|     PLBartForConditionalGeneration      |  4  | 9.1896  |  16.7263  | 57.5901  |        53.8556         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.6637  |  17.4455  | 48.1675  |        49.2884         |
|                 T5Small                 |  4  | 5.7618  |  12.6564  | 47.1634  |        46.1111         |
|       T5ForConditionalGeneration        |  4  | 5.8098  |  12.7587  | 46.2388  |        44.7436         |
|           PegasusForCausalLM            | 32  | 6.0084  |  11.4921  | 44.6074  |        39.2755         |
|            MBartForCausalLM             |  4  | 6.5438  |  12.0119  |  44.327  |        38.7314         |
|             BartForCausalLM             |  4  |  6.253  |  11.8873  | 44.0542  |        39.6069         |
|            TrOCRForCausalLM             | 32  | 6.4937  |  11.862   | 43.9509  |        39.4461         |
|    LayoutLMForSequenceClassification    | 16  | 5.4631  |  11.3473  | 42.1005  |        42.2418         |
|             OPTForCausalLM              |  2  | 5.4248  |  11.0913  | 40.6216  |        37.3468         |
|       ElectraForQuestionAnswering       | 64  | 5.2263  |  10.9484  | 40.5216  |         39.742         |
|           LayoutLMForMaskedLM           | 16  |  5.584  |  11.4056  |  36.773  |        35.4466         |
|             BertForMaskedLM             | 16  | 5.1764  |  10.8106  | 35.6557  |        35.2234         |
|        BertForQuestionAnswering         | 16  |  5.171  |  10.9113  | 35.2653  |        35.9333         |
|            AlbertForMaskedLM            |  4  | 2.3958  |  8.2025   | 34.9712  |        32.8335         |
|       BlenderbotSmallForCausalLM        | 64  | 4.5892  |  8.2897   | 33.3481  |        35.0632         |
|           RobertaForCausalLM            | 16  | 5.4526  |  10.9382  | 33.3153  |        34.0445         |
|         Speech2Text2ForCausalLM         | 256 | 3.4331  |  6.1091   | 33.0115  |         31.008         |
|     DistilBertForQuestionAnswering      | 256 | 2.5074  |  5.5355   | 32.9733  |        33.8658         |
|       RobertaForQuestionAnswering       | 16  | 5.4181  |  10.7747  | 32.5793  |        32.8178         |
|            PLBartForCausalLM            |  8  | 3.7569  |  6.6212   | 32.2612  |        31.2841         |
|      GPT2ForSequenceClassification      |  4  | 4.8306  |  10.1209  | 32.1759  |        32.7141         |
|          DistilBertForMaskedLM          | 128 | 2.4786  |  5.5505   | 31.6285  |        30.2289         |
|                CamemBert                | 16  | 5.2442  |  10.9862  | 31.4615  |        33.9037         |
|       AlbertForQuestionAnswering        |  4  | 2.3649  |   8.124   |  31.031  |        29.1749         |
|               DistillGPT2               | 16  | 2.5506  |  5.1436   | 25.8872  |        24.4706         |
|          BlenderbotForCausalLM          |  4  | 11.6378 |  22.5418  |   nan    |        62.6793         |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1387  |         1.195          |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1139  |         1.2307         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0865  |         1.1724         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0818  |         1.1729         |
|             OPTForCausalLM              |  2  | 0.9682 |  0.9252   |  1.062   |         1.1099         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0583  |         1.1368         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0541  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.0539  |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0511  |         1.0491         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.1479         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0078  |         1.0518         |
|            YituTechConvBert             | 16  | 0.953  |  0.8749   |  0.9793  |         0.9905         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9734  |         1.3147         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         1.0642         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9574  |         1.268          |
|     PLBartForConditionalGeneration      |  4  |  0.93  |  0.8787   |  0.9563  |         0.9847         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         1.1152         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9089  |         1.0019         |
|            PLBartForCausalLM            |  8  | 0.9237 |  0.8182   |  0.8907  |         0.9249         |
|     PegasusForConditionalGeneration     | 32  | 0.945  |  0.8966   |  0.8901  |         1.0074         |
|           ElectraForCausalLM            | 32  | 0.9161 |   0.786   |  0.889   |         0.8941         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8849  |         0.9624         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8729  |         0.9803         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.8672  |         1.0307         |
|            TrOCRForCausalLM             | 32  |  0.92  |   0.829   |  0.8619  |         0.9075         |
|            MBartForCausalLM             |  4  | 0.951  |  0.8924   |  0.8491  |         0.9507         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         1.0139         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         1.0962         |
|             BartForCausalLM             |  4  | 0.951  |  0.8923   |  0.8301  |         0.943          |
|       BlenderbotSmallForCausalLM        | 64  | 0.8906 |  0.7493   |  0.8065  |         0.8318         |
|           PegasusForCausalLM            | 32  | 0.9257 |  0.8421   |  0.7952  |         0.9252         |
|         Speech2Text2ForCausalLM         | 256 | 0.8865 |  0.7573   |  0.7566  |         0.808          |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7473  |         1.016          |
|             XGLMForCausalLM             |  8  | 0.9431 |  0.8612   |  0.6744  |         0.9287         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6569  |         0.8392         |
|     M2M100ForConditionalGeneration      | 16  | 0.955  |  0.8772   |  0.6058  |         0.8978         |
|           DebertaForMaskedLM            |  4  | 0.9316 |  0.9156   |  0.5501  |         0.9978         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.5197  |         0.9665         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9763 |  0.9763   |  0.487   |         0.9801         |
|          AllenaiLongformerBase          |  4  | 0.8568 |  0.7887   |  0.4688  |         0.8742         |
|       DebertaForQuestionAnswering       |  8  | 0.9525 |  1.0537   |  0.4601  |         1.1527         |
|          BlenderbotForCausalLM          |  4  | 0.9932 |  0.9937   |   nan    |         0.9941         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
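
The sketch below shows one way to read the compression ratios above. It assumes the ratio is eager peak memory divided by the backend's peak memory, so a value above 1 means the backend used less memory than eager. That definition is an assumption here and should be checked against the benchmark harness. The helper name `relative_peak_memory` is hypothetical, and the values are copied from the inductor column.

```python
import math

# inductor column values copied from the table above.
INDUCTOR_RATIO = {
    "ElectraForQuestionAnswering": 1.1387,
    "DebertaForQuestionAnswering": 0.4601,
    "BlenderbotForCausalLM": float("nan"),  # no number was reported
}


def relative_peak_memory(ratio):
    """Backend peak memory as a multiple of eager peak memory.

    Assumes ratio = eager_peak / backend_peak, so the inverse is returned.
    """
    if math.isnan(ratio) or ratio <= 0:
        return float("nan")
    return 1.0 / ratio


relative = {name: relative_peak_memory(r) for name, r in INDUCTOR_RATIO.items()}

for name, multiple in relative.items():
    label = "n/a" if math.isnan(multiple) else f"{multiple:.2f}x eager peak"
    print(f"{name:<30} {label}")
```

Under this reading, the models at the bottom of the table are the ones where inductor with cudagraphs roughly doubles peak memory relative to eager.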

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 266.5662 | 301.0621  | 162.0585 |        162.4475        |
|       AlbertForQuestionAnswering        |  4  | 264.4316 | 298.4611  | 159.9313 |        160.2814        |
|            XLNetLMHeadModel             |  8  | 279.9054 | 288.3215  | 154.4514 |        153.5794        |
|      DebertaV2ForQuestionAnswering      |  2  | 151.969  |  202.616  | 134.1597 |        165.255         |
|          DebertaV2ForMaskedLM           |  1  | 161.1511 | 196.0148  | 123.5229 |        163.4218        |
|          AllenaiLongformerBase          |  4  | 206.1654 | 291.9025  | 113.5534 |        121.666         |
|     PegasusForConditionalGeneration     | 32  | 143.3481 | 153.0957  | 111.9413 |        107.9267        |
|            TrOCRForCausalLM             | 32  | 139.4496 | 142.7921  | 111.0907 |        107.1071        |
|      MBartForConditionalGeneration      |  2  | 140.8941 | 142.4079  |  96.989  |         93.755         |
|      BartForConditionalGeneration       |  2  | 139.8662 | 145.4267  | 96.0174  |        93.0348         |
|    MegatronBertForQuestionAnswering     |  8  | 144.5748 | 147.7447  | 88.5511  |        87.1011         |
|            YituTechConvBert             | 16  | 127.1969 | 131.0628  | 84.1723  |         84.059         |
| BlenderbotSmallForConditionalGeneration | 64  | 123.1284 | 120.0229  | 82.1051  |        79.4205         |
|     MobileBertForQuestionAnswering      | 128 | 197.5101 |  210.795  | 80.7856  |        158.9761        |
|             BartForCausalLM             |  4  | 115.3845 | 117.5538  | 76.7327  |        73.7206         |
|                CamemBert                | 16  | 119.9173 | 122.9388  | 76.6652  |        77.1996         |
|            MBartForCausalLM             |  4  | 115.0978 | 117.7117  | 76.2441  |        73.8808         |
|       DebertaForQuestionAnswering       |  8  | 100.7889 | 111.4285  | 75.1224  |        82.3676         |
|     M2M100ForConditionalGeneration      | 16  | 130.5334 | 138.8671  | 74.1111  |        80.1447         |
|     PLBartForConditionalGeneration      |  4  | 119.4576 | 122.5939  | 73.9723  |        71.1227         |
|          MobileBertForMaskedLM          | 64  | 202.4099 | 212.9687  | 71.9075  |        155.3609        |
|           LayoutLMForMaskedLM           | 16  | 114.064  |  117.233  |  71.839  |        69.9574         |
|     DistilBertForQuestionAnswering      | 256 | 103.8823 | 105.2415  |  71.543  |        71.5425         |
|           DebertaForMaskedLM            |  4  | 92.8871  | 109.0781  | 71.4704  |        83.8807         |
|            PLBartForCausalLM            |  8  | 114.5818 | 120.1304  | 71.2884  |        69.6941         |
|             OPTForCausalLM              |  2  | 170.0532 | 182.2686  | 70.2258  |        68.6106         |
|          DistilBertForMaskedLM          | 128 | 85.3564  |  89.1008  |  69.581  |        67.9135         |
|             BertForMaskedLM             | 16  | 111.6245 | 114.3253  | 68.9599  |        69.3136         |
|           RobertaForCausalLM            | 16  | 116.8752 | 119.5042  | 68.8678  |        69.3669         |
|                 T5Small                 |  4  | 107.2339 | 122.6403  | 64.3824  |        60.3799         |
|       T5ForConditionalGeneration        |  4  | 107.1442 |  122.213  | 64.3672  |        60.4042         |
|               DistillGPT2               | 16  | 107.1297 | 110.6664  | 64.1087  |        62.5326         |
|           PegasusForCausalLM            | 32  | 78.8585  |  77.0085  |  59.668  |        58.5378         |
|         MegatronBertForCausalLM         |  4  | 88.6933  |  96.8952  |  59.369  |        58.1087         |
|             XGLMForCausalLM             |  8  | 115.7094 | 124.8175  | 55.4191  |        97.2622         |
|    LayoutLMForSequenceClassification    | 16  | 99.1354  | 100.6177  | 54.9842  |        55.4453         |
|       ElectraForQuestionAnswering       | 64  | 116.0483 | 117.3467  | 54.2014  |        54.9442         |
|        BertForQuestionAnswering         | 16  | 96.8373  |  98.3396  | 53.9834  |         54.295         |
|       RobertaForQuestionAnswering       | 16  | 97.5463  |  98.3675  | 53.7999  |        54.8657         |
|           ElectraForCausalLM            | 32  | 89.6925  |  94.1506  | 49.0735  |        47.9946         |
|       BlenderbotSmallForCausalLM        | 64  | 64.3906  |  65.393   | 47.8424  |        48.0556         |
|       MT5ForConditionalGeneration       | 16  |  94.845  | 109.5203  | 44.1184  |        49.6966         |
|      GPT2ForSequenceClassification      |  4  | 93.7625  |  96.0557  | 40.8303  |        40.2886         |
|         Speech2Text2ForCausalLM         | 256 | 55.2096  |  57.0929  | 36.4491  |        36.3258         |
|          BlenderbotForCausalLM          |  4  | 115.7159 | 131.0463  |   nan    |        107.0027        |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9986 |  0.9961   |  3.0135  |         2.9754         |
|      xcit_large_24_p8_224       |  5  | 0.9902 |  0.8656   |  2.1839  |         1.5905         |
|         coat_lite_mini          | 128 | 0.9971 |  0.9954   |  1.9383  |         1.9143         |
|        twins_pcpvt_base         | 64  | 0.9969 |  0.9081   |  1.9377  |         1.6863         |
|          ghostnet_100           | 128 | 0.9925 |   0.76    |  1.8228  |         1.5971         |
|          gmlp_s16_224           | 128 | 0.9945 |  1.0812   |  1.7967  |         1.7861         |
|          gmixer_24_224          | 128 | 0.9951 |  0.8891   |  1.7356  |         1.7296         |
|            lcnet_050            | 128 | 0.9408 |  0.7362   |  1.6854  |         1.4495         |
|           volo_d1_224           | 64  | 0.9943 |  0.9733   |  1.6833  |         1.6598         |
|         crossvit_9_240          | 128 | 0.9904 |   0.783   |  1.6237  |         1.5983         |
|  swin_base_patch4_window7_224   | 64  | 0.9912 |  0.9564   |  1.6116  |         1.6015         |
|           convit_base           | 64  | 0.9984 |  0.9979   |  1.5506  |         1.5482         |
|             dla102              | 128 | 0.9955 |  0.8155   |  1.5259  |         1.5239         |
|       gluon_inception_v3        | 128 | 0.9967 |  0.8648   |  1.5094  |         1.5001         |
|        adv_inception_v3         | 128 | 0.9963 |  0.8605   |  1.5094  |         1.4975         |
|          inception_v3           | 128 | 0.9961 |  0.8643   |  1.5086  |         1.4988         |
|        sebotnet33ts_256         | 64  | 0.9576 |  0.7653   |  1.4976  |         1.525          |
|            nfnet_l0             | 128 | 0.9895 |   0.814   |  1.4904  |         1.4327         |
|          convnext_base          | 64  | 0.9838 |  0.9845   |  1.4843  |         1.4699         |
|           dm_nfnet_f0           | 128 | 0.9867 |  0.9855   |  1.4586  |         1.4101         |
|           mnasnet_100           | 128 | 0.9472 |  0.7409   |  1.4296  |         1.4841         |
|            pit_b_224            | 64  | 0.9945 |  0.9925   |  1.4284  |         1.424          |
|       eca_botnext26ts_256       | 128 | 0.9737 |  0.7192   |  1.4257  |         1.4086         |
|      mobilenetv3_large_100      | 128 | 0.9493 |  0.7603   |  1.4223  |         1.4231         |
|           mobilevit_s           | 64  | 0.9618 |  0.7314   |  1.4185  |         1.4355         |
|           resnest101e           | 64  | 0.9945 |  0.8681   |  1.4113  |         1.341          |
|           selecsls42b           | 128 | 0.9984 |  0.8126   |  1.4064  |         1.4066         |
|          botnet26t_256          | 128 | 0.9731 |   0.851   |  1.3906  |         1.4055         |
|         mobilenetv2_100         | 128 | 0.9493 |  0.7376   |  1.3776  |         1.4316         |
|        res2net50_14w_8s         | 128 | 0.9989 |  0.7905   |  1.3775  |         1.3526         |
|           regnety_002           | 128 | 0.9503 |  0.7099   |  1.3769  |         1.2172         |
|           res2next50            | 128 | 0.9989 |  0.8252   |  1.3677  |         1.3606         |
|          jx_nest_base           | 32  | 0.987  |  0.9853   |  1.3671  |         1.3608         |
|          mixer_b16_224          | 128 | 0.9976 |  1.0185   |  1.3624  |         1.3599         |
|       tf_efficientnet_b0        | 128 | 0.9602 |  0.6813   |  1.3542  |         1.3864         |
|          spnasnet_100           | 128 | 0.9417 |   0.739   |  1.3507  |         1.4085         |
|            hrnet_w18            | 128 | 0.9927 |  0.6449   |  1.3481  |         1.3444         |
|          cait_m36_384           |  4  | 0.9966 |  0.9935   |  1.3473  |         1.3508         |
|           fbnetc_100            | 128 | 0.9498 |  0.7395   |  1.3449  |         1.3955         |
|      beit_base_patch16_224      | 64  | 0.9963 |  0.9674   |  1.3429  |         1.3405         |
|        ese_vovnet19b_dw         | 128 | 0.9569 |  0.8324   |  1.3376  |         1.3565         |
|         poolformer_m36          | 64  | 0.9865 |  0.9829   |  1.3284  |         1.3178         |
|            fbnetv3_b            | 128 | 0.9495 |  0.7695   |  1.3005  |         1.316          |
|           rexnet_100            | 128 | 0.952  |  0.7035   |  1.2892  |         1.3246         |
| deit_base_distilled_patch16_224 | 64  | 0.9963 |  0.9937   |  1.2513  |         1.2507         |
|          resmlp_12_224          | 128 | 0.9929 |  0.8894   |  1.2503  |         1.2478         |
|      vit_base_patch16_224       | 64  | 0.9962 |  0.9935   |  1.2333  |         1.2321         |
|            tinynet_a            | 128 | 0.9471 |  0.6788   |  1.2235  |         1.2542         |
|          cspdarknet53           | 64  | 0.9334 |  0.7862   |  1.2081  |         1.2442         |
|           tf_mixnet_l           | 128 | 0.9763 |  0.8276   |  1.1802  |         1.1883         |
|         visformer_small         | 128 | 0.9955 |  0.9443   |  1.1725  |         1.1657         |
|            mixnet_l             | 128 | 0.9762 |  0.8211   |  1.1704  |         1.1781         |
|        res2net101_26w_4s        | 64  | 0.9992 |  0.7974   |  1.149   |         1.0603         |
|        gluon_xception65         | 32  | 0.9924 |  0.8429   |  1.0715  |         1.0752         |
|             dpn107              | 32  | 0.9325 |  0.8077   |  1.0696  |         1.1365         |
|            repvgg_a2            | 128 | 0.936  |   0.754   |  1.0658  |         1.0978         |
|     swsl_resnext101_32x16d      | 32  | 0.9979 |  0.8423   |  1.0616  |         1.0213         |
|            gernet_l             | 128 | 0.9362 |  0.7933   |  1.0342  |         1.0616         |
|        convmixer_768_32         | 32  | 0.9984 |  0.9647   |  1.0018  |         1.0024         |
|          pnasnet5large          | 16  | 0.9863 |  0.9152   |  0.9125  |         0.912          |
+---------------------------------+-----+--------+-----------+----------+------------------------+
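A speedup column like the one above can be condensed into a single number with a geometric mean. The sketch below is only one way to post-process these ASCII tables and is not the dashboard's own tooling; `parse_table` and `geomean` are hypothetical helper names introduced here, and `nan` entries are assumed to be skipped rather than counted.

```python
import math

def parse_table(text):
    """Parse a '|'-delimited ASCII table into a list of row dicts.

    Border lines (starting with '+') are ignored; the first '|' line
    is treated as the header.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]

def geomean(values):
    """Geometric mean of positive values, ignoring NaN entries (assumption)."""
    vals = [v for v in values if not math.isnan(v)]
    return math.exp(sum(math.log(v) for v in vals) / len(vals))

# Two rows copied from the table above, used purely as sample input.
sample = """
+-------------------+-----+--------+-----------+----------+------------------------+
|       name        | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-------------------+-----+--------+-----------+----------+------------------------+
| tnt_s_patch16_224 | 128 | 0.9986 |  0.9961   |  3.0135  |         2.9754         |
|   pnasnet5large   | 16  | 0.9863 |  0.9152   |  0.9125  |         0.912          |
+-------------------+-----+--------+-----------+----------+------------------------+
"""

rows = parse_table(sample)
inductor_geomean = geomean(float(r["inductor"]) for r in rows)
print(f"inductor geomean speedup: {inductor_geomean:.4f}")
```

Pasting a full table in place of `sample` gives the summary for the whole suite. A geometric mean is used instead of an arithmetic one so that a single large outlier does not dominate the result.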

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |      fail_to_run       |
+---------------------------------+----+-------+---------------+----------+------------------------+
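To find the few non-passing entries in an accuracy table like the one above, the status strings can be tallied per backend. This is a minimal sketch that assumes the rows have already been loaded as dictionaries keyed by column name; `tally_statuses` and `non_passing` are hypothetical helpers, and the three sample rows are copied from the table.

```python
from collections import Counter

BACKENDS = ["eager", "aot_eager", "inductor", "inductor_no_cudagraphs"]

def tally_statuses(rows, backends=BACKENDS):
    """Count each status string (e.g. 'pass') per backend column."""
    return {b: Counter(row[b] for row in rows) for b in backends}

def non_passing(rows, backends=BACKENDS):
    """List (model, backend, status) triples whose status is not 'pass'."""
    return [
        (row["name"], b, row[b])
        for row in rows
        for b in backends
        if row[b] != "pass"
    ]

# Sample rows copied from the accuracy table above.
rows = [
    {"name": "lcnet_050", "eager": "pass", "aot_eager": "fail_accuracy",
     "inductor": "pass", "inductor_no_cudagraphs": "pass"},
    {"name": "mobilevit_s", "eager": "pass", "aot_eager": "pass",
     "inductor": "pass", "inductor_no_cudagraphs": "pass"},
    {"name": "deit_base_distilled_patch16_224", "eager": "pass",
     "aot_eager": "pass", "inductor": "pass",
     "inductor_no_cudagraphs": "fail_to_run"},
]

counts = tally_statuses(rows)
failures = non_passing(rows)
for name, backend, status in failures:
    print(f"{name}: {backend} -> {status}")
```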

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 5.6499  |  11.1255  | 288.7461 |        283.0597        |
|          ghostnet_100           | 128 |  7.496  |  14.8731  | 238.3372 |        236.8188        |
|            hrnet_w18            | 128 | 9.5317  |  35.6234  | 228.0449 |        230.6652        |
|            fbnetv3_b            | 128 |  8.594  |  17.0237  | 168.1243 |        167.7835        |
|      mobilenetv3_large_100      | 128 | 4.2566  |  8.3754   | 157.9431 |        157.3151        |
|            tinynet_a            | 128 | 5.9253  |  12.1456  | 157.5898 |        156.0813        |
|           tf_mixnet_l           | 128 | 8.9654  |  16.7401  | 155.888  |        150.0548        |
|           resnest101e           | 64  | 11.1749 |  23.9704  | 155.071  |        152.5135        |
|           mobilevit_s           | 64  | 5.3928  |  11.3083  | 154.3054 |        152.3807        |
|        adv_inception_v3         | 128 | 5.5769  |  12.3598  | 153.7477 |        154.5061        |
|       gluon_inception_v3        | 128 |  5.586  |  12.4072  | 153.6197 |        147.0342        |
|       tf_efficientnet_b0        | 128 |   5.1   |  10.4178  | 153.018  |        150.6381        |
|          inception_v3           | 128 | 5.6977  |  12.1899  | 151.8774 |        154.8908        |
|          pnasnet5large          | 16  | 8.3175  |  25.8719  | 151.159  |        147.4219        |
|            mixnet_l             | 128 | 8.5524  |  15.9353  | 151.0383 |        152.4029        |
|        twins_pcpvt_base         | 64  | 10.5482 |  23.0057  | 140.563  |        140.4003        |
|        res2net101_26w_4s        | 64  | 10.5602 |  24.3203  | 140.5159 |        137.4283        |
|          spnasnet_100           | 128 |  5.008  |  9.3919   | 137.6578 |        134.1523        |
|           fbnetc_100            | 128 | 5.0706  |  9.4886   | 135.7825 |        135.3708        |
|         mobilenetv2_100         | 128 | 4.0434  |  7.7698   | 127.1539 |        127.6348        |
|           mnasnet_100           | 128 | 4.0174  |  7.5412   | 123.0879 |        117.5419        |
|      xcit_large_24_p8_224       |  5  | 12.6082 |  28.1718  | 121.0067 |        118.8733        |
|        res2net50_14w_8s         | 128 | 8.9592  |  22.2819  | 114.4306 |        114.3314        |
|        sebotnet33ts_256         | 64  | 4.1707  |  8.8264   | 106.0411 |        102.9861        |
|          cait_m36_384           |  4  | 14.366  |  30.2783  | 105.8367 |        104.4001        |
|           regnety_002           | 128 | 4.9403  |  8.7312   | 105.6131 |        103.6738        |
|  swin_base_patch4_window7_224   | 64  | 8.5956  |  19.2661  | 101.7293 |        100.9343        |
|          cspdarknet53           | 64  | 5.7395  |  10.8441  | 96.6939  |        97.4574         |
|       eca_botnext26ts_256       | 128 | 3.0513  |  6.7901   | 96.0962  |        95.1214         |
|            lcnet_050            | 128 | 2.5148  |  4.9971   | 95.9515  |        96.0764         |
|         poolformer_m36          | 64  | 7.5149  |  13.8955  | 94.3322  |        92.1959         |
|             dpn107              | 32  | 10.093  |  19.6097  |  92.088  |        92.6514         |
|             dla102              | 128 | 6.2022  |  13.9845  | 91.7288  |        91.8695         |
|           selecsls42b           | 128 | 2.4935  |  5.3793   | 89.3998  |        89.6227         |
|        gluon_xception65         | 32  | 7.7563  |  16.6162  | 88.1598  |        87.3761         |
|          botnet26t_256          | 128 | 2.9116  |  5.9539   | 87.7239  |        86.6695         |
|         coat_lite_mini          | 128 |  3.271  |   7.961   | 85.1701  |        85.7562         |
|         crossvit_9_240          | 128 | 5.7423  |  13.2492  | 83.6992  |        82.1117         |
|           res2next50            | 128 | 5.0262  |  12.0765  | 83.1771  |        82.8639         |
|          jx_nest_base           | 32  | 6.6525  |  14.6085  | 80.1855  |        79.7285         |
|            gernet_l             | 128 | 4.9343  |  8.7777   | 78.2909  |        78.7421         |
|            nfnet_l0             | 128 | 5.2847  |  10.7176  | 76.3518  |         74.099         |
|        ese_vovnet19b_dw         | 128 | 2.5796  |   4.631   | 74.8328  |        76.0618         |
|           volo_d1_224           | 64  | 5.0699  |  11.4415  | 69.8444  |        70.3126         |
|           dm_nfnet_f0           | 128 | 6.1147  |  11.5906  | 69.3723  |         69.114         |
|        tnt_s_patch16_224        | 128 | 6.4274  |  15.9429  | 64.3484  |        61.1423         |
|         visformer_small         | 128 | 2.6493  |  6.0078   | 63.0181  |        63.9283         |
|            repvgg_a2            | 128 | 4.8854  |  8.6902   | 57.3163  |        58.2081         |
|     swsl_resnext101_32x16d      | 32  | 5.9774  |  13.2322  | 57.0045  |        54.8387         |
|          gmlp_s16_224           | 128 | 5.7283  |  11.7609  | 55.5176  |        54.0315         |
|          convnext_base          | 64  | 6.9992  |  12.6659  | 54.1563  |        54.0886         |
|          gmixer_24_224          | 128 | 5.8111  |  12.742   | 47.9836  |         46.609         |
|           convit_base           | 64  | 3.4134  |  8.6173   |  43.044  |        44.4825         |
|            pit_b_224            | 64  | 3.5359  |  8.0982   | 42.5117  |        42.6667         |
| deit_base_distilled_patch16_224 | 64  |  3.087  |  7.1644   | 40.0239  |        38.5876         |
|          resmlp_12_224          | 128 | 2.8505  |  5.3199   | 38.5371  |        39.1618         |
|      vit_base_patch16_224       | 64  | 3.0472  |  6.9648   |  37.005  |        37.5653         |
|        convmixer_768_32         | 32  | 1.6481  |  6.7749   |  34.002  |        32.0207         |
|      beit_base_patch16_224      | 64  | 3.8713  |  8.6735   | 33.2354  |        32.1855         |
|          mixer_b16_224          | 128 | 2.6705  |  5.7681   | 31.6771  |        30.7369         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.1858  |         1.2049         |
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.1712  |         1.2836         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1129  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.0948  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.0266  |         1.1179         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.013   |         1.0845         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0099  |         1.0351         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  0.9984  |         1.0721         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  0.9972  |         1.0876         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  0.9872  |         1.0728         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  0.9834  |         1.0506         |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9762  |         0.9854         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  0.9747  |         1.0457         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9235   |  0.9729  |         1.0539         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  0.9557  |         1.0057         |
|             dla102              | 128 | 0.9635 |  0.9155   |  0.9536  |         1.0326         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.951   |         0.9948         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9362  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9314   |  0.9353  |         0.9528         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  0.9348  |         1.0194         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  0.9346  |         1.0675         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9285  |         0.989          |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9228  |         0.9793         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9215  |         0.9952         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9131  |         0.9727         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  0.9097  |         1.0328         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9047  |         0.9903         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.8918  |          0.99          |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  0.891   |         1.1115         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.8833  |         0.9642         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.8831  |         0.9705         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.8807  |         0.977          |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.8786  |         0.9451         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.877   |         0.9361         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  0.8768  |         1.1871         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  0.8738  |         1.0072         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.8721  |         0.9732         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.8712  |         0.9607         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.871   |         0.9483         |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.8687  |         0.9902         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.8683  |         0.9403         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.866   |         0.9547         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8632  |         0.989          |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.8596  |         0.9535         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  0.8578  |         1.0242         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.8576  |         0.9664         |
|          convnext_base          | 64  | 1.001  |   0.924   |  0.8505  |         1.0338         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.8499  |         0.9706         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.8461  |         0.9786         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  0.8402  |         1.0202         |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.8239  |         0.9779         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.805   |         0.884          |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.7738  |         0.9611         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.7602  |         0.8966         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.7526  |         0.9898         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.7214  |         0.9045         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.6693  |         0.9604         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 301.2043 | 311.1854  | 300.479  |        300.5658        |
|          pnasnet5large          | 16  | 199.6743 | 214.5498  | 216.6162 |        216.375         |
|            hrnet_w18            | 128 | 282.4678 | 434.3689  | 207.0581 |        208.5376        |
|           tf_mixnet_l           | 128 | 194.6588 | 229.5104  | 160.8235 |        159.5201        |
|            mixnet_l             | 128 | 186.2702 | 220.8583  | 155.361  |        153.9765        |
|          cait_m36_384           |  4  | 170.5414 | 167.9076  | 124.3364 |        128.3065        |
|           resnest101e           | 64  | 165.5306 | 188.5367  | 116.1084 |        122.8297        |
|             dla102              | 128 | 172.7069 | 210.7192  | 112.6702 |        112.9139        |
|     swsl_resnext101_32x16d      | 32  | 118.7021 | 140.6289  | 112.0291 |        116.4171        |
|         poolformer_m36          | 64  | 146.8999 | 147.5658  | 109.3138 |        109.892         |
|        tnt_s_patch16_224        | 128 |  324.2   | 324.8543  | 107.3832 |        108.7784        |
|       gluon_inception_v3        | 128 | 161.2602 | 185.7625  | 106.3367 |        106.8955        |
|        adv_inception_v3         | 128 | 160.9743 | 186.4505  | 106.3079 |        107.1292        |
|          inception_v3           | 128 | 160.8737 |  185.446  | 106.2676 |        106.8831        |
|           convit_base           | 64  | 163.4157 | 163.4611  | 105.1006 |        105.4983        |
|        res2net50_14w_8s         | 128 | 141.2562 | 178.3308  | 102.2634 |        104.274         |
|             dpn107              | 32  | 114.2828 | 131.2709  | 99.5075  |        93.4166         |
|        gluon_xception65         | 32  | 99.8825  | 117.3345  | 92.6371  |        92.0568         |
|           res2next50            | 128 | 126.2555 | 152.9449  | 92.0879  |        92.7021         |
|  swin_base_patch4_window7_224   | 64  | 147.8371 | 153.0618  | 90.9465  |        91.2324         |
|           dm_nfnet_f0           | 128 | 128.7899 | 128.9855  | 87.0811  |        89.9482         |
|        res2net101_26w_4s        | 64  | 99.6164  | 123.6776  | 85.3967  |        93.3205         |
|          mixer_b16_224          | 128 | 116.542  | 114.2455  | 85.3449  |        85.6638         |
|            fbnetv3_b            | 128 | 115.7497 | 142.7443  | 84.4382  |        83.4564         |
|            pit_b_224            | 64  | 119.0433 | 119.2006  | 82.7445  |        82.9009         |
|          convnext_base          | 64  | 124.4491 | 124.1528  | 82.6219  |        83.3751         |
|         visformer_small         | 128 | 91.4619  |  96.424   | 77.6357  |        78.0176         |
|          gmlp_s16_224           | 128 | 138.0283 | 126.8227  | 76.3994  |        76.6156         |
|      beit_base_patch16_224      | 64  | 101.7619 | 104.5845  | 75.5454  |        75.4227         |
|            nfnet_l0             | 128 | 113.4953 | 137.5498  | 75.5426  |         78.273         |
|       eca_botnext26ts_256       | 128 | 108.7715 | 147.4409  |  74.392  |        75.2238         |
|          cspdarknet53           | 64  | 95.0022  | 112.6821  | 73.3428  |        71.2424         |
|          jx_nest_base           | 32  | 101.8208 | 101.4019  | 73.2647  |        73.6947         |
|           volo_d1_224           | 64  | 121.3182 | 123.8285  | 71.5564  |        72.6558         |
|          botnet26t_256          | 128 | 101.9791 | 116.6774  | 71.4204  |        70.5942         |
|            gernet_l             | 128 | 77.7402  |  91.7117  | 70.4839  |        68.6163         |
|      vit_base_patch16_224       | 64  | 87.2127  |  87.2328  | 70.3422  |        70.4022         |
|            repvgg_a2            | 128 | 77.7713  |  96.4286  | 68.2225  |         66.236         |
|          gmixer_24_224          | 128 | 118.3205 | 132.0526  |  68.057  |        67.9512         |
| deit_base_distilled_patch16_224 | 64  | 85.0489  |  85.0043  | 67.7683  |         67.645         |
|      xcit_large_24_p8_224       |  5  |  123.43  | 145.8742  | 63.0031  |        89.9218         |
|        twins_pcpvt_base         | 64  | 119.3104 | 125.0261  |  60.505  |        69.9482         |
|       tf_efficientnet_b0        | 128 | 84.9416  | 120.1079  | 60.1904  |        58.8671         |
|           rexnet_100            | 128 | 80.4316  | 108.5566  | 59.1279  |        57.5094         |
|           fbnetc_100            | 128 | 83.0608  | 106.6654  | 58.5822  |        56.4735         |
|         coat_lite_mini          | 128 | 113.3022 | 113.5964  | 58.2712  |        59.0135         |
|           mobilevit_s           | 64  | 84.6557  | 111.4398  | 57.4528  |        56.6755         |
|            tinynet_a            | 128 | 73.5904  | 102.8796  | 57.1395  |        55.5772         |
|        sebotnet33ts_256         | 64  |  80.489  | 100.5839  | 51.4151  |        50.5495         |
|         crossvit_9_240          | 128 | 82.5027  | 104.4223  | 50.3446  |        51.2156         |
|          ghostnet_100           | 128 | 90.8675  | 118.7744  | 49.4999  |         56.464         |
|          spnasnet_100           | 128 |  70.644  |  89.7409  | 49.1978  |        47.1594         |
|        ese_vovnet19b_dw         | 128 | 64.8055  |  74.5965  | 46.3816  |        45.6907         |
|         mobilenetv2_100         | 128 | 65.4937  |  84.5333  |  45.239  |         43.554         |
|           mnasnet_100           | 128 | 64.3554  |  82.3144  | 42.7103  |        41.1556         |
|           selecsls42b           | 128 | 60.1011  |  73.8341  | 42.6628  |        42.6379         |
|          resmlp_12_224          | 128 | 53.5898  |  59.6815  | 42.5161  |        42.6724         |
|      mobilenetv3_large_100      | 128 | 61.5346  |  76.8817  |  41.026  |        40.9484         |
|           regnety_002           | 128 | 43.0843  |  52.3468  |  26.639  |        30.2997         |
|            lcnet_050            | 128 | 31.8731  |  40.708   | 17.7607  |        20.6234         |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Performance graphs


bench_logs/timm_models_amp.png :

bench_logs/torchbench_amp.png :

bench_logs/huggingface_amp.png :

Build Summary


Run name

day_085_26_03_23_performance_amp_382

Commit hashes

pytorch commit: 542fb0b
pytorch commit date: 2023-03-26 20:03:25+00:00
torchbench commit: 575b6b9932aae3afddc4e0acb1487c8d8201a328
torchbench commit date: 2023-03-26 10:37:27-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git542fb0b

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency, and memory footprint reduction, we exclude the models that fail accuracy checks.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 87%, 52/60 | 93%, 42/45  | 98%, 59/60  |
| inductor_no_cudagraphs | 87%, 52/60 | 98%, 44/45  | 98%, 59/60  |
+------------------------+------------+-------------+-------------+
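Each passrate cell pairs a rounded percentage with the raw count. A minimal sketch of that formatting, assuming the percentage is rounded to the nearest integer (this agrees with the rows above, for example 53/60 gives 88%). `format_passrate` is a hypothetical helper name, not a function from the benchmark scripts.

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a pass rate as 'PCT%, passed/total' (hypothetical helper).

    Assumption: the percentage is rounded to the nearest whole number.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    pct = round(100 * passed / total)
    return f"{pct}%, {passed}/{total}"


# Counts taken from the passrate table above.
print(format_passrate(53, 60))
print(format_passrate(42, 45))
```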

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.56x    |    1.59x    |    1.40x    |
| inductor_no_cudagraphs |   1.27x    |    1.49x    |    1.38x    |
+------------------------+------------+-------------+-------------+
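For readers unfamiliar with the metric, the geometric mean of per-model speedups can be sketched as below. This is an illustration, not the dashboard's actual code: `geomean_speedup` is a hypothetical name, and dropping non-positive entries is an assumption based on the note that a 0.0 speedup typically signifies a failed run.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of per-model speedups (hypothetical helper).

    Assumption: entries <= 0 mark failed runs and are excluded,
    since log() is undefined for them.
    """
    valid = [s for s in speedups if s > 0]
    if not valid:
        return float("nan")
    # exp(mean(log(x))) is the geometric mean and avoids overflow
    # from multiplying many values together.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# Illustrative values only, not taken from a real suite.
print(geomean_speedup([2.0, 8.0]))
```

The geometric mean is the usual choice for averaging ratios because a 2x speedup and a 0.5x slowdown cancel to 1.0x, which an arithmetic mean would not give.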

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.81    |    7.60     |    5.92     |
|       aot_eager        |    9.43    |    16.08    |    13.20    |
|        inductor        |   61.34    |    59.05    |   105.42    |
| inductor_no_cudagraphs |   59.67    |    55.01    |   104.80    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    0.97x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.89x    |    0.88x    |
|        inductor        |   0.79x    |    0.89x    |    0.91x    |
| inductor_no_cudagraphs |   0.94x    |    1.03x    |    1.01x    |
+------------------------+------------+-------------+-------------+

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics of the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Previous report name: /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Passrate diff

+------------------------+-------------+-------------+------------+
|        compiler        |    suite    | prev_value  | cur_value  |
+------------------------+-------------+-------------+------------+
|        inductor        | torchbench  | 87%, 52/60  | 87%, 52/60 |
|        inductor        | huggingface | 93%, 42/45  | 93%, 42/45 |
|        inductor        | timm_models | 100%, 60/60 | 98%, 59/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60  | 87%, 52/60 |
| inductor_no_cudagraphs | huggingface | 98%, 44/45  | 98%, 44/45 |
| inductor_no_cudagraphs | timm_models | 98%, 59/60  | 98%, 59/60 |
+------------------------+-------------+-------------+------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.58x    |   1.56x   |
|        inductor        | huggingface |   1.61x    |   1.59x   |
|        inductor        | timm_models |   1.40x    |   1.40x   |
| inductor_no_cudagraphs | torchbench  |   1.27x    |   1.27x   |
| inductor_no_cudagraphs | huggingface |   1.48x    |   1.49x   |
| inductor_no_cudagraphs | timm_models |   1.38x    |   1.38x   |
+------------------------+-------------+------------+-----------+

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
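The four rules above can be expressed as a small check per model. This is a sketch under stated assumptions: the `ModelResult` record, its field names, and `warnings_for` are hypothetical, and the accuracy rule is assumed to mean any status other than "pass".

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    # Hypothetical record; field names are illustrative only.
    name: str
    accuracy: str             # e.g. "pass", "fail_accuracy", "fail_to_run"
    speedup: float            # relative to native PyTorch
    compile_sec: float        # compilation latency in seconds
    compression_ratio: float  # peak memory compression ratio


def warnings_for(result: ModelResult) -> list:
    """Return the warning categories a model falls into."""
    flags = []
    if result.accuracy != "pass":  # assumption: anything but "pass" is flagged
        flags.append("accuracy")
    if result.speedup < 0.95:
        flags.append("speedup")
    if result.compile_sec > 120:
        flags.append("compilation_latency")
    if result.compression_ratio < 0.9:
        flags.append("compression_ratio")
    return flags


# Illustrative values, not a row from the report.
example = ModelResult("example_model", "pass", 0.91, 151.8, 1.0)
print(warnings_for(example))
```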

Accuracy warnings

+-------------+---------------------------------+------------------------+-----------------+
|    suite    |              name               | inductor_no_cudagraphs |    inductor     |
+-------------+---------------------------------+------------------------+-----------------+
| torchbench  |              moco               |      fail_to_run       |   fail_to_run   |
| torchbench  |       Background_Matting        |    eager_variation     | eager_variation |
| torchbench  |         vision_maskrcnn         |    eager_variation     | eager_variation |
| torchbench  |            tacotron2            |         0.0000         |     0.0000      |
| torchbench  |               gat               |         0.0000         |     0.0000      |
| torchbench  |               gcn               |         0.0000         |     0.0000      |
| torchbench  |              llama              |         0.0000         |     0.0000      |
| torchbench  |              sage               |         0.0000         |     0.0000      |
| torchbench  |          torchrec_dlrm          |         0.0000         |     0.0000      |
| huggingface |  DebertaV2ForQuestionAnswering  |          pass          |   fail_to_run   |
| huggingface |   AlbertForQuestionAnswering    |     fail_accuracy      |  fail_accuracy  |
| timm_models | deit_base_distilled_patch16_224 |      fail_to_run       |   fail_to_run   |
+-------------+---------------------------------+------------------------+-----------------+

Performance speedup warnings

+-------------+---------------------------------+------------------------+----------+
|    suite    |              name               | inductor_no_cudagraphs | inductor |
+-------------+---------------------------------+------------------------+----------+
| torchbench  |              dcgan              |         0.8328         |  1.4386  |
| torchbench  |          lennard_jones          |         0.8583         |  1.378   |
| torchbench  |        soft_actor_critic        |         0.8351         |  1.061   |
| torchbench  |           timm_vovnet           |         0.9232         |  0.9357  |
| torchbench  |     nvidia_deeprecommender      |         1.0182         |  0.8719  |
| torchbench  |  timm_vision_transformer_large  |          0.0           |   0.0    |
| torchbench  |              moco               |          0.0           |   0.0    |
| torchbench  |               gat               |          0.0           |   0.0    |
| torchbench  |               gcn               |          0.0           |   0.0    |
| torchbench  |              sage               |          0.0           |   0.0    |
| torchbench  |            tacotron2            |          0.0           |   0.0    |
| torchbench  |          torchrec_dlrm          |          0.0           |   0.0    |
| huggingface |   DebertaForQuestionAnswering   |         0.9168         |  1.0211  |
| huggingface |       DebertaForMaskedLM        |         0.7761         |  0.9137  |
| huggingface |      DebertaV2ForMaskedLM       |         0.6158         |  0.8332  |
| huggingface |  DebertaV2ForQuestionAnswering  |         0.6277         |  0.771   |
| huggingface |      BlenderbotForCausalLM      |         1.0945         |   0.0    |
| timm_models |          pnasnet5large          |         0.9202         |  0.9087  |
| timm_models | deit_base_distilled_patch16_224 |          0.0           |   0.0    |
+-------------+---------------------------------+------------------------+----------+

Compilation latency (sec) warnings

+-------------+--------------------------------+------------------------+----------+
|    suite    |              name              | inductor_no_cudagraphs | inductor |
+-------------+--------------------------------+------------------------+----------+
| torchbench  |        phlippe_densenet        |        163.5295        | 166.2585 |
| torchbench  |          hf_T5_large           |        156.1887        | 159.6113 |
| torchbench  |       timm_efficientnet        |        140.0126        | 142.5271 |
| torchbench  |         hf_Longformer          |        109.9983        | 141.7021 |
| torchbench  |           hf_BigBird           |        115.4067        | 136.1409 |
| torchbench  |       mobilenet_v3_large       |        135.3169        | 133.901  |
| torchbench  |          densenet121           |        130.4886        | 128.3329 |
| torchbench  |          mobilenet_v2          |        125.6978        | 125.535  |
| huggingface |     AllenaiLongformerBase      |        107.9971        | 141.4432 |
| huggingface |     MobileBertForMaskedLM      |        134.3151        |  134.71  |
| huggingface | MobileBertForQuestionAnswering |        127.7439        | 128.4399 |
| huggingface |  MT5ForConditionalGeneration   |        126.8212        | 127.874  |
| huggingface |      DebertaV2ForMaskedLM      |        60.7726         | 125.7764 |
| huggingface | DebertaV2ForQuestionAnswering  |        58.5209         | 123.482  |
| timm_models |           rexnet_100           |        275.3279        | 278.5384 |
| timm_models |          ghostnet_100          |        231.4849        | 237.8118 |
| timm_models |           hrnet_w18            |        228.5473        | 227.9758 |
| timm_models |           fbnetv3_b            |        167.8978        | 164.147  |
| timm_models |          mobilevit_s           |        158.7458        | 157.9263 |
| timm_models |          resnest101e           |        156.0727        | 157.1213 |
| timm_models |          tf_mixnet_l           |        154.467         | 157.0361 |
| timm_models |           tinynet_a            |        155.5618        | 156.7248 |
| timm_models |       tf_efficientnet_b0       |        152.3895        | 155.1317 |
| timm_models |          inception_v3          |        154.8467        | 153.0311 |
| timm_models |        adv_inception_v3        |        155.2662        | 152.9945 |
| timm_models |       gluon_inception_v3       |        151.7647        | 152.9558 |
| timm_models |            mixnet_l            |        150.7686        | 152.0509 |
| timm_models |         pnasnet5large          |        147.9882        | 151.8038 |
| timm_models |     mobilenetv3_large_100      |        157.9578        | 151.3115 |
| timm_models |       res2net101_26w_4s        |        140.7468        | 141.2599 |
| timm_models |        twins_pcpvt_base        |        138.1102        | 138.3236 |
| timm_models |           fbnetc_100           |        133.9571        | 134.6528 |
| timm_models |          spnasnet_100          |        133.5134        | 132.9417 |
| timm_models |        mobilenetv2_100         |        121.6193        | 127.7707 |
| timm_models |      xcit_large_24_p8_224      |        120.2327        | 121.2818 |
+-------------+--------------------------------+------------------------+----------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+------------------------+----------+
|    suite    |                  name                   | inductor_no_cudagraphs | inductor |
+-------------+-----------------------------------------+------------------------+----------+
| torchbench  |                 yolov3                  |         1.0367         |  0.8931  |
| torchbench  |              hf_GPT2_large              |         1.1284         |  0.8906  |
| torchbench  |            timm_efficientnet            |         0.9414         |  0.8699  |
| torchbench  |           speech_transformer            |         0.869          |  0.8651  |
| torchbench  |              timm_resnest               |         0.9516         |  0.8621  |
| torchbench  |           shufflenet_v2_x1_0            |         0.9649         |  0.8598  |
| torchbench  |         timm_vision_transformer         |         0.8835         |  0.8593  |
| torchbench  |               timm_regnet               |         0.9533         |  0.8512  |
| torchbench  |                resnet152                |         0.9409         |  0.8498  |
| torchbench  |           Background_Matting            |         1.0403         |  0.8484  |
| torchbench  |              hf_DistilBert              |         0.9479         |  0.8476  |
| torchbench  |               hf_T5_large               |         1.168          |  0.8201  |
| torchbench  |              pytorch_unet               |         0.9308         |  0.8134  |
| torchbench  |            phlippe_densenet             |         0.8659         |  0.8058  |
| torchbench  |                resnet50                 |         0.8852         |  0.7829  |
| torchbench  |                  dcgan                  |         0.9645         |  0.7821  |
| torchbench  |                 demucs                  |         0.9662         |  0.7733  |
| torchbench  |              squeezenet1_1              |         0.9087         |  0.7733  |
| torchbench  |             pytorch_stargan             |         0.8893         |  0.7715  |
| torchbench  |                 hf_Bart                 |         0.9285         |  0.7535  |
| torchbench  |               timm_vovnet               |         0.8869         |  0.7529  |
| torchbench  |               mnasnet1_0                |         0.7749         |  0.7434  |
| torchbench  |           mobilenet_v3_large            |         0.8723         |  0.728   |
| torchbench  |             pytorch_struct              |         0.7358         |  0.7274  |
| torchbench  |                  vgg16                  |         0.9805         |  0.7227  |
| torchbench  |                 alexnet                 |         0.9385         |  0.7088  |
| torchbench  |               densenet121               |         0.8034         |  0.7085  |
| torchbench  |               hf_BigBird                |         1.1068         |  0.6971  |
| torchbench  |             resnext50_32x4d             |         0.7718         |  0.6655  |
| torchbench  |         nvidia_deeprecommender          |         0.8931         |  0.6585  |
| torchbench  |                   drq                   |         0.9573         |  0.6379  |
| torchbench  |            soft_actor_critic            |         0.9973         |  0.6066  |
| torchbench  |             LearningToPaint             |         0.7463         |  0.5925  |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |         0.6004         |  0.5904  |
| torchbench  |                resnet18                 |         0.6127         |  0.5423  |
| torchbench  |              lennard_jones              |         0.9997         |  0.5317  |
| torchbench  |               hf_Reformer               |         0.8022         |  0.4538  |
| torchbench  |              hf_Longformer              |         0.8947         |  0.417   |
| torchbench  |          functorch_dp_cifar10           |         0.4424         |  0.3991  |
| torchbench  |             phlippe_resnet              |         0.3395         |  0.3169  |
| huggingface |            PLBartForCausalLM            |         0.9249         |  0.8907  |
| huggingface |     PegasusForConditionalGeneration     |         1.0074         |  0.8901  |
| huggingface |           ElectraForCausalLM            |         0.8941         |  0.889   |
| huggingface |          DistilBertForMaskedLM          |         0.9624         |  0.8849  |
| huggingface | BlenderbotSmallForConditionalGeneration |         0.9803         |  0.8729  |
| huggingface |      MBartForConditionalGeneration      |         1.0307         |  0.8672  |
| huggingface |            TrOCRForCausalLM             |         0.9075         |  0.8619  |
| huggingface |            MBartForCausalLM             |         0.9507         |  0.8491  |
| huggingface |      BartForConditionalGeneration       |         1.0139         |  0.8456  |
| huggingface |         MegatronBertForCausalLM         |         1.0962         |  0.845   |
| huggingface |             BartForCausalLM             |         0.943          |  0.8301  |
| huggingface |       BlenderbotSmallForCausalLM        |         0.8318         |  0.8065  |
| huggingface |           PegasusForCausalLM            |         0.9252         |  0.7952  |
| huggingface |         Speech2Text2ForCausalLM         |         0.808          |  0.7566  |
| huggingface |          MobileBertForMaskedLM          |         1.016          |  0.7473  |
| huggingface |             XGLMForCausalLM             |         0.9287         |  0.6744  |
| huggingface |     MobileBertForQuestionAnswering      |         0.8392         |  0.6569  |
| huggingface |     M2M100ForConditionalGeneration      |         0.8978         |  0.6058  |
| huggingface |           DebertaForMaskedLM            |         0.9978         |  0.5501  |
| huggingface |          DebertaV2ForMaskedLM           |         0.9665         |  0.5197  |
| huggingface |      DebertaV2ForQuestionAnswering      |         0.9802         |  0.487   |
| huggingface |          AllenaiLongformerBase          |         0.8742         |  0.4688  |
| huggingface |       DebertaForQuestionAnswering       |         1.1527         |  0.4601  |
| timm_models |                hrnet_w18                |          0.99          |  0.8918  |
| timm_models |            sebotnet33ts_256             |         1.1115         |  0.891   |
| timm_models |           gluon_inception_v3            |         1.0171         |  0.8904  |
| timm_models |            adv_inception_v3             |         1.0171         |  0.8904  |
| timm_models |              inception_v3               |         1.0171         |  0.8904  |
| timm_models |                 dpn107                  |         0.9642         |  0.8833  |
| timm_models |            gluon_xception65             |         0.9705         |  0.8831  |
| timm_models |              ghostnet_100               |         0.977          |  0.8807  |
| timm_models |              spnasnet_100               |         0.9451         |  0.8786  |
| timm_models |          mobilenetv3_large_100          |         0.9361         |  0.877   |
| timm_models |             poolformer_m36              |         1.1871         |  0.8768  |
| timm_models |           eca_botnext26ts_256           |         1.0072         |  0.8738  |
| timm_models |          xcit_large_24_p8_224           |         0.9732         |  0.8721  |
| timm_models |            res2net50_14w_8s             |         0.9607         |  0.8712  |
| timm_models |            res2net101_26w_4s            |         0.9483         |  0.871   |
| timm_models |                mixnet_l                 |         0.9902         |  0.8687  |
| timm_models |               mnasnet_100               |         0.9403         |  0.8683  |
| timm_models |               res2next50                |         0.9547         |  0.866   |
| timm_models |              cait_m36_384               |         0.989          |  0.8632  |
| timm_models |               fbnetc_100                |         0.9535         |  0.8596  |
| timm_models |                pit_b_224                |         1.0242         |  0.8578  |
| timm_models |               selecsls42b               |         0.9664         |  0.8576  |
| timm_models |              convnext_base              |         1.0338         |  0.8505  |
| timm_models |                gernet_l                 |         0.9706         |  0.8499  |
| timm_models |         swsl_resnext101_32x16d          |         0.9786         |  0.8461  |
| timm_models |             coat_lite_mini              |         1.0202         |  0.8402  |
| timm_models |              botnet26t_256              |         0.9779         |  0.8239  |
| timm_models |                lcnet_050                |         0.884          |  0.805   |
| timm_models |                repvgg_a2                |         0.9611         |  0.7738  |
| timm_models |               regnety_002               |         0.8966         |  0.7602  |
| timm_models |             crossvit_9_240              |         0.9898         |  0.7526  |
| timm_models |      swin_base_patch4_window7_224       |         0.9045         |  0.7214  |
| timm_models |              jx_nest_base               |         0.9604         |  0.6693  |
+-------------+-----------------------------------------+------------------------+----------+

Metrics over time


bench_logs/passrate_over_time.png :

bench_logs/memory_over_time.png :

bench_logs/comp_time_over_time.png :

bench_logs/geomean_over_time.png :

Recent Regressions

For each relevant compiler, we compare the two most recent reports (those that actually ran the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).
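The comparison described here amounts to finding models whose metric crosses a warning threshold between two reports. A minimal sketch, assuming each report is a mapping from model name to metric value; `newly_flagged` is a hypothetical helper, and skipping models absent from the previous report is an assumption.

```python
def newly_flagged(prev, cur, is_flagged):
    """Return {name: (prev_value, cur_value)} for models that were
    not flagged in the previous report but are flagged now.

    prev, cur: dicts mapping model name -> metric value.
    is_flagged: predicate implementing one warning rule.
    Assumption: models missing from `prev` are skipped.
    """
    return {
        name: (prev[name], value)
        for name, value in cur.items()
        if name in prev and is_flagged(value) and not is_flagged(prev[name])
    }


# timm_efficientnet values are from the regression table in this report;
# "hypothetical_model" is made up to show an already-flagged model being ignored.
prev_mem = {"timm_efficientnet": 0.9287, "hypothetical_model": 0.80}
cur_mem = {"timm_efficientnet": 0.8699, "hypothetical_model": 0.79}
print(newly_flagged(prev_mem, cur_mem, lambda ratio: ratio < 0.9))
```

Passing the rule as a predicate lets the same function serve the speedup, latency, and memory checks.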

Regressions for torchbench

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Peak Memory Compression Ratio regressions

+----------+-------------------+-------------+------------+
| compiler |       name        | prev_status | cur_status |
+----------+-------------------+-------------+------------+
| inductor | timm_efficientnet |   0.9287    |   0.8699   |
+----------+-------------------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

No regressions found.

Regressions for timm_models

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Accuracy regressions

+----------+---------------------------------+-------------+-------------+
| compiler |              name               | prev_status | cur_status  |
+----------+---------------------------------+-------------+-------------+
| inductor | deit_base_distilled_patch16_224 |    pass     | fail_to_run |
+----------+---------------------------------+-------------+-------------+

Performance speedup regressions

+------------------------+---------------------------------+-------------+------------+
|        compiler        |              name               | prev_status | cur_status |
+------------------------+---------------------------------+-------------+------------+
| inductor_no_cudagraphs | deit_base_distilled_patch16_224 |   1.2507    |    0.0     |
|        inductor        | deit_base_distilled_patch16_224 |   1.2513    |    0.0     |
+------------------------+---------------------------------+-------------+------------+

Compilation latency (sec) regressions

+------------------------+----------------------+-------------+------------+
|        compiler        |         name         | prev_status | cur_status |
+------------------------+----------------------+-------------+------------+
| inductor_no_cudagraphs | xcit_large_24_p8_224 |  118.8733   |  120.2327  |
+------------------------+----------------------+-------------+------------+

torchbench suite with amp precision


Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9655 |  0.9076   |  3.524   |         1.3568         |
|           BERT_pytorch            |  16  | 0.9937 |  0.8049   |  2.9165  |         2.0904         |
|            densenet121            |  4   | 0.9884 |  0.7121   |  2.8071  |         1.0454         |
|            hf_BigBird             |  2   | 0.952  |  0.7764   |  2.5105  |         1.6179         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9728 |  0.8978   |  2.3677  |         1.7949         |
|             hf_Albert             |  8   | 0.995  |  0.9558   |  2.2873  |         2.2494         |
|            hf_T5_large            |  2   | 0.9745 |  0.8064   |  2.2215  |         1.8667         |
|        mobilenet_v3_large         |  32  | 0.9957 |  0.7851   |  2.0491  |         1.1901         |
|         phlippe_densenet          | 128  | 0.9844 |  0.7726   |  2.0377  |         1.0008         |
|               dlrm                | 1024 | 0.9381 |  0.8516   |  2.0223  |         1.2036         |
|           squeezenet1_1           |  32  | 0.9815 |  0.9339   |  1.9268  |         1.3102         |
|               hf_T5               |  8   | 0.985  |  0.8485   |  1.8944  |         1.9428         |
|          phlippe_resnet           | 128  | 0.9808 |  0.7597   |  1.8797  |         1.0003         |
|              hf_Bart              |  4   | 0.971  |   0.772   |  1.8334  |         1.3502         |
|              hf_Bert              |  4   | 0.996  |  0.8457   |  1.7554  |         1.5938         |
|              hf_GPT2              |  4   | 0.9921 |  0.9581   |  1.7492  |         1.7702         |
|          resnext50_32x4d          |  8   | 0.9846 |  0.7389   |  1.6912  |         0.9672         |
|            mnasnet1_0             |  32  | 0.9892 |  0.7393   |  1.6827  |         1.0998         |
|        shufflenet_v2_x1_0         | 128  | 0.9934 |  0.7595   |  1.6574  |         1.1932         |
|           hf_GPT2_large           |  4   | 0.983  |  0.9719   |  1.6567  |         1.7165         |
|        speech_transformer         |  32  | 0.9774 |  0.8159   |  1.6177  |         1.585          |
|             resnet18              |  16  | 0.985  |  0.7629   |  1.5775  |         0.9831         |
|           hf_Bert_large           |  4   | 0.9966 |  0.8735   |  1.5768  |         1.5616         |
|           timm_resnest            |  32  | 0.9934 |  0.8491   |  1.5616  |         1.503          |
|           fastNLP_Bert            |  6   | 0.9912 |  0.8012   |  1.5309  |         1.4959         |
|      timm_vision_transformer      |  32  | 0.986  |  0.8686   |  1.5266  |         1.3728         |
|            timm_nfnet             | 128  | 0.986  |  0.9835   |  1.509   |         1.4508         |
|           mobilenet_v2            |  96  | 0.9968 |  0.7772   |  1.5089  |         1.5077         |
| attention_is_all_you_need_pytorch | 256  | 0.9878 |  0.9197   |  1.4479  |         1.4331         |
|          pytorch_struct           | 200  | 0.9278 |  0.7634   |  1.4453  |         1.0902         |
|               dcgan               |  32  | 0.8649 |  0.6902   |  1.4386  |         0.8328         |
|                drq                |  1   | 0.9711 |  0.7544   |  1.4325  |         0.9592         |
|           hf_Longformer           |  2   | 0.8259 |  0.5652   |  1.4312  |         1.1815         |
|           hf_DistilBert           |  8   | 0.9826 |  0.9564   |  1.4301  |         1.4593         |
|         timm_efficientnet         |  32  | 0.9388 |  0.6253   |  1.4104  |         1.0759         |
|           lennard_jones           | 1000 | 0.8274 |  0.7407   |  1.378   |         0.8583         |
|           pytorch_unet            |  1   | 0.9966 |  0.2052   |  1.3732  |         1.3663         |
|          LearningToPaint          |  96  | 0.9859 |  0.7718   |  1.317   |         1.0634         |
|          pytorch_stargan          |  16  | 0.9935 |  0.8068   |  1.2765  |         1.2415         |
|               vgg16               |  64  | 0.9993 |  0.9987   |  1.2403  |         1.2523         |
|            Super_SloMo            |  6   | 0.9967 |  0.1792   |  1.2326  |         1.2318         |
|        Background_Matting         |  4   | 0.9988 |   0.137   |  1.2117  |         1.2052         |
|             resnet152             |  32  | 0.9951 |  0.7628   |  1.2103  |         1.027          |
|              yolov3               |  16  | 0.9964 |  0.8059   |  1.1872  |          1.19          |
|             resnet50              |  32  | 0.9939 |  0.7748   |  1.1762  |         1.0697         |
|            hf_Reformer            |  4   | 0.9864 |  0.9679   |  1.144   |         1.0663         |
|              alexnet              | 128  | 0.9991 |  0.9973   |  1.0859  |         1.1352         |
|         soft_actor_critic         | 256  | 0.8527 |  0.6221   |  1.061   |         0.8351         |
|              demucs               |  4   | 0.9993 |  1.0016   |  1.0284  |         1.0348         |
|            timm_regnet            |  32  | 0.9182 |  0.7729   |  0.9926  |         0.9685         |
|            tts_angular            |  64  | 0.9301 |  0.8928   |  0.9614  |         0.9568         |
|            timm_vovnet            |  32  | 0.8533 |  0.7126   |  0.9357  |         0.9232         |
|      nvidia_deeprecommender       | 256  | 0.9988 |  0.9983   |  0.8719  |         1.0182         |
|   timm_vision_transformer_large   |  32  | 0.998  |    0.0    |   0.0    |          0.0           |
|               moco                |  32  | 0.9774 |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+
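For anyone who wants to summarize a speedup table like the one above, here is a minimal sketch that parses the ASCII table and computes a geometric mean per compiler column. It is not part of the dashboard tooling; the helper names `parse_ascii_table` and `geomean_speedup` are made up for illustration. It assumes that `0.0` and `nan` entries are placeholders for runs that did not produce a number, so they are skipped rather than averaged in.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts (hypothetical helper)."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geomean_speedup(rows, column):
    """Geometric mean of the positive, finite values in `column`.

    Assumption: 0.0 and nan mark failed or skipped runs, so they are ignored.
    """
    values = []
    for row in rows:
        try:
            v = float(row[column])
        except (KeyError, ValueError):
            continue
        if v > 0 and math.isfinite(v):
            values.append(v)
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Three rows copied from the table above, including one failed model.
sample = """
+----------------------+----+--------+-----------+----------+------------------------+
|         name         | bs | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+----------------------+----+--------+-----------+----------+------------------------+
| functorch_dp_cifar10 | 64 | 0.9655 |  0.9076   |  3.524   |         1.3568         |
|     BERT_pytorch     | 16 | 0.9937 |  0.8049   |  2.9165  |         2.0904         |
|         moco         | 32 | 0.9774 |    0.0    |   0.0    |          0.0           |
+----------------------+----+--------+-----------+----------+------------------------+
"""

rows = parse_ascii_table(sample)
for compiler in ("eager", "aot_eager", "inductor", "inductor_no_cudagraphs"):
    print(compiler, round(geomean_speedup(rows, compiler), 4))
```

Skipping the placeholder rows means the geomean only reflects models that ran; the number of skipped models should be reported alongside it.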

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
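To get a quick per-compiler pass rate out of an accuracy table like the one above, the statuses can be tallied with a `Counter`. This is only a sketch under stated assumptions: the rows are hand-copied from the table, `tally_status` is a hypothetical helper rather than dashboard code, and `0.0000` is treated as a "no result" placeholder instead of a real status.

```python
from collections import Counter

COMPILERS = ("eager", "aot_eager", "inductor", "inductor_no_cudagraphs")


def tally_status(rows, compilers=COMPILERS):
    """Count each accuracy status per compiler column (hypothetical helper)."""
    return {c: Counter(row[c] for row in rows if c in row) for c in compilers}


# A few rows copied from the accuracy table above.
rows = [
    {"name": "BERT_pytorch", "eager": "fail_accuracy", "aot_eager": "pass",
     "inductor": "pass", "inductor_no_cudagraphs": "pass"},
    {"name": "moco", "eager": "pass", "aot_eager": "fail_to_run",
     "inductor": "fail_to_run", "inductor_no_cudagraphs": "fail_to_run"},
    {"name": "resnet50", "eager": "pass", "aot_eager": "pass",
     "inductor": "pass", "inductor_no_cudagraphs": "pass"},
    # Assumption: "0.0000" means the model produced no result at all.
    {"name": "gat", "eager": "0.0000", "aot_eager": "0.0000",
     "inductor": "0.0000", "inductor_no_cudagraphs": "0.0000"},
]

counts = tally_status(rows)
for compiler, counter in counts.items():
    print(compiler, dict(counter))
```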

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|         phlippe_densenet          | 128  | 3.2444  |  7.0547   | 166.2585 |        163.5295        |
|            hf_T5_large            |  2   | 27.0243 |  59.3063  | 159.6113 |        156.1887        |
|         timm_efficientnet         |  32  | 4.9316  |  10.2778  | 142.5271 |        140.0126        |
|           hf_Longformer           |  2   | 11.4451 |  31.333   | 141.7021 |        109.9983        |
|            hf_BigBird             |  2   | 12.7767 |  37.0433  | 136.1409 |        115.4067        |
|        mobilenet_v3_large         |  32  | 3.4638  |  7.6892   | 133.901  |        135.3169        |
|            densenet121            |  4   | 7.6432  |  18.1903  | 128.3329 |        130.4886        |
|           mobilenet_v2            |  96  | 3.1332  |  6.9513   | 125.535  |        125.6978        |
|              yolov3               |  16  | 4.9969  |  10.5314  | 116.3597 |        115.2812        |
|            mnasnet1_0             |  32  | 3.1194  |  6.8515   | 107.6829 |        104.5344        |
|           timm_resnest            |  32  | 1.7963  |  3.9449   | 98.6776  |        98.2575         |
|             resnet152             |  32  | 9.1408  |  20.1482  | 97.3368  |        98.7008         |
|           hf_GPT2_large           |  4   | 14.9274 |  30.1758  | 95.5609  |        97.6286         |
|        shufflenet_v2_x1_0         | 128  | 3.4909  |  7.7315   | 79.7735  |         76.27          |
|        speech_transformer         |  32  | 5.9526  |  13.8426  | 73.3998  |        71.6007         |
| attention_is_all_you_need_pytorch | 256  | 4.5331  |  11.011   | 70.8274  |        70.2455         |
|            timm_regnet            |  32  | 6.6722  |  12.1049  | 68.7886  |        67.9304         |
|            timm_nfnet             | 128  | 5.7919  |  10.9138  | 67.6095  |         65.947         |
|        Background_Matting         |  4   | 3.0865  |  11.348   |  67.576  |        67.0218         |
|           BERT_pytorch            |  16  | 4.9396  |  11.6569  | 64.3539  |        65.0735         |
|             resnet50              |  32  |  3.19   |  6.9716   | 64.1498  |        62.0323         |
|            timm_vovnet            |  32  | 3.6691  |  6.3355   |  59.976  |        58.9893         |
|              hf_Bart              |  4   | 10.4856 |  17.9436  |  57.055  |         56.494         |
|           pytorch_unet            |  1   | 1.5307  |  4.3771   |  56.975  |        56.7571         |
|           hf_Bert_large           |  4   | 10.2371 |  21.0827  | 56.7234  |        57.5119         |
|       functorch_dp_cifar10        |  64  | 1.2186  |  2.4273   |  55.588  |        53.1216         |
|          resnext50_32x4d          |  8   |  3.235  |  7.1252   | 51.2116  |        47.3281         |
|      timm_vision_transformer      |  32  | 3.3821  |   7.319   | 48.1303  |        47.5286         |
|               hf_T5               |  8   | 5.6913  |  12.6378  | 46.6918  |        45.5359         |
|           fastNLP_Bert            |  6   | 5.2245  |  11.0622  | 45.7308  |        44.1725         |
|          pytorch_stargan          |  16  | 1.2047  |  3.2388   | 45.3968  |        45.4477         |
|          LearningToPaint          |  96  | 1.4127  |   2.94    | 43.7447  |        43.6163         |
|             resnet18              |  16  | 1.3353  |  2.8902   | 43.0784  |         42.088         |
|            hf_Reformer            |  4   | 4.1703  |  5.9628   | 41.7595  |        39.1765         |
|            Super_SloMo            |  6   | 2.7533  |   9.791   | 40.5603  |        39.6485         |
|              hf_GPT2              |  4   |  4.626  |  9.6748   | 38.7215  |        38.5141         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.238  |  2.9607   | 36.9175  |        34.3326         |
|             hf_Albert             |  8   |  2.496  |  8.0994   |  35.528  |        36.4656         |
|              hf_Bert              |  4   | 5.0835  |  10.5057  | 34.5147  |        35.5976         |
|          phlippe_resnet           | 128  | 1.3453  |  2.8109   | 30.6665  |        30.7678         |
|              demucs               |  4   | 1.4381  |  2.1594   | 29.8812  |        28.5707         |
|           hf_DistilBert           |  8   | 2.3638  |   5.238   | 27.1794  |        28.9626         |
|           squeezenet1_1           |  32  | 1.0301  |  1.7411   | 24.6593  |        22.9955         |
|          pytorch_struct           | 200  | 0.7533  |  1.3152   | 17.9413  |        18.3543         |
|               vgg16               |  64  | 0.6556  |   1.115   | 15.1295  |        15.4431         |
|              alexnet              | 128  | 0.4957  |   0.779   | 14.4647  |        14.8843         |
|                drq                |  1   | 0.6569  |  1.0076   |  9.4668  |         8.9796         |
|      nvidia_deeprecommender       | 256  | 0.4874  |  0.7675   |  9.3493  |         9.1734         |
|         soft_actor_critic         | 256  | 0.4311  |   0.597   |  7.8267  |         6.261          |
|               dcgan               |  32  | 0.4338  |   0.714   |  7.7983  |         7.5114         |
|               dlrm                | 1024 | 0.3736  |   0.772   |  7.4998  |         7.1808         |
|           lennard_jones           | 1000 | 0.3938  |  0.6029   |  5.6066  |         5.9799         |
|            tts_angular            |  64  | 0.4364  |  0.5062   |   5.39   |         5.2872         |
|               moco                |  32  | 27.8358 |    nan    |   nan    |          nan           |
|   timm_vision_transformer_large   |  32  | 9.4203  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.208   |         1.208          |
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0863  |         1.2037         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.0496  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.0163  |         1.1719         |
|           mobilenet_v2            |  96  | 0.9863 |  0.7648   |  1.0104  |         1.101          |
|            tts_angular            |  64  | 0.9957 |  0.9957   |  0.9852  |         0.9957         |
|            timm_nfnet             | 128  | 0.9068 |  0.8749   |  0.9689  |         1.0724         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9689  |         1.1266         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |  0.952   |         1.0009         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         1.1717         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  0.9425  |         1.026          |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.9402  |         1.0725         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9319  |         1.0718         |
|              yolov3               |  16  | 0.9877 |  0.8252   |  0.8931  |         1.0367         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  0.8906  |         1.1284         |
|         timm_efficientnet         |  32  | 0.9859 |  0.7656   |  0.8699  |         0.9414         |
|        speech_transformer         |  32  | 0.9915 |   0.901   |  0.8651  |         0.869          |
|           timm_resnest            |  32  | 0.9886 |  0.8952   |  0.8621  |         0.9516         |
|        shufflenet_v2_x1_0         | 128  | 0.9539 |  0.8397   |  0.8598  |         0.9649         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8593  |         0.8835         |
|            timm_regnet            |  32  | 0.9903 |  0.8525   |  0.8512  |         0.9533         |
|             resnet152             |  32  | 0.9955 |  0.8921   |  0.8498  |         0.9409         |
|        Background_Matting         |  4   | 1.0132 |  0.6485   |  0.8484  |         1.0403         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8476  |         0.9479         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         1.168          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.8134  |         0.9308         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8058  |         0.8659         |
|             resnet50              |  32  | 0.9908 |  0.8611   |  0.7829  |         0.8852         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.7821  |         0.9645         |
|              demucs               |  4   | 0.966  |  0.9662   |  0.7733  |         0.9662         |
|           squeezenet1_1           |  32  | 0.9666 |  0.9309   |  0.7733  |         0.9087         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.8893         |
|              hf_Bart              |  4   | 0.9084 |   0.843   |  0.7535  |         0.9285         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7529  |         0.8869         |
|            mnasnet1_0             |  32  | 0.9775 |  0.8678   |  0.7434  |         0.7749         |
|        mobilenet_v3_large         |  32  | 0.9805 |  0.8773   |  0.728   |         0.8723         |
|          pytorch_struct           | 200  | 0.9992 |  0.5168   |  0.7274  |         0.7358         |
|               vgg16               |  64  | 0.9919 |  0.7243   |  0.7227  |         0.9805         |
|              alexnet              | 128  | 0.9455 |   0.793   |  0.7088  |         0.9385         |
|            densenet121            |  4   | 0.9944 |  0.9789   |  0.7085  |         0.8034         |
|            hf_BigBird             |  2   | 0.9486 |  0.9264   |  0.6971  |         1.1068         |
|          resnext50_32x4d          |  8   | 0.9942 |  0.8441   |  0.6655  |         0.7718         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.8931         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.6379  |         0.9573         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  0.6066  |         0.9973         |
|          LearningToPaint          |  96  | 0.9213 |  0.7116   |  0.5925  |         0.7463         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9966 |  0.8796   |  0.5904  |         0.6004         |
|             resnet18              |  16  | 0.9751 |  0.7996   |  0.5423  |         0.6127         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.5317  |         0.9997         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.4538  |         0.8022         |
|           hf_Longformer           |  2   | 0.8567 |  0.8296   |  0.417   |         0.8947         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3169  |         0.3395         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |          nan           |
|               moco                |  32  | 0.9889 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_GPT2_large           |  4   | 212.9004 | 214.7077  | 126.2578 |        121.7115        |
|        Background_Matting         |  4   | 125.9942 | 918.4412  | 103.8086 |        104.5171        |
|            hf_T5_large            |  2   | 228.4931 | 302.4961  | 101.8655 |        117.8527        |
|               hf_T5               |  8   |  181.76  | 210.4074  | 94.6782  |        92.3221         |
|           hf_Longformer           |  2   | 137.522  | 200.5696  | 78.7721  |         95.49          |
|            timm_nfnet             | 128  | 120.1576 | 120.1217  |   78.2   |         81.736         |
|            hf_BigBird             |  2   | 202.5778 | 250.0013  | 77.4134  |        119.1635        |
|            hf_Reformer            |  4   | 81.9751  |  83.518   | 70.7923  |        75.9543         |
|            Super_SloMo            |  6   | 79.6419  | 443.0861  | 64.5676  |        64.4377         |
|              yolov3               |  16  | 68.7163  |  85.0074  | 57.8012  |         57.649         |
|            timm_regnet            |  32  |  60.754  |  71.9821  | 56.0658  |        57.5711         |
|               vgg16               |  64  |  66.28   |  66.3002  | 53.4597  |        52.8815         |
|             resnet152             |  32  | 64.5501  |  83.4489  | 52.5943  |        64.2175         |
|           hf_Bert_large           |  4   | 82.7422  |  93.2063  | 52.2156  |         53.098         |
|              demucs               |  4   | 53.5714  |  53.6044  | 51.5981  |        51.7316         |
| attention_is_all_you_need_pytorch | 256  | 58.1286  |  57.8039  | 37.1935  |        37.3926         |
|        speech_transformer         |  32  | 59.1739  |  86.8615  | 34.8644  |        35.3212         |
|              hf_Bart              |  4   |  59.501  |  73.2646  |  34.661  |         62.994         |
|           fastNLP_Bert            |  6   | 54.2388  |  69.6528  | 33.6962  |        34.7807         |
|           mobilenet_v2            |  96  | 47.1185  |  60.4124  | 31.1568  |        31.2136         |
|             hf_Albert             |  8   |  68.619  |  71.3454  | 29.7647  |        30.3999         |
|           pytorch_unet            |  1   |  39.978  | 194.0657  | 28.9741  |        29.2321         |
|              hf_GPT2              |  4   | 48.9552  |  50.7683  | 27.7618  |        27.5134         |
|            timm_vovnet            |  32  | 28.9945  |  34.4615  | 26.1466  |        26.4769         |
|              hf_Bert              |  4   | 40.7361  |  46.8405  | 22.7143  |        25.9246         |
|         timm_efficientnet         |  32  | 33.9707  |  51.0217  |  22.203  |        29.3931         |
|             resnet50              |  32  |  26.448  |  33.7783  |  22.007  |        24.5138         |
|           hf_DistilBert           |  8   |  31.96   |  32.6855  | 21.9147  |        21.9695         |
|        shufflenet_v2_x1_0         | 128  | 30.7245  |  43.9355  | 19.0687  |        25.8232         |
|            densenet121            |  4   | 54.3583  |  75.6769  | 18.7192  |        50.0238         |
|      timm_vision_transformer      |  32  | 29.2182  |  40.6831  |  18.274  |        22.2595         |
|           BERT_pytorch            |  16  | 54.5328  |  67.3682  | 17.8226  |        27.0328         |
|           timm_resnest            |  32  |  24.235  |  28.4097  | 15.3793  |        16.0352         |
|            mnasnet1_0             |  32  | 22.3977  |  31.3851  | 13.1427  |        19.7718         |
|        mobilenet_v3_large         |  32  | 26.8811  |  33.9822  | 12.9573  |        22.0331         |
|          resnext50_32x4d          |  8   |  20.441  |  29.8991  | 11.8085  |        20.7561         |
|      nvidia_deeprecommender       | 256  | 10.2194  |  10.2297  | 11.7038  |        10.0318         |
|          pytorch_stargan          |  16  | 14.9027  |  18.3967  | 11.6252  |        11.9104         |
|         phlippe_densenet          | 128  |  23.249  |  29.7432  |  11.187  |         23.563         |
|              alexnet              | 128  |  9.838   |  9.8359   |  9.0292  |         8.645          |
|          LearningToPaint          |  96  | 11.4415  |  14.5534  |  8.5376  |        10.5803         |
|            tts_angular            |  64  |  6.6386  |  6.8682   |  6.3927  |         6.4075         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 15.5893  |  15.4013  |  5.7995  |         8.4643         |
|             resnet18              |  16  |  9.1724  |  12.0395  |  5.731   |         9.2669         |
|           squeezenet1_1           |  32  | 10.4161  |  10.9052  |  5.4896  |         7.5639         |
|          phlippe_resnet           | 128  |  9.0821  |  11.519   |  5.2386  |         9.1024         |
|          pytorch_struct           | 200  |  4.9798  |  6.0046   |  3.209   |         4.3344         |
|       functorch_dp_cifar10        |  64  | 10.4091  |  11.2024  |  2.8544  |         7.5911         |
|                drq                |  1   |  3.388   |  4.3351   |  2.7425  |         4.0473         |
|               dlrm                | 1024 |  4.3508  |  4.7914   |  2.2284  |         3.9282         |
|         soft_actor_critic         | 256  |  1.7435  |  2.4276   |  1.9751  |         1.9011         |
|               dcgan               |  32  |  2.3623  |  3.0045   |  1.5067  |         2.4719         |
|           lennard_jones           | 1000 |  1.7885  |  2.1216   |  1.0862  |         1.7566         |
|   timm_vision_transformer_large   |  32  | 464.4176 |    nan    |   nan    |          nan           |
|               moco                |  32  | 51.4561  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 0.9472 |  0.8085   |  2.9393  |         1.1368         |
|             OPTForCausalLM              |  2  | 0.9885 |  0.9301   |  2.3819  |         2.4378         |
|       MT5ForConditionalGeneration       | 16  | 0.9906 |   0.849   |  2.3087  |         2.0972         |
|      GPT2ForSequenceClassification      |  4  | 0.9767 |  0.9529   |  2.2357  |         2.2734         |
|     MobileBertForQuestionAnswering      | 128 | 0.9513 |  0.8096   |  2.141   |         1.0761         |
|       ElectraForQuestionAnswering       | 64  | 0.9872 |  0.9772   |  2.1116  |         2.089          |
|             XGLMForCausalLM             |  8  | 0.9582 |  0.7398   |  1.9899  |         1.2128         |
|     M2M100ForConditionalGeneration      | 16  | 1.0176 |  0.7975   |  1.8974  |         1.4154         |
|            XLNetLMHeadModel             |  8  | 0.9964 |  0.9663   |  1.816   |         1.8086         |
|           ElectraForCausalLM            | 32  | 0.982  |  0.9357   |  1.7999  |         1.8356         |
|    LayoutLMForSequenceClassification    | 16  | 0.9845 |  0.9707   |  1.7891  |         1.7776         |
|       RobertaForQuestionAnswering       | 16  | 0.9846 |  0.9702   |  1.7776  |         1.7555         |
|        BertForQuestionAnswering         | 16  | 0.9855 |  0.9707   |  1.7642  |         1.751          |
|           RobertaForCausalLM            | 16  | 0.9869 |  0.9626   |  1.6751  |         1.6661         |
|               DistillGPT2               | 16  | 0.9881 |  0.9555   |   1.65   |         1.6933         |
|       AlbertForQuestionAnswering        |  4  | 0.9998 |  0.8855   |  1.6477  |         1.6436         |
|            AlbertForMaskedLM            |  4  | 0.9999 |  0.8848   |  1.6387  |         1.6363         |
|       T5ForConditionalGeneration        |  4  | 0.9811 |  0.8526   |  1.6271  |         1.7261         |
|                 T5Small                 |  4  | 0.9801 |  0.8515   |  1.6262  |         1.727          |
|     PLBartForConditionalGeneration      |  4  | 0.9857 |  0.9501   |  1.6164  |         1.6358         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9799 |  0.9611   |  1.6061  |         1.6266         |
|             BertForMaskedLM             | 16  | 0.9861 |   0.961   |  1.5983  |         1.5888         |
|           LayoutLMForMaskedLM           | 16  | 0.9865 |  0.9624   |  1.5932  |         1.5995         |
|          AllenaiLongformerBase          |  4  | 0.8834 |  0.6269   |  1.5906  |         1.487          |
|            PLBartForCausalLM            |  8  | 0.9883 |  0.9611   |  1.5812  |         1.6272         |
|                CamemBert                | 16  | 0.9871 |  0.9636   |  1.5452  |         1.5331         |
|             BartForCausalLM             |  4  | 0.9905 |  0.9636   |  1.4946  |         1.5393         |
|            MBartForCausalLM             |  4  | 0.9846 |  0.9662   |  1.4915  |         1.5387         |
|            YituTechConvBert             | 16  | 0.9853 |  0.9561   |  1.4907  |         1.4928         |
|      MBartForConditionalGeneration      |  2  | 0.9953 |  0.9626   |  1.4869  |         1.6857         |
|         Speech2Text2ForCausalLM         | 256 | 0.9747 |  0.9288   |  1.4669  |         1.5531         |
|         MegatronBertForCausalLM         |  4  | 0.9835 |  0.9091   |  1.4493  |         1.5029         |
|      BartForConditionalGeneration       |  2  | 0.9981 |  0.8524   |  1.4463  |         1.4792         |
|     DistilBertForQuestionAnswering      | 256 | 0.9939 |  0.9858   |  1.4405  |         1.4404         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9987 |  0.9225   |  1.3698  |         1.3994         |
|     PegasusForConditionalGeneration     | 32  | 0.998  |  0.9149   |  1.2688  |         1.3008         |
|            TrOCRForCausalLM             | 32  | 0.9908 |  0.9583   |  1.2391  |         1.2834         |
|          DistilBertForMaskedLM          | 128 | 0.9925 |   0.951   |  1.2201  |         1.2443         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9706 |  0.8697   |  1.2144  |         1.2004         |
|           PegasusForCausalLM            | 32  | 0.9476 |  0.9037   |  1.1649  |         1.1364         |
|       DebertaForQuestionAnswering       |  8  | 0.8011 |  0.6828   |  1.0211  |         0.9168         |
|           DebertaForMaskedLM            |  4  | 0.7249 |  0.5604   |  0.9137  |         0.7761         |
|          DebertaV2ForMaskedLM           |  1  | 0.6919 |  0.5163   |  0.8332  |         0.6158         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.6955 |  0.5225   |  0.771   |         0.6277         |
|          BlenderbotForCausalLM          |  4  | 0.9294 |  0.7344   |   0.0    |         1.0945         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
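For anyone who wants a single number per backend out of a speedup table like the one above, here is a minimal sketch that parses the ASCII table and takes a geometric mean per column. Two things here are assumptions of the sketch, not something the tables state: the choice of geometric mean as the summary, and the rule that a `0.0` or `nan` entry (e.g. the BlenderbotForCausalLM inductor cell) marks a failed run and is excluded. The helper names `parse_ascii_table` and `geomean_speedup` are hypothetical, and the sample uses three rows copied from the table.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = []
    header = None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the '+-----+' separator lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geomean_speedup(rows, column):
    """Geometric mean of one column, skipping 0.0 and nan entries.

    Assumption: 0.0 / nan mean the run failed, so they are excluded
    instead of dragging the mean to zero.
    """
    values = []
    for row in rows:
        value = float(row[column])
        if math.isfinite(value) and value > 0:
            values.append(value)
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Three rows copied from the huggingface amp speedup table.
sample = """
+-----------------------+----+--------+-----------+----------+------------------------+
|         name          | bs | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------+----+--------+-----------+----------+------------------------+
| MobileBertForMaskedLM | 64 | 0.9472 |  0.8085   |  2.9393  |         1.1368         |
|    OPTForCausalLM     | 2  | 0.9885 |  0.9301   |  2.3819  |         2.4378         |
| BlenderbotForCausalLM | 4  | 0.9294 |  0.7344   |   0.0    |         1.0945         |
+-----------------------+----+--------+-----------+----------+------------------------+
"""

rows = parse_ascii_table(sample)
for backend in ("eager", "aot_eager", "inductor", "inductor_no_cudagraphs"):
    print(f"{backend}: {geomean_speedup(rows, backend):.4f}")
```

Since failed rows are dropped per column, the inductor mean here covers two models while the other columns cover three, so the per-backend numbers are only comparable if you also report how many models each one includes.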

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          AllenaiLongformerBase          |  4  | 11.9228 |  31.4916  | 141.4432 |        107.9971        |
|          MobileBertForMaskedLM          | 64  | 17.1588 |  40.6408  |  134.71  |        134.3151        |
|     MobileBertForQuestionAnswering      | 128 | 17.335  |  40.1343  | 128.4399 |        127.7439        |
|       MT5ForConditionalGeneration       | 16  | 8.1035  |  18.6939  | 127.874  |        126.8212        |
|          DebertaV2ForMaskedLM           |  1  | 15.2563 |  26.9983  | 125.7764 |        60.7726         |
|      DebertaV2ForQuestionAnswering      |  2  | 15.2755 |  26.7311  | 123.482  |        58.5209         |
|     M2M100ForConditionalGeneration      | 16  | 11.8767 |  25.4389  | 106.1794 |        101.6244        |
|            XLNetLMHeadModel             |  8  | 10.513  |  27.6148  | 83.0234  |        84.0825         |
|           DebertaForMaskedLM            |  4  | 7.3556  |  13.6126  |  77.788  |         50.402         |
|       DebertaForQuestionAnswering       |  8  | 7.4461  |  13.3921  | 74.1197  |        53.0415         |
|      MBartForConditionalGeneration      |  2  | 11.5585 |  25.6877  |  71.311  |        71.1774         |
|             XGLMForCausalLM             |  8  | 9.5771  |  21.0865  | 71.3081  |        64.5781         |
|            YituTechConvBert             | 16  | 11.1853 |  19.4754  | 70.5774  |        69.5441         |
|     PegasusForConditionalGeneration     | 32  | 5.1335  |  19.216   | 67.3061  |        65.9934         |
|      BartForConditionalGeneration       |  2  | 11.4243 |  26.3786  | 65.8652  |        64.8384         |
|           ElectraForCausalLM            | 32  | 7.6586  |  13.6367  | 60.0672  |        59.8378         |
|         MegatronBertForCausalLM         |  4  | 10.5278 |  21.7137  | 59.1467  |        58.3861         |
|    MegatronBertForQuestionAnswering     |  8  | 10.4668 |  21.5222  | 58.4179  |        58.7478         |
|     PLBartForConditionalGeneration      |  4  | 9.4233  |  16.7576  | 57.0296  |        54.4783         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.5867  |  17.0334  | 49.3759  |        49.5385         |
|                 T5Small                 |  4  | 5.8382  |  12.7503  | 46.5739  |        45.4971         |
|       T5ForConditionalGeneration        |  4  | 5.8502  |  12.6947  | 45.9954  |        45.4609         |
|             BartForCausalLM             |  4  | 6.2163  |  11.6886  |  44.79   |        41.3585         |
|           PegasusForCausalLM            | 32  | 5.9999  |  11.252   | 44.3433  |        40.3488         |
|            MBartForCausalLM             |  4  | 6.6364  |  12.1852  |  44.306  |         40.651         |
|            TrOCRForCausalLM             | 32  | 6.3466  |  12.0003  |  42.311  |        39.7626         |
|             OPTForCausalLM              |  2  | 5.5029  |  10.9118  | 42.0484  |        38.0776         |
|    LayoutLMForSequenceClassification    | 16  | 5.5562  |  11.1681  | 41.3837  |        41.5127         |
|       ElectraForQuestionAnswering       | 64  | 5.2611  |  10.8586  | 39.0151  |        40.4587         |
|             BertForMaskedLM             | 16  | 5.1959  |  10.7047  | 35.8198  |        35.7752         |
|           LayoutLMForMaskedLM           | 16  |  5.615  |  11.3652  | 35.2396  |        36.9381         |
|        BertForQuestionAnswering         | 16  | 5.1445  |  10.5783  | 34.7228  |        35.7034         |
|       BlenderbotSmallForCausalLM        | 64  | 4.5083  |  8.4395   | 34.6068  |        34.3898         |
|            AlbertForMaskedLM            |  4  | 2.3504  |  8.0391   | 34.0383  |        33.6921         |
|            PLBartForCausalLM            |  8  | 3.7326  |  6.6363   | 33.6818  |         32.672         |
|      GPT2ForSequenceClassification      |  4  | 4.8251  |  9.9413   | 33.0626  |        31.4947         |
|                CamemBert                | 16  | 5.2632  |  10.8561  | 32.8148  |        33.9032         |
|           RobertaForCausalLM            | 16  | 5.3828  |  10.9136  |  32.799  |        32.5222         |
|     DistilBertForQuestionAnswering      | 256 | 2.5261  |  5.4229   | 32.7548  |        33.9374         |
|         Speech2Text2ForCausalLM         | 256 |  3.453  |  6.0828   | 32.3456  |        30.6851         |
|       RobertaForQuestionAnswering       | 16  | 5.2137  |  10.9101  | 31.3309  |        31.6306         |
|          DistilBertForMaskedLM          | 128 | 2.5392  |  5.4453   | 31.1149  |        32.4995         |
|       AlbertForQuestionAnswering        |  4  | 2.3398  |  8.0354   | 30.2116  |        30.0946         |
|               DistillGPT2               | 16  | 2.5529  |  5.0988   | 25.3685  |        25.8341         |
|          BlenderbotForCausalLM          |  4  | 11.421  |  22.3651  |   nan    |         63.216         |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1387  |         1.195          |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1139  |         1.2307         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0865  |         1.1724         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0818  |         1.1729         |
|             OPTForCausalLM              |  2  | 0.9682 |  0.9252   |  1.062   |         1.1099         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0583  |         1.1368         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0541  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.0539  |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0511  |         1.0491         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.1479         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0078  |         1.0518         |
|            YituTechConvBert             | 16  | 0.953  |  0.8749   |  0.9793  |         0.9905         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9734  |         1.3147         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         1.0642         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9574  |         1.268          |
|     PLBartForConditionalGeneration      |  4  |  0.93  |  0.8787   |  0.9563  |         0.9847         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         1.1152         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9089  |         1.0019         |
|            PLBartForCausalLM            |  8  | 0.9237 |  0.8182   |  0.8907  |         0.9249         |
|     PegasusForConditionalGeneration     | 32  | 0.945  |  0.8966   |  0.8901  |         1.0074         |
|           ElectraForCausalLM            | 32  | 0.9161 |   0.786   |  0.889   |         0.8941         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8849  |         0.9624         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8729  |         0.9803         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.8672  |         1.0307         |
|            TrOCRForCausalLM             | 32  |  0.92  |   0.829   |  0.8619  |         0.9075         |
|            MBartForCausalLM             |  4  | 0.951  |  0.8924   |  0.8491  |         0.9507         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         1.0139         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         1.0962         |
|             BartForCausalLM             |  4  | 0.951  |  0.8923   |  0.8301  |         0.943          |
|       BlenderbotSmallForCausalLM        | 64  | 0.8906 |  0.7493   |  0.8065  |         0.8318         |
|           PegasusForCausalLM            | 32  | 0.9257 |  0.8421   |  0.7952  |         0.9252         |
|         Speech2Text2ForCausalLM         | 256 | 0.8865 |  0.7573   |  0.7566  |         0.808          |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7473  |         1.016          |
|             XGLMForCausalLM             |  8  | 0.9431 |  0.8612   |  0.6744  |         0.9287         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6569  |         0.8392         |
|     M2M100ForConditionalGeneration      | 16  | 0.955  |  0.8772   |  0.6058  |         0.8978         |
|           DebertaForMaskedLM            |  4  | 0.9316 |  0.9156   |  0.5501  |         0.9978         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.5197  |         0.9665         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9763 |  0.9764   |  0.487   |         0.9802         |
|          AllenaiLongformerBase          |  4  | 0.8568 |  0.7887   |  0.4688  |         0.8742         |
|       DebertaForQuestionAnswering       |  8  | 0.9525 |  1.0537   |  0.4601  |         1.1527         |
|          BlenderbotForCausalLM          |  4  | 0.9932 |  0.9937   |   nan    |         0.9941         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 266.0487 | 300.8759  | 162.7553 |        162.9902        |
|       AlbertForQuestionAnswering        |  4  | 264.1861 | 298.0591  | 160.5194 |        160.9621        |
|            XLNetLMHeadModel             |  8  | 280.2215 | 290.6116  | 153.2865 |        154.5601        |
|      DebertaV2ForQuestionAnswering      |  2  | 151.4875 | 202.5042  | 134.3359 |        166.0065        |
|          DebertaV2ForMaskedLM           |  1  | 146.4002 | 197.0326  | 123.4073 |        161.9372        |
|          AllenaiLongformerBase          |  4  | 216.5838 | 289.0489  | 114.1271 |        122.135         |
|            TrOCRForCausalLM             | 32  | 138.6498 | 144.1281  | 111.4292 |        107.6125        |
|     PegasusForConditionalGeneration     | 32  | 147.792  | 152.4238  | 110.6682 |        116.2115        |
|      MBartForConditionalGeneration      |  2  | 139.0701 | 144.0379  | 96.7695  |        101.1704        |
|      BartForConditionalGeneration       |  2  | 148.3396 | 173.9695  | 94.3977  |        92.6426         |
|    MegatronBertForQuestionAnswering     |  8  | 144.8554 |  147.364  | 88.4857  |        87.4214         |
|            YituTechConvBert             | 16  | 127.8862 |  130.971  | 84.1101  |        84.0423         |
|     MobileBertForQuestionAnswering      | 128 | 180.9976 |  210.322  | 81.0949  |        160.9132        |
| BlenderbotSmallForConditionalGeneration | 64  | 111.7816 |  119.342  | 81.0136  |        79.3651         |
|                CamemBert                | 16  | 119.9168 | 122.6673  |  76.592  |        77.1872         |
|             BartForCausalLM             |  4  | 115.2872 | 118.1793  | 76.4322  |        73.8806         |
|            MBartForCausalLM             |  4  | 116.0904 | 117.7722  | 76.1678  |        73.7476         |
|       DebertaForQuestionAnswering       |  8  | 94.3628  | 110.6725  |  74.117  |        82.7053         |
|     M2M100ForConditionalGeneration      | 16  | 111.329  | 146.4892  | 73.6215  |        100.8351        |
|     PLBartForConditionalGeneration      |  4  | 120.8027 | 123.6813  | 73.5269  |        72.7054         |
|          MobileBertForMaskedLM          | 64  | 180.2861 | 215.3712  | 71.9105  |        185.2035        |
|     DistilBertForQuestionAnswering      | 256 | 103.8558 | 105.2428  | 71.7469  |        71.6753         |
|            PLBartForCausalLM            |  8  | 117.7519 | 116.7628  | 71.4948  |          69.2          |
|           LayoutLMForMaskedLM           | 16  | 113.9038 | 116.8757  | 70.6788  |        70.3216         |
|             OPTForCausalLM              |  2  | 173.5431 | 179.8322  |  70.265  |        68.6697         |
|           DebertaForMaskedLM            |  4  | 85.8701  | 110.7418  | 69.3754  |        79.1885         |
|          DistilBertForMaskedLM          | 128 | 85.2022  |  88.9967  | 69.3586  |        68.0004         |
|             BertForMaskedLM             | 16  | 111.4398 | 114.1906  | 68.6752  |        69.2125         |
|           RobertaForCausalLM            | 16  | 116.3508 | 119.3332  | 68.6648  |         68.956         |
|                 T5Small                 |  4  | 106.9773 | 122.4916  | 64.4645  |        60.2442         |
|       T5ForConditionalGeneration        |  4  | 106.4855 | 122.3744  | 64.3287  |        60.3175         |
|               DistillGPT2               | 16  | 106.991  | 110.6522  | 64.0128  |        62.5636         |
|           PegasusForCausalLM            | 32  | 78.4947  |  76.5073  | 59.4841  |        65.3102         |
|         MegatronBertForCausalLM         |  4  | 88.4683  |  95.3575  | 59.4597  |        58.1296         |
|             XGLMForCausalLM             |  8  | 123.6062 | 142.5396  | 54.7987  |        89.4287         |
|    LayoutLMForSequenceClassification    | 16  | 99.1064  | 100.5874  | 54.5041  |        54.9216         |
|       ElectraForQuestionAnswering       | 64  | 116.0331 | 117.0636  | 54.1713  |        54.8099         |
|        BertForQuestionAnswering         | 16  | 96.6624  |  97.9952  | 53.8939  |        54.2762         |
|       RobertaForQuestionAnswering       | 16  | 97.1092  |  98.4697  | 53.7846  |        54.4802         |
|           ElectraForCausalLM            | 32  | 89.7602  |  93.8945  | 48.8193  |        47.8772         |
|       BlenderbotSmallForCausalLM        | 64  | 59.7692  |  70.7112  | 47.7584  |        47.9927         |
|       MT5ForConditionalGeneration       | 16  | 93.9108  | 108.0745  | 44.1414  |        50.0092         |
|      GPT2ForSequenceClassification      |  4  | 93.6909  |  95.7392  | 40.8539  |        40.1522         |
|         Speech2Text2ForCausalLM         | 256 | 55.2479  |  56.0488  | 36.2987  |        34.4462         |
|          BlenderbotForCausalLM          |  4  | 114.8296 | 145.5167  |   nan    |        106.3601        |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.999  |  0.9975   |  3.0131  |         2.9771         |
|      xcit_large_24_p8_224       |  5  | 0.9892 |  0.8703   |  1.947   |         1.566          |
|         coat_lite_mini          | 128 | 0.9971 |  0.9953   |  1.9343  |         1.9107         |
|        twins_pcpvt_base         | 64  | 0.9984 |  0.9143   |  1.9133  |          1.68          |
|          ghostnet_100           | 128 | 0.9923 |  0.7464   |  1.8214  |         1.5885         |
|          gmlp_s16_224           | 128 | 0.9944 |  1.0824   |  1.7929  |         1.7812         |
|          gmixer_24_224          | 128 | 0.9951 |  0.8889   |  1.7371  |         1.7259         |
|           volo_d1_224           | 64  | 0.994  |  0.9733   |  1.6836  |         1.6643         |
|            lcnet_050            | 128 | 0.9418 |  0.7353   |  1.6827  |         1.4453         |
|         crossvit_9_240          | 128 | 0.9902 |  0.7824   |  1.6233  |         1.5983         |
|  swin_base_patch4_window7_224   | 64  | 0.9908 |  0.9556   |  1.6106  |         1.6018         |
|           convit_base           | 64  | 0.9982 |  0.9977   |  1.5531  |         1.5521         |
|             dla102              | 128 | 0.9959 |  0.8156   |  1.5267  |         1.5223         |
|       gluon_inception_v3        | 128 | 0.9967 |  0.8655   |  1.5096  |         1.5022         |
|        adv_inception_v3         | 128 | 0.9964 |  0.8607   |  1.5082  |         1.4999         |
|          inception_v3           | 128 | 0.9963 |  0.8647   |  1.5069  |         1.4966         |
|        sebotnet33ts_256         | 64  | 0.9567 |  0.7647   |  1.4946  |         1.5278         |
|            nfnet_l0             | 128 | 0.9893 |  0.8135   |  1.4916  |         1.4331         |
|          convnext_base          | 64  | 0.9835 |  0.9846   |  1.4866  |         1.4702         |
|           dm_nfnet_f0           | 128 | 0.9867 |  0.9845   |  1.4543  |         1.409          |
|            pit_b_224            | 64  | 0.9947 |  0.9922   |  1.4298  |         1.4236         |
|           mnasnet_100           | 128 | 0.9462 |  0.7408   |  1.4278  |         1.4847         |
|       eca_botnext26ts_256       | 128 | 0.9733 |  0.7186   |  1.4252  |         1.4089         |
|      mobilenetv3_large_100      | 128 | 0.9489 |   0.76    |  1.4218  |         1.4359         |
|           mobilevit_s           | 64  | 0.961  |  0.7263   |  1.4191  |         1.4324         |
|           selecsls42b           | 128 | 0.9981 |  0.8127   |  1.4067  |         1.4061         |
|           resnest101e           | 64  | 0.9953 |  0.8653   |  1.4055  |         1.3448         |
|           regnety_002           | 128 | 0.9516 |  0.7128   |  1.3932  |         1.2263         |
|          botnet26t_256          | 128 | 0.9724 |  0.8509   |  1.3879  |         1.4059         |
|         mobilenetv2_100         | 128 | 0.9482 |  0.7366   |  1.3821  |         1.4334         |
|        res2net50_14w_8s         | 128 | 0.9989 |  0.7905   |  1.378   |         1.3544         |
|          jx_nest_base           | 32  | 0.9872 |   0.985   |  1.3685  |         1.3608         |
|           res2next50            | 128 | 0.9987 |  0.8256   |  1.3676  |         1.3589         |
|          mixer_b16_224          | 128 | 0.9977 |  1.0146   |   1.36   |          1.36          |
|       tf_efficientnet_b0        | 128 | 0.9599 |  0.6815   |  1.3591  |         1.3888         |
|          spnasnet_100           | 128 | 0.9397 |  0.7385   |  1.3489  |         1.4119         |
|          cait_m36_384           |  4  | 0.9948 |   0.993   |  1.3475  |         1.341          |
|           fbnetc_100            | 128 | 0.947  |  0.7387   |  1.3422  |         1.3966         |
|      beit_base_patch16_224      | 64  | 0.9964 |   0.966   |  1.3409  |         1.3412         |
|        ese_vovnet19b_dw         | 128 | 0.9584 |  0.8334   |  1.3343  |         1.356          |
|         poolformer_m36          | 64  | 0.9864 |  0.9833   |  1.3261  |         1.3172         |
|            fbnetv3_b            | 128 | 0.949  |  0.7689   |  1.2987  |         1.2907         |
|            hrnet_w18            | 128 | 0.9924 |  0.6458   |  1.2942  |         1.3279         |
|           rexnet_100            | 128 | 0.9514 |  0.7025   |  1.2868  |         1.3208         |
|          resmlp_12_224          | 128 | 0.9932 |  0.8896   |  1.2516  |         1.2487         |
|      vit_base_patch16_224       | 64  | 0.9963 |  0.9938   |  1.2329  |         1.2325         |
|            tinynet_a            | 128 | 0.9452 |  0.6786   |  1.2273  |          1.26          |
|          cspdarknet53           | 64  | 0.9326 |  0.7854   |  1.2096  |         1.2453         |
|           tf_mixnet_l           | 128 | 0.9759 |  0.8268   |  1.1821  |         1.1877         |
|         visformer_small         | 128 | 0.996  |  0.9445   |  1.1737  |         1.1654         |
|            mixnet_l             | 128 | 0.9763 |   0.821   |  1.1726  |         1.178          |
|        res2net101_26w_4s        | 64  | 0.9993 |  0.7963   |  1.1481  |         1.0904         |
|             dpn107              | 32  | 0.9314 |  0.8076   |  1.0898  |         1.1381         |
|        gluon_xception65         | 32  | 0.9924 |  0.8429   |  1.0706  |         1.0756         |
|            repvgg_a2            | 128 | 0.9349 |  0.7551   |  1.0631  |         1.0981         |
|     swsl_resnext101_32x16d      | 32  | 0.9979 |  0.8422   |  1.0611  |         1.0259         |
|            gernet_l             | 128 | 0.935  |  0.7933   |  1.0321  |         1.0615         |
|        convmixer_768_32         | 32  | 0.9987 |   0.964   |  1.002   |         1.0024         |
|          pnasnet5large          | 16  | 0.9863 |   0.915   |  0.9087  |         0.9202         |
| deit_base_distilled_patch16_224 | 64  | 0.9963 |  0.9937   |   0.0    |          0.0           |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+---------------------------------+----+-------+---------------+-------------+------------------------+
|              name               | bs | eager |   aot_eager   |  inductor   | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+-------------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |    pass     |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |    pass     |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |    pass     |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |    pass     |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |    pass     |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |    pass     |          pass          |
|           regnety_002           | 8  | pass  |     pass      |    pass     |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |    pass     |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |    pass     |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |    pass     |          pass          |
|           res2next50            | 8  | pass  |     pass      |    pass     |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |    pass     |          pass          |
|           resnest101e           | 8  | pass  |     pass      |    pass     |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |    pass     |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |    pass     |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |    pass     |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |    pass     |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |    pass     |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |    pass     |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |    pass     |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |    pass     |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |    pass     |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |    pass     |          pass          |
|         visformer_small         | 8  | pass  |     pass      |    pass     |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |    pass     |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |    pass     |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |    pass     |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |    pass     |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |    pass     |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |    pass     |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |    pass     |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |    pass     |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |    pass     |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |    pass     |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |    pass     |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |    pass     |          pass          |
|           convit_base           | 8  | pass  |     pass      |    pass     |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |    pass     |          pass          |
|          convnext_base          | 8  | pass  |     pass      |    pass     |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |    pass     |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |    pass     |          pass          |
|             dla102              | 8  | pass  |     pass      |    pass     |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |    pass     |          pass          |
|             dpn107              | 8  | pass  |     pass      |    pass     |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |    pass     |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |    pass     |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |    pass     |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |    pass     |          pass          |
|            gernet_l             | 8  | pass  |     pass      |    pass     |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |    pass     |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |    pass     |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |    pass     |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |    pass     |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |    pass     |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |    pass     |          pass          |
|          inception_v3           | 8  | pass  |     pass      |    pass     |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |    pass     |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |    pass     |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |    pass     |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      | fail_to_run |      fail_to_run       |
+---------------------------------+----+-------+---------------+-------------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 5.6666  |  11.2465  | 278.5384 |        275.3279        |
|          ghostnet_100           | 128 | 7.5582  |  15.8331  | 237.8118 |        231.4849        |
|            hrnet_w18            | 128 | 9.5791  |  36.3099  | 227.9758 |        228.5473        |
|            fbnetv3_b            | 128 | 8.4731  |  17.1571  | 164.147  |        167.8978        |
|           mobilevit_s           | 64  | 5.3085  |  11.5363  | 157.9263 |        158.7458        |
|           resnest101e           | 64  | 11.3351 |  24.5523  | 157.1213 |        156.0727        |
|           tf_mixnet_l           | 128 | 9.1605  |  16.9895  | 157.0361 |        154.467         |
|            tinynet_a            | 128 | 5.9863  |  12.333   | 156.7248 |        155.5618        |
|       tf_efficientnet_b0        | 128 | 5.1467  |  10.4221  | 155.1317 |        152.3895        |
|          inception_v3           | 128 | 5.7627  |  12.5605  | 153.0311 |        154.8467        |
|        adv_inception_v3         | 128 | 5.8119  |  12.3071  | 152.9945 |        155.2662        |
|       gluon_inception_v3        | 128 | 5.7914  |  12.5614  | 152.9558 |        151.7647        |
|            mixnet_l             | 128 | 8.6658  |  16.4671  | 152.0509 |        150.7686        |
|          pnasnet5large          | 16  | 8.1153  |  25.8938  | 151.8038 |        147.9882        |
|      mobilenetv3_large_100      | 128 |  4.365  |  8.4255   | 151.3115 |        157.9578        |
|        res2net101_26w_4s        | 64  | 10.5855 |  24.9146  | 141.2599 |        140.7468        |
|        twins_pcpvt_base         | 64  | 10.558  |  23.4278  | 138.3236 |        138.1102        |
|           fbnetc_100            | 128 | 5.1711  |  9.6772   | 134.6528 |        133.9571        |
|          spnasnet_100           | 128 | 5.0192  |  9.4074   | 132.9417 |        133.5134        |
|         mobilenetv2_100         | 128 | 4.0394  |  7.9442   | 127.7707 |        121.6193        |
|      xcit_large_24_p8_224       |  5  | 12.719  |  28.2192  | 121.2818 |        120.2327        |
|           mnasnet_100           | 128 |  4.067  |  7.6712   | 118.0898 |        118.306         |
|        res2net50_14w_8s         | 128 | 9.1489  |  22.7381  | 114.103  |         112.89         |
|          cait_m36_384           |  4  | 13.7524 |  30.9022  | 103.2305 |        104.879         |
|        sebotnet33ts_256         | 64  | 4.1881  |  8.9661   | 102.5787 |        101.6174        |
|           regnety_002           | 128 |  5.014  |   8.948   | 102.4201 |        104.6328        |
|  swin_base_patch4_window7_224   | 64  |  8.33   |  19.5639  | 99.6424  |        99.4254         |
|          cspdarknet53           | 64  | 5.9298  |  11.043   | 96.9954  |        93.1244         |
|       eca_botnext26ts_256       | 128 |  3.083  |   6.895   | 96.1908  |        93.1719         |
|             dla102              | 128 | 6.2267  |  14.3066  | 94.0905  |        91.5661         |
|         poolformer_m36          | 64  | 7.6195  |  13.9633  | 93.8768  |        94.3115         |
|             dpn107              | 32  | 9.9348  |  19.6636  | 91.7375  |        92.3161         |
|            lcnet_050            | 128 | 2.5072  |  5.0271   |  91.621  |        94.6388         |
|           selecsls42b           | 128 | 2.4831  |  5.3124   | 88.4206  |        89.5519         |
|        gluon_xception65         | 32  | 7.8693  |  16.7961  | 87.0744  |        87.3521         |
|         coat_lite_mini          | 128 | 3.3372  |  8.0481   | 86.7509  |        85.5701         |
|          botnet26t_256          | 128 | 3.0059  |  5.9882   | 85.7957  |         87.683         |
|         crossvit_9_240          | 128 | 5.9052  |  13.354   | 81.8395  |        80.4075         |
|           res2next50            | 128 | 4.9538  |  12.0109  | 81.8077  |        79.8932         |
|          jx_nest_base           | 32  | 6.8444  |  14.6626  | 80.0927  |        80.3718         |
|            gernet_l             | 128 | 4.9993  |   8.891   | 78.0114  |        76.4248         |
|            nfnet_l0             | 128 |  5.332  |  10.9522  | 75.3793  |         73.213         |
|        ese_vovnet19b_dw         | 128 | 2.5224  |  4.5878   | 74.2863  |        74.8613         |
|           dm_nfnet_f0           | 128 | 6.1417  |  11.6139  | 69.6619  |        67.4194         |
|           volo_d1_224           | 64  | 5.1622  |  11.761   | 67.8755  |        68.5925         |
|         visformer_small         | 128 | 2.5924  |   6.182   |  64.029  |        63.2192         |
|        tnt_s_patch16_224        | 128 | 6.4444  |  16.1255  | 61.9176  |         62.062         |
|            repvgg_a2            | 128 | 4.8529  |  8.7802   | 56.5745  |        54.0344         |
|          gmlp_s16_224           | 128 | 5.6314  |  12.0182  | 56.4902  |        55.0073         |
|     swsl_resnext101_32x16d      | 32  | 6.2349  |  13.7528  | 55.7593  |        54.9174         |
|          convnext_base          | 64  | 6.6991  |  12.8525  | 55.3393  |        53.8679         |
|          gmixer_24_224          | 128 | 5.6697  |  13.659   | 47.3405  |        46.3078         |
|           convit_base           | 64  | 3.4233  |  8.7755   | 45.2519  |        44.5417         |
|            pit_b_224            | 64  | 3.3954  |  8.1204   | 42.2121  |        41.1907         |
|          resmlp_12_224          | 128 | 2.8047  |  5.4194   | 37.0127  |        37.1048         |
|      vit_base_patch16_224       | 64  | 3.0226  |  7.0329   |  36.111  |         35.705         |
|      beit_base_patch16_224      | 64  | 3.9641  |  8.7268   | 34.1401  |        32.8188         |
|        convmixer_768_32         | 32  | 1.7169  |  6.9321   | 32.1575  |        32.0999         |
|          mixer_b16_224          | 128 | 2.6996  |  5.8309   |  30.89   |        30.9419         |
| deit_base_distilled_patch16_224 | 64  | 3.1703  |  7.1305   |   nan    |          nan           |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.1858  |         1.2049         |
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.1712  |         1.2836         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1129  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.0948  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.0266  |         1.1179         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.013   |         1.0845         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0099  |         1.0351         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  0.9984  |         1.0721         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  0.9972  |         1.0876         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  0.9872  |         1.0728         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  0.9834  |         1.0506         |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9762  |         0.9854         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  0.9747  |         1.0457         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  0.9729  |         1.0539         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  0.9557  |         1.0057         |
|             dla102              | 128 | 0.9634 |  0.9151   |  0.9536  |         1.0326         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.951   |         0.9948         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9362  |         0.955          |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  0.9348  |         1.0194         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  0.9346  |         1.0675         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9285  |         0.989          |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9228  |         0.9793         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9215  |         0.9952         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9131  |         0.9727         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  0.9097  |         1.0328         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9047  |         0.9903         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.8918  |          0.99          |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  0.891   |         1.1115         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.8833  |         0.9642         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.8831  |         0.9705         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.8807  |         0.977          |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.8786  |         0.9451         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.877   |         0.9361         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  0.8768  |         1.1871         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  0.8738  |         1.0072         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.8721  |         0.9732         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.8712  |         0.9607         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.871   |         0.9483         |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.8687  |         0.9902         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.8683  |         0.9403         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.866   |         0.9547         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8632  |         0.989          |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.8596  |         0.9535         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  0.8578  |         1.0242         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.8576  |         0.9664         |
|          convnext_base          | 64  | 1.001  |   0.924   |  0.8505  |         1.0338         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.8499  |         0.9706         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.8461  |         0.9786         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  0.8402  |         1.0202         |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.8239  |         0.9779         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.805   |         0.884          |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.7738  |         0.9611         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.7602  |         0.8966         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.7526  |         0.9898         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.7214  |         0.9045         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.6693  |         0.9604         |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |   nan    |          nan           |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.8405 | 311.9606  | 299.7587 |        300.1203        |
|          pnasnet5large          | 16  | 199.0165 | 213.7872  | 217.109  |        214.1329        |
|            hrnet_w18            | 128 | 281.7728 | 432.8001  | 216.5323 |        210.8916        |
|           tf_mixnet_l           | 128 | 194.4533 | 229.5069  | 160.166  |        159.5971        |
|            mixnet_l             | 128 | 185.664  | 220.4025  | 154.538  |        153.6246        |
|          cait_m36_384           |  4  | 167.7845 | 168.2853  | 124.1148 |        124.3883        |
|           resnest101e           | 64  | 164.6903 | 189.9065  | 116.2486 |        121.7979        |
|             dla102              | 128 | 172.6174 | 210.5072  | 112.4885 |        112.8938        |
|     swsl_resnext101_32x16d      | 32  | 118.9437 | 140.5497  | 111.8661 |        115.385         |
|         poolformer_m36          | 64  | 146.7207 | 147.2783  | 109.122  |        110.0258        |
|        tnt_s_patch16_224        | 128 | 323.3371 | 323.7502  | 107.1508 |        108.4395        |
|          inception_v3           | 128 |  160.7   | 185.2247  | 106.4335 |        107.0931        |
|        adv_inception_v3         | 128 | 160.6713 | 186.1522  | 106.189  |        106.7263        |
|       gluon_inception_v3        | 128 | 160.9913 | 185.1756  | 106.1741 |        106.7634        |
|           convit_base           | 64  | 163.2423 | 163.2898  | 104.9732 |        104.9703        |
|        res2net50_14w_8s         | 128 | 141.1296 | 177.5109  | 102.0999 |        103.9743        |
|             dpn107              | 32  | 113.8251 | 131.6137  | 97.2697  |        93.3657         |
|        gluon_xception65         | 32  | 99.7809  | 117.1754  | 92.5245  |        91.9857         |
|           res2next50            | 128 | 126.1762 | 152.5279  | 92.0188  |         92.749         |
|  swin_base_patch4_window7_224   | 64  | 147.3648 | 152.8507  | 90.7857  |        91.3114         |
|           dm_nfnet_f0           | 128 | 128.4885 | 129.0355  | 87.0357  |        90.0562         |
|          mixer_b16_224          | 128 | 116.7474 | 114.9169  | 86.4817  |        85.6291         |
|        res2net101_26w_4s        | 64  | 100.0332 |  125.215  | 84.9901  |        90.4567         |
|            fbnetv3_b            | 128 | 115.5832 | 142.3881  | 84.2346  |        85.0422         |
|            pit_b_224            | 64  | 118.7009 | 119.0604  | 82.5998  |        82.9392         |
|          convnext_base          | 64  | 124.1755 | 124.1913  | 82.1464  |        83.0533         |
|         visformer_small         | 128 | 91.2041  |  96.3362  | 77.5784  |        78.0968         |
|          gmlp_s16_224           | 128 | 137.9815 | 126.4127  | 76.4995  |        76.8404         |
|      beit_base_patch16_224      | 64  | 101.5309 | 104.7502  | 75.6536  |        75.3719         |
|            nfnet_l0             | 128 | 112.9654 | 137.2933  | 75.4053  |        78.0908         |
|       eca_botnext26ts_256       | 128 | 108.678  | 147.4205  | 74.4167  |        75.1889         |
|          jx_nest_base           | 32  | 101.6391 | 101.4633  | 73.4269  |        73.5992         |
|          cspdarknet53           | 64  | 94.8649  | 112.6921  |  73.418  |         71.135         |
|          botnet26t_256          | 128 | 101.9472 | 116.5272  | 71.4859  |        70.4952         |
|           volo_d1_224           | 64  | 120.9638 | 123.3732  |  71.406  |        72.2348         |
|            gernet_l             | 128 | 77.6619  |  91.7178  | 70.5319  |        68.5665         |
|      vit_base_patch16_224       | 64  | 86.8872  |  87.1734  | 70.3529  |        70.2825         |
|            repvgg_a2            | 128 | 77.7607  |  96.4854  | 68.3337  |        66.1561         |
|          gmixer_24_224          | 128 | 118.0687 | 132.3211  | 67.7983  |        67.9743         |
|      xcit_large_24_p8_224       |  5  | 123.202  | 147.2504  | 62.6352  |         77.502         |
|        twins_pcpvt_base         | 64  | 122.2228 | 128.7227  | 60.3152  |        68.6304         |
|       tf_efficientnet_b0        | 128 | 84.8842  | 119.5286  | 59.8763  |        58.6751         |
|           rexnet_100            | 128 | 80.2831  | 108.7761  |  59.197  |        57.7291         |
|           fbnetc_100            | 128 | 83.2598  | 106.8152  | 58.6039  |         56.366         |
|         coat_lite_mini          | 128 | 113.1918 | 113.2779  | 58.3388  |        59.0564         |
|           mobilevit_s           | 64  | 84.7711  | 112.2957  | 57.3628  |        56.8281         |
|            tinynet_a            | 128 | 73.8946  | 102.7755  |  56.783  |        55.3495         |
|        sebotnet33ts_256         | 64  | 80.5691  |  100.637  | 51.5517  |        50.3794         |
|         crossvit_9_240          | 128 | 82.8178  | 104.1841  | 50.2967  |        51.1288         |
|          ghostnet_100           | 128 | 90.9719  | 120.7534  |  49.528  |        56.6861         |
|          spnasnet_100           | 128 | 70.6752  |  89.9564  | 49.1078  |        46.9296         |
|        ese_vovnet19b_dw         | 128 | 64.5115  |  74.3021  |  46.468  |        45.6444         |
|         mobilenetv2_100         | 128 | 65.7166  |  84.4783  | 44.9991  |        43.3714         |
|           selecsls42b           | 128 | 60.0436  |  73.7241  | 42.6879  |        42.6915         |
|           mnasnet_100           | 128 | 64.5489  |  82.4577  | 42.6531  |        41.0716         |
|          resmlp_12_224          | 128 | 53.4265  |  59.7196  | 42.4097  |        42.4682         |
|      mobilenetv3_large_100      | 128 | 61.5025  |  76.7829  | 40.9657  |        40.4945         |
|           regnety_002           | 128 | 40.5074  |  52.7367  | 26.7737  |        29.8631         |
|            lcnet_050            | 128 | 31.7207  |  40.5505  | 17.7463  |        20.6859         |
| deit_base_distilled_patch16_224 | 64  | 84.9269  |  85.0082  |   nan    |          nan           |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Performance graphs

bench_logs/torchbench_amp.png

bench_logs/huggingface_amp.png

bench_logs/timm_models_amp.png

Build Summary

Run name

day_086_27_03_23_performance_amp_689

Commit hashes

pytorch commit: 08c1d1a
pytorch commit date: 2023-03-28 02:25:45+00:00
torchbench commit: 575b6b9932aae3afddc4e0acb1487c8d8201a328
torchbench commit date: 2023-03-26 10:37:27-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git08c1d1a

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward passes for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency, and memory footprint reduction, we exclude the models that fail accuracy checks.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 85%, 51/60 | 93%, 42/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 98%, 44/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
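
Each cell in the passrate table pairs a rounded percentage with the raw pass count. A minimal sketch of that formatting, assuming a model counts as passing only when its status is exactly `pass`. The function name `format_passrate` is hypothetical and is not taken from the benchmark harness:

```python
def format_passrate(statuses):
    """Return 'NN%, passed/total' for a list of per-model accuracy statuses."""
    total = len(statuses)
    passed = sum(1 for status in statuses if status == "pass")
    # Round to the nearest whole percent, as the table cells appear to do.
    percent = round(100 * passed / total)
    return f"{percent}%, {passed}/{total}"


# 58 passing models plus 2 accuracy failures in a 60-model suite.
statuses = ["pass"] * 58 + ["fail_accuracy"] * 2
print(format_passrate(statuses))
```

Under this assumption, statuses such as `fail_accuracy` and `fail_to_run` both count against the pass rate.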

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.56x    |    1.59x    |    1.41x    |
| inductor_no_cudagraphs |   1.27x    |    1.48x    |    1.39x    |
+------------------------+------------+-------------+-------------+
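
The geometric mean is the usual way to aggregate per-model speedup ratios into one number per suite. A minimal sketch, computed in log space. The helper name `geometric_mean` and the input values are illustrative assumptions, not the harness's own code or the full suite data:

```python
import math


def geometric_mean(speedups):
    """Geometric mean of positive speedup ratios, computed in log space."""
    if not speedups:
        raise ValueError("need at least one speedup")
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Illustrative per-model speedups; a real suite would have one entry per model.
inductor_speedups = [1.6827, 1.6233, 1.6106]
print(f"{geometric_mean(inductor_speedups):.2f}x")
```

Unlike the arithmetic mean, the geometric mean treats a 2x speedup and a 0.5x slowdown symmetrically: together they average to 1x.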

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.81    |    7.55     |    5.90     |
|       aot_eager        |    9.42    |    16.19    |    13.06    |
|        inductor        |   62.38    |    63.61    |   109.08    |
| inductor_no_cudagraphs |   62.29    |    58.90    |   108.18    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    0.97x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.89x    |    0.88x    |
|        inductor        |   0.79x    |    0.89x    |    0.91x    |
| inductor_no_cudagraphs |   0.94x    |    1.03x    |    1.01x    |
+------------------------+------------+-------------+-------------+

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics of the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Previous report name: /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Passrate diff

+------------------------+-------------+------------+-------------+
|        compiler        |    suite    | prev_value |  cur_value  |
+------------------------+-------------+------------+-------------+
|        inductor        | torchbench  | 87%, 52/60 | 85%, 51/60  |
|        inductor        | huggingface | 93%, 42/45 | 93%, 42/45  |
|        inductor        | timm_models | 98%, 59/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60 | 87%, 52/60  |
| inductor_no_cudagraphs | huggingface | 98%, 44/45 | 98%, 44/45  |
| inductor_no_cudagraphs | timm_models | 98%, 59/60 | 100%, 60/60 |
+------------------------+-------------+------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.56x    |   1.56x   |
|        inductor        | huggingface |   1.59x    |   1.59x   |
|        inductor        | timm_models |   1.40x    |   1.41x   |
| inductor_no_cudagraphs | torchbench  |   1.27x    |   1.27x   |
| inductor_no_cudagraphs | huggingface |   1.49x    |   1.48x   |
| inductor_no_cudagraphs | timm_models |   1.38x    |   1.39x   |
+------------------------+-------------+------------+-----------+

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
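The four criteria above can be sketched as a single predicate. This is an illustrative Python sketch only: the function name `warning_flags`, the flag labels, and the decision to treat `pass_due_to_skip` as passing are assumptions, while the thresholds are the ones listed above.

```python
import math
from typing import List, Optional

# Thresholds from the list above.
MIN_SPEEDUP = 0.95
MAX_COMPILE_LATENCY_S = 120.0
MIN_COMPRESSION_RATIO = 0.9

# Assumption: skipped accuracy checks count as passing.
PASSING_STATUSES = ("pass", "pass_due_to_skip")


def warning_flags(
    accuracy_status: str,
    speedup: Optional[float],
    compile_latency_s: Optional[float],
    compression_ratio: Optional[float],
) -> List[str]:
    """Return the warning categories a model falls into (hypothetical helper)."""

    def known(value: Optional[float]) -> bool:
        # Missing or NaN measurements are not flagged by the numeric checks.
        return value is not None and not math.isnan(value)

    flags = []
    if accuracy_status not in PASSING_STATUSES:
        flags.append("accuracy")
    if known(speedup) and speedup < MIN_SPEEDUP:
        flags.append("speedup")
    if known(compile_latency_s) and compile_latency_s > MAX_COMPILE_LATENCY_S:
        flags.append("compilation_latency")
    if known(compression_ratio) and compression_ratio < MIN_COMPRESSION_RATIO:
        flags.append("compression_ratio")
    return flags
```

A model with a speedup of 0.9489 and a compilation latency of 125.838 s would be flagged for both speedup and compilation latency under this sketch.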

Accuracy warnings

+-------------+-------------------------------+-----------------+------------------------+
|    suite    |             name              |    inductor     | inductor_no_cudagraphs |
+-------------+-------------------------------+-----------------+------------------------+
| torchbench  |             moco              |   fail_to_run   |      fail_to_run       |
| torchbench  |      mobilenet_v3_large       |  fail_accuracy  |     fail_accuracy      |
| torchbench  |      Background_Matting       | eager_variation |    eager_variation     |
| torchbench  |        vision_maskrcnn        | eager_variation |    eager_variation     |
| torchbench  |           tacotron2           |     0.0000      |         0.0000         |
| torchbench  |              gat              |     0.0000      |         0.0000         |
| torchbench  |              gcn              |     0.0000      |         0.0000         |
| torchbench  |             llama             |     0.0000      |         0.0000         |
| torchbench  |             sage              |     0.0000      |         0.0000         |
| torchbench  |         torchrec_dlrm         |     0.0000      |         0.0000         |
| huggingface | DebertaV2ForQuestionAnswering |   fail_to_run   |          pass          |
| huggingface |  AlbertForQuestionAnswering   |  fail_accuracy  |     fail_accuracy      |
+-------------+-------------------------------+-----------------+------------------------+

Performance speedup warnings

+-------------+-------------------------------+----------+------------------------+
|    suite    |             name              | inductor | inductor_no_cudagraphs |
+-------------+-------------------------------+----------+------------------------+
| torchbench  |             dcgan             |  1.4572  |         0.8228         |
| torchbench  |         lennard_jones         |  1.3775  |         0.897          |
| torchbench  |       soft_actor_critic       |  1.1849  |         0.8816         |
| torchbench  |          tts_angular          |  0.9584  |         0.9489         |
| torchbench  |          timm_vovnet          |  0.9279  |         0.9241         |
| torchbench  |    nvidia_deeprecommender     |  0.8727  |         1.0187         |
| torchbench  | timm_vision_transformer_large |   0.0    |         1.0817         |
| torchbench  |             moco              |   0.0    |          0.0           |
| torchbench  |              gat              |   0.0    |          0.0           |
| torchbench  |              gcn              |   0.0    |          0.0           |
| torchbench  |             sage              |   0.0    |          0.0           |
| torchbench  |           tacotron2           |   0.0    |          0.0           |
| torchbench  |         torchrec_dlrm         |   0.0    |          0.0           |
| huggingface |  DebertaForQuestionAnswering  |  1.0057  |         0.9236         |
| huggingface |      DebertaForMaskedLM       |  0.9725  |         0.7766         |
| huggingface | DebertaV2ForQuestionAnswering |  0.9084  |         0.6348         |
| huggingface |     DebertaV2ForMaskedLM      |  0.8546  |         0.623          |
| huggingface |     BlenderbotForCausalLM     |   0.0    |         1.0901         |
+-------------+-------------------------------+----------+------------------------+

Compilation latency (sec) warnings

+-------------+--------------------------------+----------+------------------------+
|    suite    |              name              | inductor | inductor_no_cudagraphs |
+-------------+--------------------------------+----------+------------------------+
| torchbench  |          hf_T5_large           | 175.0277 |        172.0462        |
| torchbench  |        phlippe_densenet        | 167.2775 |        166.6507        |
| torchbench  |         hf_Longformer          | 148.0971 |        120.557         |
| torchbench  |       timm_efficientnet        | 147.1401 |        145.6476        |
| torchbench  |           hf_BigBird           | 145.8702 |        125.838         |
| torchbench  |          densenet121           | 138.7787 |        133.6249        |
| torchbench  |       mobilenet_v3_large       | 136.1357 |        134.6953        |
| torchbench  |          mobilenet_v2          | 130.3079 |        129.2573        |
| torchbench  | timm_vision_transformer_large  |   nan    |        123.4003        |
| huggingface |     AllenaiLongformerBase      | 149.0083 |        117.0486        |
| huggingface |     MobileBertForMaskedLM      | 143.8808 |        142.3362        |
| huggingface | MobileBertForQuestionAnswering | 137.8034 |        136.0617        |
| huggingface |      DebertaV2ForMaskedLM      | 134.5703 |        67.4038         |
| huggingface | DebertaV2ForQuestionAnswering  | 134.3899 |        65.4044         |
| huggingface |  MT5ForConditionalGeneration   | 131.8772 |        131.426         |
| timm_models |           rexnet_100           | 298.096  |        277.4582        |
| timm_models |           hrnet_w18            | 245.3984 |        246.8611        |
| timm_models |          ghostnet_100          | 232.9816 |        241.7617        |
| timm_models |           fbnetv3_b            | 173.4663 |        175.3756        |
| timm_models |          resnest101e           | 166.1121 |        165.7984        |
| timm_models |         pnasnet5large          | 164.7778 |        159.1506        |
| timm_models |           tinynet_a            | 162.2366 |        163.014         |
| timm_models |          mobilevit_s           | 161.1674 |        158.7045        |
| timm_models |       gluon_inception_v3       | 159.8668 |        159.2567        |
| timm_models |          inception_v3          | 159.3256 |        160.0972        |
| timm_models |            mixnet_l            | 157.2724 |        159.8059        |
| timm_models |     mobilenetv3_large_100      | 157.134  |        154.4084        |
| timm_models |        adv_inception_v3        | 156.2602 |        159.5745        |
| timm_models |       tf_efficientnet_b0       | 154.115  |        147.0983        |
| timm_models |       res2net101_26w_4s        | 150.8055 |        148.9207        |
| timm_models |          tf_mixnet_l           | 150.4952 |        160.1388        |
| timm_models |        twins_pcpvt_base        | 146.2708 |        147.2427        |
| timm_models |           fbnetc_100           | 139.1197 |        134.727         |
| timm_models |          spnasnet_100          | 136.5761 |        134.8972        |
| timm_models |      xcit_large_24_p8_224      | 129.0758 |        131.1981        |
| timm_models |        mobilenetv2_100         | 128.7111 |        127.9593        |
| timm_models |          mnasnet_100           | 126.3795 |        119.7307        |
| timm_models |        res2net50_14w_8s        | 122.1861 |        123.4464        |
+-------------+--------------------------------+----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+----------+------------------------+
|    suite    |                  name                   | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------------+----------+------------------------+
| torchbench  |                 yolov3                  |  0.8919  |         1.0115         |
| torchbench  |              hf_GPT2_large              |  0.8906  |         1.1284         |
| torchbench  |            timm_efficientnet            |  0.8696  |         0.9411         |
| torchbench  |           speech_transformer            |  0.8651  |         0.869          |
| torchbench  |              timm_resnest               |  0.8604  |         0.9665         |
| torchbench  |           shufflenet_v2_x1_0            |  0.8602  |         0.9647         |
| torchbench  |         timm_vision_transformer         |  0.8593  |         0.8835         |
| torchbench  |               timm_regnet               |  0.8501  |         0.9501         |
| torchbench  |                resnet152                |  0.8495  |         0.9414         |
| torchbench  |           Background_Matting            |  0.8484  |         1.0412         |
| torchbench  |              hf_DistilBert              |  0.8476  |         0.9479         |
| torchbench  |               hf_T5_large               |  0.8201  |         1.168          |
| torchbench  |              pytorch_unet               |  0.8134  |         0.9308         |
| torchbench  |            phlippe_densenet             |  0.8058  |         0.8659         |
| torchbench  |                  dcgan                  |  0.7821  |         0.9645         |
| torchbench  |                resnet50                 |  0.7819  |         0.8859         |
| torchbench  |                 demucs                  |  0.7734  |         0.9662         |
| torchbench  |              squeezenet1_1              |  0.7733  |         0.9087         |
| torchbench  |             pytorch_stargan             |  0.7715  |         0.8893         |
| torchbench  |                 hf_Bart                 |  0.7535  |         0.9285         |
| torchbench  |               timm_vovnet               |  0.7529  |         0.8869         |
| torchbench  |           mobilenet_v3_large            |  0.7281  |         0.8716         |
| torchbench  |             pytorch_struct              |  0.7274  |         0.7358         |
| torchbench  |                  vgg16                  |  0.7227  |         0.9805         |
| torchbench  |               mnasnet1_0                |  0.7144  |         0.8049         |
| torchbench  |               densenet121               |  0.7094  |         0.8034         |
| torchbench  |                 alexnet                 |  0.7088  |         0.9379         |
| torchbench  |               hf_BigBird                |  0.6971  |         1.1068         |
| torchbench  |             resnext50_32x4d             |  0.6653  |         0.7722         |
| torchbench  |         nvidia_deeprecommender          |  0.6585  |         0.8931         |
| torchbench  |                   drq                   |  0.6379  |         0.9573         |
| torchbench  |            soft_actor_critic            |  0.6066  |         0.9973         |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |  0.6065  |         0.6172         |
| torchbench  |             LearningToPaint             |  0.5925  |         0.7458         |
| torchbench  |                resnet18                 |  0.5423  |         0.6127         |
| torchbench  |              lennard_jones              |  0.5317  |         0.9997         |
| torchbench  |               hf_Reformer               |  0.4538  |         0.8022         |
| torchbench  |              hf_Longformer              |  0.417   |         0.8951         |
| torchbench  |          functorch_dp_cifar10           |  0.3991  |         0.4424         |
| torchbench  |             phlippe_resnet              |  0.3169  |         0.3395         |
| huggingface |            PLBartForCausalLM            |  0.8907  |         0.9249         |
| huggingface |     PegasusForConditionalGeneration     |  0.8901  |         1.0074         |
| huggingface |           ElectraForCausalLM            |  0.889   |         0.8941         |
| huggingface |          DistilBertForMaskedLM          |  0.8849  |         0.9624         |
| huggingface | BlenderbotSmallForConditionalGeneration |  0.8729  |         0.9803         |
| huggingface |      MBartForConditionalGeneration      |  0.8672  |         1.0307         |
| huggingface |            TrOCRForCausalLM             |  0.8619  |         0.9075         |
| huggingface |            MBartForCausalLM             |  0.8491  |         0.9507         |
| huggingface |      BartForConditionalGeneration       |  0.8456  |         1.0139         |
| huggingface |         MegatronBertForCausalLM         |  0.845   |         1.0962         |
| huggingface |             BartForCausalLM             |  0.8301  |         0.943          |
| huggingface |       BlenderbotSmallForCausalLM        |  0.8065  |         0.8318         |
| huggingface |           PegasusForCausalLM            |  0.7952  |         0.9252         |
| huggingface |         Speech2Text2ForCausalLM         |  0.7566  |         0.808          |
| huggingface |          MobileBertForMaskedLM          |  0.7473  |         1.016          |
| huggingface |     M2M100ForConditionalGeneration      |  0.7188  |         0.9535         |
| huggingface |             XGLMForCausalLM             |  0.6744  |         0.9287         |
| huggingface |     MobileBertForQuestionAnswering      |  0.6569  |         0.8392         |
| huggingface |           DebertaForMaskedLM            |  0.5501  |         0.9978         |
| huggingface |          DebertaV2ForMaskedLM           |  0.5197  |         0.9665         |
| huggingface |      DebertaV2ForQuestionAnswering      |  0.487   |         0.9801         |
| huggingface |          AllenaiLongformerBase          |  0.4688  |         0.8742         |
| huggingface |       DebertaForQuestionAnswering       |  0.4601  |         1.1527         |
| timm_models |                hrnet_w18                |  0.8918  |          0.99          |
| timm_models |            sebotnet33ts_256             |  0.891   |         1.1115         |
| timm_models |              inception_v3               |  0.8904  |         1.0171         |
| timm_models |           gluon_inception_v3            |  0.8904  |         1.0171         |
| timm_models |            adv_inception_v3             |  0.8904  |         1.0171         |
| timm_models |                 dpn107                  |  0.8833  |         0.9642         |
| timm_models |            gluon_xception65             |  0.8831  |         0.9705         |
| timm_models |              ghostnet_100               |  0.8807  |         0.977          |
| timm_models |              spnasnet_100               |  0.8786  |         0.9451         |
| timm_models |          mobilenetv3_large_100          |  0.877   |         0.9361         |
| timm_models |             poolformer_m36              |  0.8768  |         1.1871         |
| timm_models |           eca_botnext26ts_256           |  0.8738  |         1.0072         |
| timm_models |          xcit_large_24_p8_224           |  0.8721  |         0.9732         |
| timm_models |            res2net50_14w_8s             |  0.8712  |         0.9607         |
| timm_models |            res2net101_26w_4s            |  0.871   |         0.9483         |
| timm_models |                mixnet_l                 |  0.8687  |         0.9902         |
| timm_models |               mnasnet_100               |  0.8683  |         0.9403         |
| timm_models |               res2next50                |  0.866   |         0.9547         |
| timm_models |              cait_m36_384               |  0.8632  |         0.989          |
| timm_models |               fbnetc_100                |  0.8596  |         0.9535         |
| timm_models |                pit_b_224                |  0.8578  |         1.0242         |
| timm_models |               selecsls42b               |  0.8576  |         0.9664         |
| timm_models |              convnext_base              |  0.8505  |         1.0338         |
| timm_models |                gernet_l                 |  0.8499  |         0.9706         |
| timm_models |         swsl_resnext101_32x16d          |  0.8461  |         0.9786         |
| timm_models |             coat_lite_mini              |  0.8402  |         1.0202         |
| timm_models |              botnet26t_256              |  0.8239  |         0.9779         |
| timm_models |                lcnet_050                |  0.805   |         0.884          |
| timm_models |                repvgg_a2                |  0.7738  |         0.9611         |
| timm_models |               regnety_002               |  0.7602  |         0.8966         |
| timm_models |             crossvit_9_240              |  0.7526  |         0.9898         |
| timm_models |      swin_base_patch4_window7_224       |  0.7214  |         0.9045         |
| timm_models |              jx_nest_base               |  0.6693  |         0.9604         |
+-------------+-----------------------------------------+----------+------------------------+

Metrics over time

bench_logs/passrate_over_time.png :

bench_logs/memory_over_time.png :

bench_logs/comp_time_over_time.png :

bench_logs/geomean_over_time.png :

Recent Regressions

For each relevant compiler, we compare the two most recent reports (that actually ran the compiler) to find models that were previously unflagged but are now flagged as problematic (according to the 'Warnings' section).
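The comparison described here amounts to finding models whose flag changes from off to on between two reports. A minimal Python sketch follows; the name `find_regressions`, the dictionary-based report shape, and the choice to skip models missing from the previous report are assumptions for illustration.

```python
from typing import Callable, Dict, Tuple


def find_regressions(
    prev: Dict[str, float],
    cur: Dict[str, float],
    is_flagged: Callable[[float], bool],
) -> Dict[str, Tuple[float, float]]:
    """Map each newly flagged model to its (previous, current) metric values."""
    regressions = {}
    for name, cur_value in cur.items():
        if name not in prev:
            continue  # assumption: models without a previous value are skipped
        prev_value = prev[name]
        # A regression is a model that was fine before and is flagged now.
        if not is_flagged(prev_value) and is_flagged(cur_value):
            regressions[name] = (prev_value, cur_value)
    return regressions


if __name__ == "__main__":
    # Compilation latency values reported for inductor_no_cudagraphs on torchbench.
    prev = {"hf_BigBird": 115.4067, "hf_Longformer": 109.9983}
    cur = {"hf_BigBird": 125.838, "hf_Longformer": 120.557}
    print(find_regressions(prev, cur, lambda latency: latency > 120.0))
```

Passing the threshold as a predicate keeps the same function usable for speedup (`lambda s: s < 0.95`) and for compression ratio (`lambda r: r < 0.9`).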

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Accuracy regressions

+------------------------+--------------------+-------------+---------------+
|        compiler        |        name        | prev_status |  cur_status   |
+------------------------+--------------------+-------------+---------------+
|        inductor        | mobilenet_v3_large |    pass     | fail_accuracy |
| inductor_no_cudagraphs | mobilenet_v3_large |    pass     | fail_accuracy |
+------------------------+--------------------+-------------+---------------+

Performance speedup regressions

+------------------------+-------------+-------------+------------+
|        compiler        |    name     | prev_status | cur_status |
+------------------------+-------------+-------------+------------+
| inductor_no_cudagraphs | tts_angular |   0.9568    |   0.9489   |
+------------------------+-------------+-------------+------------+

Compilation latency (sec) regressions

+------------------------+---------------+-------------+------------+
|        compiler        |     name      | prev_status | cur_status |
+------------------------+---------------+-------------+------------+
| inductor_no_cudagraphs |  hf_BigBird   |  115.4067   |  125.838   |
| inductor_no_cudagraphs | hf_Longformer |  109.9983   |  120.557   |
+------------------------+---------------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

No regressions found.

Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Compilation latency (sec) regressions

+------------------------+------------------+-------------+------------+
|        compiler        |       name       | prev_status | cur_status |
+------------------------+------------------+-------------+------------+
|        inductor        |   mnasnet_100    |  118.0898   |  126.3795  |
|        inductor        | res2net50_14w_8s |   114.103   |  122.1861  |
| inductor_no_cudagraphs | res2net50_14w_8s |   112.89    |  123.4464  |
+------------------------+------------------+-------------+------------+

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9671 |  0.9194   |  3.6327  |         1.3531         |
|           BERT_pytorch            |  16  | 0.9914 |  0.8222   |  3.0739  |         2.0962         |
|            densenet121            |  4   | 0.9869 |  0.7164   |  2.7889  |         1.0755         |
|            hf_BigBird             |  2   | 0.953  |   0.776   |  2.5334  |         1.631          |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9502 |  0.9006   |  2.4298  |         1.7263         |
|             hf_Albert             |  8   | 0.9921 |  0.9563   |  2.2942  |         2.2842         |
|            hf_T5_large            |  2   | 0.9763 |  0.8059   |  2.2293  |         1.9303         |
|         phlippe_densenet          | 128  | 0.986  |  0.7747   |  2.0454  |         1.0105         |
|        mobilenet_v3_large         |  32  | 0.9937 |  0.7831   |  2.0445  |         1.1888         |
|               dlrm                | 1024 | 0.9365 |  0.8544   |  1.9849  |         1.1422         |
|           squeezenet1_1           |  32  | 0.9814 |  0.9359   |  1.9064  |         1.1986         |
|               hf_T5               |  8   | 0.9853 |  0.8488   |  1.8911  |         1.9791         |
|          phlippe_resnet           | 128  | 0.9806 |  0.7614   |  1.8249  |         1.004          |
|              hf_Bert              |  4   | 0.9953 |  0.8428   |  1.8082  |         1.5833         |
|              hf_GPT2              |  4   | 0.9937 |   0.957   |  1.7592  |         1.7797         |
|          resnext50_32x4d          |  8   | 0.9836 |   0.717   |  1.7203  |         0.9842         |
|              hf_Bart              |  4   | 0.9685 |  0.7761   |  1.685   |         1.4317         |
|           hf_GPT2_large           |  4   | 0.9827 |  0.9715   |  1.6719  |         1.7303         |
|            mnasnet1_0             |  32  | 0.988  |   0.731   |  1.6686  |         1.081          |
|        shufflenet_v2_x1_0         | 128  | 0.9939 |  0.7551   |  1.6264  |         1.182          |
|        speech_transformer         |  32  | 0.9807 |  0.8235   |  1.6015  |         1.6325         |
|             resnet18              |  16  | 0.9873 |  0.7692   |  1.5963  |         0.969          |
| attention_is_all_you_need_pytorch | 256  | 0.9897 |  0.9131   |  1.5843  |         1.4606         |
|           hf_Bert_large           |  4   | 0.9998 |   0.862   |  1.584   |         1.5542         |
|           timm_resnest            |  32  | 0.9928 |   0.851   |  1.5686  |         1.5117         |
|      timm_vision_transformer      |  32  | 0.985  |  0.8966   |  1.5454  |         1.3883         |
|            timm_nfnet             | 128  | 0.9864 |  0.9846   |  1.5419  |         1.4716         |
|           fastNLP_Bert            |  6   | 0.9974 |  0.8596   |  1.5333  |         1.4982         |
|           mobilenet_v2            |  96  | 0.9971 |  0.7774   |  1.5253  |         1.5058         |
|                drq                |  1   | 0.9535 |  0.7518   |  1.5077  |         1.0465         |
|          pytorch_struct           | 200  | 0.9277 |  0.7743   |  1.4714  |         1.1111         |
|               dcgan               |  32  | 0.862  |  0.6962   |  1.4572  |         0.8228         |
|           hf_Longformer           |  2   | 0.8297 |  0.5642   |  1.4387  |         1.2583         |
|         timm_efficientnet         |  32  | 0.9369 |  0.6232   |  1.4353  |         1.0874         |
|           hf_DistilBert           |  8   | 0.9806 |  0.9572   |  1.425   |         1.4401         |
|           lennard_jones           | 1000 | 0.8365 |  0.7411   |  1.3775  |         0.897          |
|           pytorch_unet            |  1   | 0.9967 |  0.2051   |  1.3582  |         1.3522         |
|          LearningToPaint          |  96  | 0.9901 |   0.771   |  1.3172  |         1.0505         |
|          pytorch_stargan          |  16  | 0.9935 |  0.8057   |  1.2558  |         1.2555         |
|               vgg16               |  64  | 0.9995 |  0.9987   |  1.2402  |         1.2539         |
|            Super_SloMo            |  6   | 0.9968 |   0.179   |  1.2323  |         1.2329         |
|        Background_Matting         |  4   | 0.9991 |  0.1368   |  1.213   |         1.209          |
|              yolov3               |  16  | 0.9963 |  0.8065   |  1.1973  |         1.1988         |
|             resnet50              |  32  | 0.9956 |  0.7743   |  1.1874  |         1.0603         |
|         soft_actor_critic         | 256  | 0.8551 |  0.6181   |  1.1849  |         0.8816         |
|             resnet152             |  32  | 0.9955 |  0.7613   |  1.1773  |         0.9944         |
|            hf_Reformer            |  4   | 0.9864 |  0.9635   |  1.1391  |         1.0669         |
|              alexnet              | 128  | 0.9989 |  0.9985   |  1.0887  |         1.1359         |
|              demucs               |  4   | 1.0015 |  1.0023   |  1.0362  |         1.0359         |
|            timm_regnet            |  32  | 0.9199 |  0.7717   |  0.9883  |         0.969          |
|            tts_angular            |  64  | 0.9301 |  0.8968   |  0.9584  |         0.9489         |
|            timm_vovnet            |  32  | 0.8501 |  0.7082   |  0.9279  |         0.9241         |
|      nvidia_deeprecommender       | 256  | 0.9985 |  0.9986   |  0.8727  |         1.0187         |
|   timm_vision_transformer_large   |  32  | 0.9981 |    0.0    |   0.0    |         1.0817         |
|               moco                |  32  | 0.9798 |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 27.0134 |  55.6675  | 175.0277 |        172.0462        |
|         phlippe_densenet          | 128  | 3.2128  |  7.0963   | 167.2775 |        166.6507        |
|           hf_Longformer           |  2   | 11.4257 |  30.9424  | 148.0971 |        120.557         |
|         timm_efficientnet         |  32  | 5.0173  |  10.2886  | 147.1401 |        145.6476        |
|            hf_BigBird             |  2   | 12.9475 |  37.2892  | 145.8702 |        125.838         |
|            densenet121            |  4   | 7.5714  |  18.2336  | 138.7787 |        133.6249        |
|        mobilenet_v3_large         |  32  | 3.4673  |  7.7713   | 136.1357 |        134.6953        |
|           mobilenet_v2            |  96  | 3.1161  |  7.0724   | 130.3079 |        129.2573        |
|              yolov3               |  16  | 5.0022  |  10.8563  | 117.8226 |        119.335         |
|            mnasnet1_0             |  32  | 3.1269  |  6.9045   | 107.2577 |        106.8678        |
|             resnet152             |  32  | 9.1582  |  20.4949  | 106.7891 |        104.7473        |
|           hf_GPT2_large           |  4   | 15.015  |  30.1587  | 105.5779 |        104.8973        |
|           timm_resnest            |  32  | 1.8556  |  3.9884   | 98.9391  |        100.0488        |
|        shufflenet_v2_x1_0         | 128  | 3.4281  |  7.8125   | 79.5753  |        81.4465         |
|        speech_transformer         |  32  | 6.1287  |  14.0659  | 76.2935  |        78.7864         |
| attention_is_all_you_need_pytorch | 256  | 4.4017  |  10.9401  |  75.518  |        73.7178         |
|            timm_regnet            |  32  | 6.6264  |  12.2998  | 72.2454  |        71.0639         |
|            timm_nfnet             | 128  | 5.9581  |  11.1149  | 71.7642  |        71.9969         |
|        Background_Matting         |  4   | 3.0079  |  11.3832  |  69.733  |        68.8604         |
|           BERT_pytorch            |  16  | 4.8783  |  11.7252  | 69.4298  |        68.4175         |
|           hf_Bert_large           |  4   | 10.2418 |  21.5469  | 63.2545  |        61.8667         |
|             resnet50              |  32  | 3.2036  |  7.1136   | 62.9411  |        65.0426         |
|            timm_vovnet            |  32  | 3.6085  |  6.4495   | 62.3769  |        61.8972         |
|              hf_Bart              |  4   | 10.4369 |  18.1724  | 61.6938  |        59.8068         |
|           pytorch_unet            |  1   | 1.5486  |  4.4433   | 61.0042  |        58.0834         |
|       functorch_dp_cifar10        |  64  | 1.2111  |  2.4051   | 55.4188  |        53.7659         |
|          resnext50_32x4d          |  8   | 3.2205  |   7.087   | 51.0095  |        51.7061         |
|               hf_T5               |  8   | 5.6654  |  12.7427  | 50.1842  |        49.9968         |
|      timm_vision_transformer      |  32  | 3.3577  |  7.3236   | 49.6583  |        48.7493         |
|           fastNLP_Bert            |  6   | 5.2125  |  11.2731  | 48.9441  |        46.6417         |
|          pytorch_stargan          |  16  | 1.2183  |  3.2662   | 46.0182  |        45.3191         |
|          LearningToPaint          |  96  | 1.3929  |  2.8925   | 43.4161  |        43.0658         |
|              hf_GPT2              |  4   | 4.6997  |  9.6419   | 42.3645  |        41.5415         |
|            hf_Reformer            |  4   |  4.18   |  6.0922   | 42.1316  |        38.2618         |
|             resnet18              |  16  | 1.3489  |  2.8926   | 41.1363  |        43.7662         |
|            Super_SloMo            |  6   | 2.7336  |  9.8361   | 40.6306  |        41.1468         |
|             hf_Albert             |  8   | 2.5294  |  8.0852   |  37.681  |        40.1984         |
|              hf_Bert              |  4   | 5.0304  |  10.6038  | 37.6296  |        37.6044         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2376  |  2.9874   | 35.8718  |        36.3291         |
|          phlippe_resnet           | 128  | 1.3505  |  2.8498   | 32.1894  |        32.3287         |
|           hf_DistilBert           |  8   | 2.3773  |  5.2147   | 30.8552  |        30.7238         |
|              demucs               |  4   |  1.425  |  2.1872   | 29.4498  |        29.0347         |
|           squeezenet1_1           |  32  | 1.0429  |  1.7662   | 21.9808  |        23.3323         |
|          pytorch_struct           | 200  | 0.7489  |  1.3401   | 18.1954  |        18.8462         |
|               vgg16               |  64  | 0.6414  |  1.1372   | 15.1135  |        14.9837         |
|              alexnet              | 128  | 0.4805  |  0.7923   |  14.034  |        15.1701         |
|                drq                |  1   | 0.6611  |  1.0407   |  9.8619  |         9.7473         |
|      nvidia_deeprecommender       | 256  | 0.4891  |   0.751   |  9.274   |         9.1723         |
|               dcgan               |  32  | 0.4336  |  0.7182   |  7.9999  |         7.6836         |
|               dlrm                | 1024 | 0.3784  |  0.7873   |  7.6757  |         6.8427         |
|         soft_actor_critic         | 256  | 0.4192  |  0.6053   |  6.6959  |         7.3023         |
|           lennard_jones           | 1000 |  0.397  |  0.5979   |  5.5442  |         5.614          |
|            tts_angular            |  64  | 0.4429  |  0.5178   |  5.2831  |         5.1994         |
|   timm_vision_transformer_large   |  32  | 9.4672  |    nan    |   nan    |        123.4003        |
|               moco                |  32  | 27.5723 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak memory compression ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.208   |         1.208          |
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0863  |         1.2037         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.0496  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.0163  |         1.1719         |
|           mobilenet_v2            |  96  | 0.9865 |  0.7655   |  1.0099  |         1.1021         |
|            tts_angular            |  64  | 0.9957 |  0.9957   |  0.9852  |         0.9957         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9689  |         1.1266         |
|            timm_nfnet             | 128  | 0.9068 |  0.8747   |  0.9683  |         1.0711         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |  0.952   |         1.0009         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         1.1717         |
|              hf_Bert              |  4   | 0.9645 |  0.8338   |  0.9425  |         1.0258         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.9402  |         1.0725         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9319  |         1.0718         |
|              yolov3               |  16  | 0.9839 |  0.8253   |  0.8919  |         1.0115         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  0.8906  |         1.1284         |
|         timm_efficientnet         |  32  | 0.9846 |  0.7658   |  0.8696  |         0.9411         |
|        speech_transformer         |  32  | 0.9915 |   0.901   |  0.8651  |         0.869          |
|           timm_resnest            |  32  | 0.9887 |  0.8984   |  0.8604  |         0.9665         |
|        shufflenet_v2_x1_0         | 128  | 0.955  |   0.837   |  0.8602  |         0.9647         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8593  |         0.8835         |
|            timm_regnet            |  32  | 0.9913 |  0.8527   |  0.8501  |         0.9501         |
|             resnet152             |  32  | 0.9959 |  0.8949   |  0.8495  |         0.9414         |
|        Background_Matting         |  4   | 1.0125 |  0.6486   |  0.8484  |         1.0412         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8476  |         0.9479         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         1.168          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.8134  |         0.9308         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8058  |         0.8659         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.7821  |         0.9645         |
|             resnet50              |  32  | 0.9932 |  0.8619   |  0.7819  |         0.8859         |
|              demucs               |  4   | 0.9663 |  0.9664   |  0.7734  |         0.9662         |
|           squeezenet1_1           |  32  | 0.966  |  0.9291   |  0.7733  |         0.9087         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.8893         |
|              hf_Bart              |  4   | 0.9084 |   0.843   |  0.7535  |         0.9285         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7529  |         0.8869         |
|        mobilenet_v3_large         |  32  | 0.9805 |  0.9467   |  0.7281  |         0.8716         |
|          pytorch_struct           | 200  | 0.9992 |  0.5168   |  0.7274  |         0.7358         |
|               vgg16               |  64  | 0.9919 |  0.7243   |  0.7227  |         0.9805         |
|            mnasnet1_0             |  32  | 0.978  |   0.894   |  0.7144  |         0.8049         |
|            densenet121            |  4   | 0.994  |  0.9808   |  0.7094  |         0.8034         |
|              alexnet              | 128  | 0.9455 |   0.793   |  0.7088  |         0.9379         |
|            hf_BigBird             |  2   | 0.9486 |  0.9268   |  0.6971  |         1.1068         |
|          resnext50_32x4d          |  8   | 0.9939 |  0.8425   |  0.6653  |         0.7722         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.8931         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.6379  |         0.9573         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  0.6066  |         0.9973         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9966 |  0.8796   |  0.6065  |         0.6172         |
|          LearningToPaint          |  96  | 0.9203 |  0.7116   |  0.5925  |         0.7458         |
|             resnet18              |  16  | 0.9751 |  0.7996   |  0.5423  |         0.6127         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.5317  |         0.9997         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.4538  |         0.8022         |
|           hf_Longformer           |  2   | 0.8567 |  0.8296   |  0.417   |         0.8951         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3169  |         0.3395         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|               moco                |  32  | 0.9894 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_GPT2_large           |  4   | 212.5725 | 215.2502  | 125.2155 |        120.9284        |
|        Background_Matting         |  4   | 125.6768 | 918.8282  | 103.6701 |        103.8763        |
|            hf_T5_large            |  2   | 228.3988 | 272.9173  | 101.4985 |        136.7298        |
|               hf_T5               |  8   | 181.4872 |  210.598  | 94.9066  |        91.7113         |
|           hf_Longformer           |  2   | 138.0366 | 199.6223  | 78.6083  |        90.7492         |
|            hf_BigBird             |  2   | 204.7985 | 251.0809  | 78.5837  |        118.6921        |
|            timm_nfnet             | 128  | 119.6745 |  120.142  | 76.9284  |        80.0952         |
|            hf_Reformer            |  4   | 82.0196  |  83.997   | 71.1829  |        75.9289         |
|            Super_SloMo            |  6   | 79.7893  | 442.9637  | 64.3744  |        64.3478         |
|              yolov3               |  16  | 68.7712  |  84.9164  | 57.2218  |        57.1107         |
|            timm_regnet            |  32  |  61.17   |  72.4498  | 56.1113  |        57.4051         |
|               vgg16               |  64  | 66.2195  |  66.2906  | 53.4182  |        52.7855         |
|             resnet152             |  32  | 64.6612  |  84.1795  | 52.7566  |        68.9135         |
|           hf_Bert_large           |  4   | 82.6344  |  95.2576  | 52.2626  |        53.2005         |
|              demucs               |  4   | 53.6874  |  53.6778  | 52.0768  |        51.6802         |
| attention_is_all_you_need_pytorch | 256  | 57.7413  |  59.9485  | 37.2511  |        37.4483         |
|        speech_transformer         |  32  | 64.5674  |  86.3784  | 35.2684  |        38.3156         |
|              hf_Bart              |  4   | 60.9905  |  78.2904  |  34.726  |         41.838         |
|           fastNLP_Bert            |  6   | 52.9642  |  60.4013  | 33.6912  |        34.6493         |
|           mobilenet_v2            |  96  | 47.1015  |  60.3597  | 30.8205  |        31.2673         |
|             hf_Albert             |  8   | 69.8616  |  71.4676  | 29.7763  |        30.3589         |
|           pytorch_unet            |  1   | 39.9405  |  194.113  | 29.3072  |        29.4566         |
|              hf_GPT2              |  4   | 49.1353  |  50.1546  |  27.71   |        27.6825         |
|            timm_vovnet            |  32  | 28.8442  |  34.8079  | 26.2639  |         26.838         |
|              hf_Bert              |  4   | 40.3138  |  47.3738  | 22.7884  |         25.568         |
|         timm_efficientnet         |  32  | 34.3545  |  51.0571  | 22.2616  |        29.4479         |
|           hf_DistilBert           |  8   | 32.0914  |  32.677   | 22.0753  |        21.8173         |
|             resnet50              |  32  |  26.347  |  34.1444  |  22.033  |        24.9692         |
|            densenet121            |  4   |  60.484  |  75.4129  | 18.9761  |        56.8039         |
|        shufflenet_v2_x1_0         | 128  | 30.6711  |  40.7887  | 18.6507  |         25.936         |
|      timm_vision_transformer      |  32  | 29.7245  |  34.6162  | 18.2991  |        20.3218         |
|           BERT_pytorch            |  16  | 54.0985  |  68.8601  | 17.7491  |         25.701         |
|           timm_resnest            |  32  | 24.3376  |  28.3507  | 15.3753  |        15.9573         |
|            mnasnet1_0             |  32  | 22.5906  |  30.4703  | 13.1702  |        20.8432         |
|        mobilenet_v3_large         |  32  | 27.0871  |  34.2898  | 13.1572  |        22.7627         |
|          resnext50_32x4d          |  8   | 20.5492  |  28.3837  |  11.942  |        20.6952         |
|      nvidia_deeprecommender       | 256  | 10.2256  |  10.2271  | 11.7114  |        10.0353         |
|          pytorch_stargan          |  16  | 14.9925  |  18.4656  |  11.576  |        11.8505         |
|         phlippe_densenet          | 128  | 23.3341  |  29.8807  | 11.2977  |        23.3541         |
|              alexnet              | 128  |  9.8292  |  9.8365   |  9.0177  |         8.6353         |
|          LearningToPaint          |  96  |  11.421  |  14.5319  |  8.5696  |        10.5413         |
|            tts_angular            |  64  |  6.6275  |  6.9505   |  6.482   |         6.5789         |
|             resnet18              |  16  |  9.3185  |  12.135   |  5.8388  |         9.5228         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 15.8445  |  15.6321  |  5.7432  |         9.761          |
|           squeezenet1_1           |  32  | 10.5348  |  11.0074  |  5.435   |         9.7108         |
|          phlippe_resnet           | 128  |  9.1343  |  11.7661  |  4.936   |         8.9215         |
|          pytorch_struct           | 200  |  5.0202  |  6.0167   |  3.1921  |         4.2007         |
|       functorch_dp_cifar10        |  64  | 10.3124  |  10.8987  |  2.8649  |         7.5911         |
|                drq                |  1   |  3.4781  |  4.3549   |  2.1863  |         3.2028         |
|               dlrm                | 1024 |  4.3104  |  4.8617   |  2.1493  |         3.6685         |
|               dcgan               |  32  |  2.432   |  2.9888   |  1.6008  |         2.6083         |
|         soft_actor_critic         | 256  |  1.7232  |  2.4279   |  1.3635  |         2.0537         |
|           lennard_jones           | 1000 |  1.8088  |  2.0812   |  1.1199  |         1.766          |
|   timm_vision_transformer_large   |  32  | 464.3401 |    nan    |   nan    |        428.7739        |
|               moco                |  32  | 51.1512  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+
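
The tables in this comment are plain ASCII grids, so they are easy to post-process. Below is a minimal sketch of one way to read such a table into records and pull out a numeric column while skipping `nan` cells (models that failed to run or were not measured). The helper names `parse_ascii_table` and `column_as_floats` are hypothetical and are not part of the benchmark tooling; the sample rows are copied from the absolute latency table above.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' bordered table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # border lines start with '+', skip them
    ]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def column_as_floats(records, column):
    """Return the finite numeric values of one column, skipping nan and text cells."""
    values = []
    for record in records:
        try:
            value = float(record[column])
        except (KeyError, ValueError):
            continue  # missing column or a status string such as 'pass'
        if math.isfinite(value):
            values.append(value)
    return values


SAMPLE = """
+---------------+------+----------+-----------+----------+------------------------+
|     name      |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------+------+----------+-----------+----------+------------------------+
| hf_GPT2_large |  4   | 212.5725 | 215.2502  | 125.2155 |        120.9284        |
|     moco      |  32  | 51.1512  |    nan    |   nan    |          nan           |
|      gat      |  0   |   nan    |    nan    |   nan    |          nan           |
+---------------+------+----------+-----------+----------+------------------------+
"""

records = parse_ascii_table(SAMPLE)
inductor_ms = column_as_floats(records, "inductor")
eager_ms = column_as_floats(records, "eager")
```

Dropping `nan` rows before aggregating keeps failed models from poisoning a mean or median. Any summary computed this way should state how many models were excluded.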

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|             OPTForCausalLM              |  2  | 0.9922 |  0.9363   |  2.4111  |         2.435          |
|          MobileBertForMaskedLM          | 64  | 0.9474 |  0.8164   |  2.3875  |         1.0827         |
|      GPT2ForSequenceClassification      |  4  | 0.9771 |  0.9527   |  2.2522  |         2.2827         |
|             XGLMForCausalLM             |  8  | 0.9423 |  0.7368   |  2.1193  |         1.2032         |
|       ElectraForQuestionAnswering       | 64  | 0.9872 |  0.9764   |  2.1168  |         2.0893         |
|       MT5ForConditionalGeneration       | 16  | 0.9909 |  0.8468   |  2.0956  |         1.8473         |
|     MobileBertForQuestionAnswering      | 128 | 0.9497 |  0.8141   |  2.0628  |         1.0508         |
|     M2M100ForConditionalGeneration      | 16  | 1.0216 |  0.8144   |  1.9538  |         1.3608         |
|            XLNetLMHeadModel             |  8  | 0.994  |  0.9656   |  1.8097  |         1.8141         |
|    LayoutLMForSequenceClassification    | 16  | 0.9847 |  0.9709   |  1.792   |         1.7717         |
|        BertForQuestionAnswering         | 16  | 0.9846 |  0.9696   |  1.7869  |         1.7588         |
|           ElectraForCausalLM            | 32  | 0.9825 |  0.9343   |  1.7854  |         1.8149         |
|       RobertaForQuestionAnswering       | 16  | 0.9845 |  0.9691   |  1.7853  |         1.7655         |
|           RobertaForCausalLM            | 16  | 0.9867 |   0.959   |  1.679   |         1.6647         |
|               DistillGPT2               | 16  | 0.987  |  0.9553   |  1.658   |         1.6992         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.8856   |  1.6523  |         1.6513         |
|            AlbertForMaskedLM            |  4  | 0.9997 |   0.885   |  1.6482  |         1.6465         |
|            PLBartForCausalLM            |  8  | 0.9912 |  0.9613   |  1.6327  |         1.6744         |
|       T5ForConditionalGeneration        |  4  | 0.9809 |  0.8513   |  1.6323  |         1.7257         |
|                 T5Small                 |  4  | 0.9799 |  0.8509   |  1.6313  |         1.7277         |
|     PLBartForConditionalGeneration      |  4  | 0.9897 |  0.9493   |  1.6142  |         1.6343         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9814 |  0.9608   |  1.6044  |         1.6294         |
|          AllenaiLongformerBase          |  4  | 0.8852 |  0.6263   |  1.6011  |         1.4965         |
|             BertForMaskedLM             | 16  | 0.9862 |  0.9607   |  1.5939  |         1.5839         |
|           LayoutLMForMaskedLM           | 16  | 0.9858 |  0.9622   |  1.5704  |         1.5939         |
|                CamemBert                | 16  | 0.9873 |  0.9625   |  1.5445  |         1.5327         |
|      MBartForConditionalGeneration      |  2  | 0.9982 |  0.9539   |  1.4972  |         1.4696         |
|            YituTechConvBert             | 16  | 0.9859 |  0.9546   |  1.4892  |         1.4904         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9987 |  0.9116   |  1.489   |         1.4168         |
|             BartForCausalLM             |  4  | 0.9901 |  0.9651   |  1.489   |         1.5336         |
|            MBartForCausalLM             |  4  | 0.9883 |  0.9632   |  1.4883  |         1.5396         |
|         MegatronBertForCausalLM         |  4  | 0.9896 |  0.9135   |  1.4635  |         1.4978         |
|         Speech2Text2ForCausalLM         | 256 | 0.9781 |  0.9315   |  1.4585  |         1.5394         |
|      BartForConditionalGeneration       |  2  | 0.9944 |  0.9567   |  1.4542  |         1.4476         |
|     DistilBertForQuestionAnswering      | 256 | 0.9938 |  0.9875   |  1.4483  |         1.4463         |
|     PegasusForConditionalGeneration     | 32  | 0.9969 |  0.9327   |  1.3471  |         1.3417         |
|            TrOCRForCausalLM             | 32  | 0.9883 |  0.9577   |  1.2413  |         1.2844         |
|           PegasusForCausalLM            | 32  | 0.963  |  0.8913   |  1.2397  |         1.1434         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9702 |  0.8939   |  1.2346  |         1.2027         |
|          DistilBertForMaskedLM          | 128 | 0.9925 |  0.9509   |  1.2086  |         1.2338         |
|       DebertaForQuestionAnswering       |  8  | 0.8171 |   0.695   |  1.0057  |         0.9236         |
|           DebertaForMaskedLM            |  4  | 0.7362 |  0.5834   |  0.9725  |         0.7766         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.6969 |  0.5237   |  0.9084  |         0.6348         |
|          DebertaV2ForMaskedLM           |  1  | 0.6993 |  0.5192   |  0.8546  |         0.623          |
|          BlenderbotForCausalLM          |  4  | 0.952  |  0.7462   |   0.0    |         1.0901         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
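For anyone who wants to summarize the accuracy table rather than read it row by row, here is a minimal sketch that parses this ASCII layout and counts statuses per backend. The helper names `parse_ascii_table` and `tally_status` are hypothetical (they are not part of the benchmark scripts), and the sample only copies four rows from the table above.

```python
from collections import Counter


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts.

    Hypothetical helper: border lines (starting with '+') are skipped,
    the first '|' line is treated as the header.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def tally_status(records, backend):
    """Count how often each status string appears in one backend column."""
    return Counter(record[backend] for record in records)


# Four rows copied from the accuracy table above.
sample = """
+-------------------------------+----+------------------+------------------+------------------+------------------------+
|             name              | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-------------------------------+----+------------------+------------------+------------------+------------------------+
|     BlenderbotForCausalLM     | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|  MT5ForConditionalGeneration  | 1  |       pass       |       pass       |       pass       |          pass          |
| DebertaV2ForQuestionAnswering | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|  AlbertForQuestionAnswering   | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-------------------------------+----+------------------+------------------+------------------+------------------------+
"""

records = parse_ascii_table(sample)
for backend in ("eager", "aot_eager", "inductor", "inductor_no_cudagraphs"):
    print(backend, dict(tally_status(records, backend)))
```

The same parser should work on the other tables in this comment, since they share the layout; numeric columns come back as strings and need a `float()` conversion.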

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          AllenaiLongformerBase          |  4  | 11.5491 |  32.1697  | 149.0083 |        117.0486        |
|          MobileBertForMaskedLM          | 64  | 17.3789 |  40.9296  | 143.8808 |        142.3362        |
|     MobileBertForQuestionAnswering      | 128 | 17.0055 |  40.6342  | 137.8034 |        136.0617        |
|          DebertaV2ForMaskedLM           |  1  | 15.4507 |  27.3559  | 134.5703 |        67.4038         |
|      DebertaV2ForQuestionAnswering      |  2  | 15.2244 |  26.5897  | 134.3899 |        65.4044         |
|       MT5ForConditionalGeneration       | 16  | 8.2051  |  18.8381  | 131.8772 |        131.426         |
|     M2M100ForConditionalGeneration      | 16  |  11.86  |  25.9879  | 111.3205 |        104.9358        |
|            XLNetLMHeadModel             |  8  | 10.5184 |  27.1484  | 91.0385  |        92.0297         |
|       DebertaForQuestionAnswering       |  8  | 7.1943  |  13.5702  | 80.0925  |        49.5645         |
|           DebertaForMaskedLM            |  4  | 7.4746  |  13.6857  | 79.6978  |        53.1473         |
|             XGLMForCausalLM             |  8  | 9.5501  |  21.3241  | 78.7919  |        71.0625         |
|      MBartForConditionalGeneration      |  2  | 11.6888 |  26.5771  |  78.394  |        78.4998         |
|            YituTechConvBert             | 16  | 10.6875 |  19.7041  |  75.767  |        74.1828         |
|     PegasusForConditionalGeneration     | 32  | 5.1447  |  19.5205  | 75.4259  |        71.9789         |
|      BartForConditionalGeneration       |  2  | 11.4469 |  26.0366  | 75.2148  |        74.0916         |
|           ElectraForCausalLM            | 32  | 7.5865  |  13.5695  | 65.7243  |        64.7114         |
|         MegatronBertForCausalLM         |  4  | 10.4599 |  21.7316  | 65.0427  |        64.4654         |
|    MegatronBertForQuestionAnswering     |  8  | 10.2687 |  21.5163  | 64.8474  |        64.2287         |
|     PLBartForConditionalGeneration      |  4  | 9.3408  |  16.5802  | 60.5608  |         57.287         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.6701  |  17.0149  |  54.135  |        54.0817         |
|                 T5Small                 |  4  | 5.6117  |  13.0865  | 49.2723  |        48.7284         |
|       T5ForConditionalGeneration        |  4  | 5.6262  |  13.0665  | 48.9402  |        49.1956         |
|           PegasusForCausalLM            | 32  | 5.9581  |  11.5701  | 47.2528  |        42.0853         |
|            MBartForCausalLM             |  4  | 6.2792  |  11.9161  | 47.1684  |        43.6937         |
|             BartForCausalLM             |  4  | 6.2853  |  11.8165  | 46.9825  |        42.9519         |
|    LayoutLMForSequenceClassification    | 16  |  5.529  |  11.2556  | 46.0377  |        44.7065         |
|            TrOCRForCausalLM             | 32  | 6.1595  |  12.0425  | 45.8123  |        42.0414         |
|       ElectraForQuestionAnswering       | 64  | 5.2565  |  10.777   |  44.167  |        44.0049         |
|             OPTForCausalLM              |  2  | 5.4029  |  11.0608  | 42.6162  |        40.5667         |
|           LayoutLMForMaskedLM           | 16  | 5.6321  |  11.2726  | 40.6053  |        38.0981         |
|        BertForQuestionAnswering         | 16  | 5.3138  |  10.7151  | 39.2774  |        39.2533         |
|             BertForMaskedLM             | 16  | 5.3077  |  10.8623  |  38.468  |        38.7983         |
|       BlenderbotSmallForCausalLM        | 64  | 4.6354  |  8.3781   | 38.4318  |        37.1494         |
|                CamemBert                | 16  | 5.2201  |  10.6841  | 37.0982  |        35.2229         |
|            AlbertForMaskedLM            |  4  | 2.3706  |  8.2329   |   36.5   |        37.4561         |
|      GPT2ForSequenceClassification      |  4  |  4.877  |  9.9109   | 35.8087  |        35.4436         |
|           RobertaForCausalLM            | 16  | 5.2225  |  11.1136  | 35.7767  |        35.5432         |
|     DistilBertForQuestionAnswering      | 256 | 2.5362  |  5.3981   | 35.2067  |        34.9595         |
|       RobertaForQuestionAnswering       | 16  | 5.1897  |  11.0194  | 34.9925  |        34.4571         |
|          DistilBertForMaskedLM          | 128 | 2.5065  |  5.4527   | 34.0194  |        31.9776         |
|         Speech2Text2ForCausalLM         | 256 | 3.2298  |  6.1085   |  33.829  |        31.4542         |
|       AlbertForQuestionAnswering        |  4  | 2.3475  |  8.1067   | 33.1661  |        33.6222         |
|            PLBartForCausalLM            |  8  | 3.6371  |  6.7874   | 33.0076  |        30.7785         |
|               DistillGPT2               | 16  | 2.5688  |  5.1273   | 27.3125  |        28.5401         |
|          BlenderbotForCausalLM          |  4  | 11.5212 |  22.4397  |   nan    |        70.5052         |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1387  |         1.195          |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1139  |         1.2307         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0865  |         1.1724         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0818  |         1.1729         |
|             OPTForCausalLM              |  2  | 0.9682 |  0.9252   |  1.062   |         1.1099         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0583  |         1.1368         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0541  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.0539  |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0511  |         1.0491         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.1479         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0078  |         1.0518         |
|            YituTechConvBert             | 16  | 0.953  |  0.8732   |  0.9793  |         0.9905         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9734  |         1.3147         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         1.0642         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9574  |         1.268          |
|     PLBartForConditionalGeneration      |  4  |  0.93  |  0.8779   |  0.9563  |         0.9847         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         1.1152         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9089  |         1.0019         |
|            PLBartForCausalLM            |  8  | 0.9237 |  0.8168   |  0.8907  |         0.9249         |
|     PegasusForConditionalGeneration     | 32  | 0.945  |  0.8957   |  0.8901  |         1.0074         |
|           ElectraForCausalLM            | 32  | 0.9161 |   0.786   |  0.889   |         0.8941         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8849  |         0.9624         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8729  |         0.9803         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.8672  |         1.0307         |
|            TrOCRForCausalLM             | 32  |  0.92  |  0.8307   |  0.8619  |         0.9075         |
|            MBartForCausalLM             |  4  | 0.951  |  0.8924   |  0.8491  |         0.9507         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         1.0139         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         1.0962         |
|             BartForCausalLM             |  4  | 0.951  |  0.8923   |  0.8301  |         0.943          |
|       BlenderbotSmallForCausalLM        | 64  | 0.8906 |  0.7493   |  0.8065  |         0.8318         |
|           PegasusForCausalLM            | 32  | 0.9238 |  0.8421   |  0.7952  |         0.9252         |
|         Speech2Text2ForCausalLM         | 256 | 0.8865 |  0.7545   |  0.7566  |         0.808          |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7473  |         1.016          |
|     M2M100ForConditionalGeneration      | 16  | 0.955  |  0.8772   |  0.7188  |         0.9535         |
|             XGLMForCausalLM             |  8  | 0.9431 |  0.8612   |  0.6744  |         0.9287         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6569  |         0.8392         |
|           DebertaForMaskedLM            |  4  | 0.9326 |  0.9156   |  0.5501  |         0.9978         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.5197  |         0.9665         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9763 |  0.9763   |  0.487   |         0.9801         |
|          AllenaiLongformerBase          |  4  | 0.8568 |  0.7887   |  0.4688  |         0.8742         |
|       DebertaForQuestionAnswering       |  8  | 0.9524 |  1.0537   |  0.4601  |         1.1527         |
|          BlenderbotForCausalLM          |  4  | 0.9932 |  0.9937   |   nan    |         0.9941         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
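The table does not define the ratio, so the following is a sketch under an assumption: that the compression ratio is the eager peak memory divided by the compiled peak memory, so values above 1 mean the backend used less memory than eager and values below 1 mean it used more. The function names and the byte counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Assumed definition: eager peak memory / compiled peak memory."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


def relative_memory(ratio):
    """Compiled memory as a multiple of eager memory, under the same assumption."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


# Hypothetical measurement: eager peaks at 8 GiB, the compiled run at 10 GiB.
ratio = compression_ratio(8 * 2**30, 10 * 2**30)
print(f"compression ratio: {ratio:.4f}")
print(f"compiled run uses {relative_memory(ratio):.2f}x the eager peak memory")
```

Read this way, an inductor entry of 0.4601 (DebertaForQuestionAnswering) would correspond to roughly twice the eager peak memory, while the same model without cudagraphs (1.1527) would use less than eager.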

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 265.9715 | 300.5062  | 161.4247 |        161.5151        |
|       AlbertForQuestionAnswering        |  4  | 263.8414 | 297.8781  | 159.713  |        159.7474        |
|            XLNetLMHeadModel             |  8  | 281.567  | 289.4246  | 154.1894 |        153.8016        |
|      DebertaV2ForQuestionAnswering      |  2  | 149.8193 | 201.2344  | 135.384  |        168.7582        |
|          DebertaV2ForMaskedLM           |  1  | 152.6777 | 196.1358  | 123.4129 |        165.6233        |
|          AllenaiLongformerBase          |  4  | 206.1304 | 291.1078  | 113.3479 |        121.6028        |
|     PegasusForConditionalGeneration     | 32  | 139.5672 | 152.1441  | 112.0198 |        108.2495        |
|            TrOCRForCausalLM             | 32  | 139.4835 | 143.9071  | 110.9058 |        107.0188        |
|      MBartForConditionalGeneration      |  2  | 138.8513 | 145.2219  | 96.9579  |        93.6097         |
|      BartForConditionalGeneration       |  2  | 139.8664 |  144.79   | 94.5135  |        99.1385         |
|    MegatronBertForQuestionAnswering     |  8  | 144.4967 |  147.302  |  88.567  |        87.0822         |
|            YituTechConvBert             | 16  | 127.1832 | 131.0978  | 84.2424  |         83.973         |
| BlenderbotSmallForConditionalGeneration | 64  | 113.1618 | 122.3035  | 82.6709  |        79.4117         |
|     MobileBertForQuestionAnswering      | 128 | 175.6206 | 213.8894  | 81.3139  |        157.7919        |
|                CamemBert                | 16  | 119.9084 | 122.8167  | 76.6098  |        77.0758         |
|             BartForCausalLM             |  4  | 115.0171 | 117.3316  | 76.2321  |        74.1589         |
|            MBartForCausalLM             |  4  | 114.7543 | 117.6472  | 76.1588  |        73.6129         |
|       DebertaForQuestionAnswering       |  8  | 92.5163  | 108.7068  | 75.6299  |        82.1447         |
|     M2M100ForConditionalGeneration      | 16  | 111.0179 | 146.7441  | 74.2379  |        80.0816         |
|     PLBartForConditionalGeneration      |  4  | 117.7678 | 123.0287  | 73.8226  |        72.8534         |
|          MobileBertForMaskedLM          | 64  | 178.8729 | 216.6143  | 72.7713  |        157.5865        |
|           DebertaForMaskedLM            |  4  | 91.4828  | 121.2082  |  72.538  |        81.1988         |
|           LayoutLMForMaskedLM           | 16  | 114.1535 | 116.9511  | 71.6602  |        70.6035         |
|            PLBartForCausalLM            |  8  | 116.1623 | 116.5458  | 71.5479  |         69.701         |
|     DistilBertForQuestionAnswering      | 256 | 103.8103 | 104.4354  | 71.4809  |        71.3692         |
|          DistilBertForMaskedLM          | 128 | 85.2319  |  88.9296  | 70.0282  |        68.6087         |
|             OPTForCausalLM              |  2  | 169.899  | 179.5859  | 69.8828  |        68.8875         |
|             BertForMaskedLM             | 16  | 111.629  | 114.3172  | 68.9918  |        69.4175         |
|           RobertaForCausalLM            | 16  | 116.5493 | 119.9019  | 68.5151  |        69.0233         |
|       T5ForConditionalGeneration        |  4  | 106.3579 | 123.1266  | 64.1644  |        60.3979         |
|                 T5Small                 |  4  | 106.2552 | 122.4992  | 64.0652  |        60.4382         |
|               DistillGPT2               | 16  | 107.6102 | 110.5271  | 63.6991  |        62.2006         |
|           PegasusForCausalLM            | 32  | 71.9318  |  78.2926  |  59.751  |        64.5562         |
|         MegatronBertForCausalLM         |  4  | 88.6621  |  94.424   | 59.4416  |        58.0695         |
|             XGLMForCausalLM             |  8  | 98.0715  | 122.2685  |  55.285  |         90.595         |
|    LayoutLMForSequenceClassification    | 16  | 99.2365  | 100.4756  | 54.6127  |        55.1001         |
|       ElectraForQuestionAnswering       | 64  | 116.1399 | 117.1327  | 54.1618  |        54.9351         |
|        BertForQuestionAnswering         | 16  | 96.6794  |  98.0057  | 53.8401  |        54.1571         |
|       RobertaForQuestionAnswering       | 16  | 97.0986  |  99.297   | 53.5143  |        54.1593         |
|           ElectraForCausalLM            | 32  | 89.5447  |  94.0807  | 49.5284  |        48.4256         |
|       BlenderbotSmallForCausalLM        | 64  | 59.6281  |  64.669   | 47.7392  |        48.9322         |
|       MT5ForConditionalGeneration       | 16  | 93.4177  | 110.2095  | 44.0802  |        50.0289         |
|      GPT2ForSequenceClassification      |  4  | 93.8162  |  96.0779  | 40.6198  |        40.1281         |
|         Speech2Text2ForCausalLM         | 256 | 53.9524  |  56.1812  | 36.5487  |         34.832         |
|          BlenderbotForCausalLM          |  4  | 110.4447 | 132.9703  |   nan    |        107.8312        |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9985 |   0.997   |  3.0087  |         2.9694         |
|      xcit_large_24_p8_224       |  5  | 0.9901 |   0.868   |  1.9642  |         1.5826         |
|         coat_lite_mini          | 128 | 0.9973 |  0.9958   |  1.9427  |         1.9157         |
|        twins_pcpvt_base         | 64  | 0.9959 |  0.9229   |  1.9414  |         1.6819         |
|          ghostnet_100           | 128 | 0.9922 |  0.7546   |  1.845   |         1.6153         |
|          gmlp_s16_224           | 128 | 0.9945 |   1.083   |  1.8424  |         1.8318         |
|          gmixer_24_224          | 128 | 0.9955 |  0.8891   |  1.7589  |         1.7499         |
|           volo_d1_224           | 64  | 0.9941 |  0.9734   |  1.6899  |         1.667          |
|            lcnet_050            | 128 | 0.9401 |  0.7357   |  1.686   |         1.4771         |
|         crossvit_9_240          | 128 | 0.9901 |  0.7827   |  1.6428  |         1.6157         |
|  swin_base_patch4_window7_224   | 64  | 0.9908 |  0.9544   |  1.6182  |         1.6053         |
|           convit_base           | 64  | 0.9982 |  0.9979   |  1.6122  |         1.6117         |
|          inception_v3           | 128 | 0.9964 |  0.8641   |  1.5299  |         1.5192         |
|       gluon_inception_v3        | 128 | 0.9963 |  0.8654   |  1.5295  |         1.5198         |
|        adv_inception_v3         | 128 | 0.9963 |  0.8603   |  1.529   |         1.5191         |
|             dla102              | 128 | 0.9956 |  0.8152   |  1.5236  |         1.5226         |
|        sebotnet33ts_256         | 64  | 0.9582 |   0.765   |  1.5084  |         1.5354         |
|          convnext_base          | 64  | 0.9837 |  0.9852   |  1.4897  |         1.4713         |
|            nfnet_l0             | 128 | 0.9897 |  0.8136   |  1.4862  |         1.4349         |
|           dm_nfnet_f0           | 128 | 0.9875 |  0.9848   |  1.4811  |         1.4291         |
|       eca_botnext26ts_256       | 128 | 0.973  |  0.7195   |  1.4458  |         1.4253         |
|           mnasnet_100           | 128 | 0.9483 |  0.7413   |  1.4378  |         1.496          |
|            pit_b_224            | 64  | 0.9946 |  0.9925   |  1.4345  |         1.4284         |
|      mobilenetv3_large_100      | 128 | 0.9497 |  0.7599   |  1.4302  |         1.4199         |
|           mobilevit_s           | 64  | 0.9621 |  0.7265   |  1.4291  |         1.4409         |
|           resnest101e           | 64  | 0.9944 |  0.8674   |  1.4244  |         1.3578         |
|           selecsls42b           | 128 | 0.9987 |  0.8119   |  1.4117  |         1.4123         |
|          botnet26t_256          | 128 | 0.9718 |  0.8515   |  1.4081  |         1.4207         |
|           regnety_002           | 128 | 0.9524 |  0.7181   |  1.401   |         1.234          |
|         mobilenetv2_100         | 128 | 0.9501 |  0.7374   |  1.3865  |         1.4455         |
|        res2net50_14w_8s         | 128 | 0.9989 |  0.7899   |  1.3808  |         1.3575         |
|           res2next50            | 128 | 0.9992 |  0.8255   |  1.3712  |         1.3621         |
|          jx_nest_base           | 32  | 0.9873 |   0.985   |  1.3634  |         1.3581         |
|          mixer_b16_224          | 128 | 0.9978 |  1.0186   |  1.3628  |         1.3618         |
|        ese_vovnet19b_dw         | 128 | 0.9578 |   0.832   |  1.3561  |         1.3699         |
|          spnasnet_100           | 128 | 0.9417 |  0.7386   |  1.355   |         1.4197         |
|           fbnetc_100            | 128 |  0.95  |  0.7394   |  1.3537  |         1.4044         |
|       tf_efficientnet_b0        | 128 | 0.9597 |  0.6812   |  1.3532  |         1.3842         |
|          cait_m36_384           |  4  | 0.9948 |  0.9932   |  1.3513  |         1.348          |
|      beit_base_patch16_224      | 64  | 0.9966 |  0.9657   |  1.3492  |         1.3523         |
|            hrnet_w18            | 128 | 0.9925 |  0.6336   |  1.3488  |         1.3499         |
|         poolformer_m36          | 64  | 0.9866 |  0.9839   |  1.3291  |         1.3183         |
|            fbnetv3_b            | 128 | 0.9492 |  0.7691   |  1.3127  |         1.3297         |
|           rexnet_100            | 128 | 0.9529 |  0.7024   |  1.301   |         1.3309         |
|          resmlp_12_224          | 128 | 0.9931 |  0.8899   |  1.2589  |         1.2559         |
| deit_base_distilled_patch16_224 | 64  | 0.9964 |  0.9938   |  1.2565  |         1.2566         |
|      vit_base_patch16_224       | 64  | 0.9962 |  0.9938   |  1.2358  |         1.2346         |
|            tinynet_a            | 128 | 0.9469 |  0.6782   |  1.228   |         1.2618         |
|          cspdarknet53           | 64  | 0.9329 |  0.7867   |  1.2247  |         1.2629         |
|           tf_mixnet_l           | 128 | 0.9764 |  0.8269   |  1.1848  |         1.1913         |
|            mixnet_l             | 128 | 0.9764 |  0.8213   |  1.1737  |         1.1817         |
|         visformer_small         | 128 | 0.9963 |  0.9453   |  1.1731  |         1.1672         |
|        res2net101_26w_4s        | 64  | 1.001  |  0.7935   |  1.1494  |         1.0937         |
|          pnasnet5large          | 16  | 0.9854 |  0.9112   |  1.112   |         1.1281         |
|             dpn107              | 32  | 0.9323 |  0.8078   |  1.0894  |         1.1384         |
|            repvgg_a2            | 128 | 0.9361 |  0.7544   |  1.0887  |         1.1194         |
|        gluon_xception65         | 32  | 0.9925 |  0.8414   |  1.0752  |         1.0793         |
|     swsl_resnext101_32x16d      | 32  | 0.9978 |  0.8427   |  1.0602  |         1.0254         |
|            gernet_l             | 128 | 0.9354 |  0.7934   |  1.0383  |         1.0675         |
|        convmixer_768_32         | 32  | 0.9987 |  0.9637   |  1.0017  |         1.0021         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
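To condense a speedup column into one number, a geometric mean is a common choice for ratios. The sketch below is illustrative only: the `geomean` helper is hypothetical, and dropping `0.0` and `nan` entries as failed runs is an assumption about how such rows should be handled, not something stated in the tables.

```python
import math


def geomean(values):
    """Geometric mean of the positive, non-NaN entries.

    Assumption: 0.0 and nan mark runs that failed, so they are dropped
    instead of pulling the mean to zero.
    """
    usable = [v for v in values if not math.isnan(v) and v > 0]
    if not usable:
        return float("nan")
    return math.exp(sum(math.log(v) for v in usable) / len(usable))


# Inductor speedups of the first three rows of the table above.
inductor_speedups = [3.0087, 1.9642, 1.9427]
print(f"geomean over {len(inductor_speedups)} models: {geomean(inductor_speedups):.4f}")
```

Unlike an arithmetic mean, the geometric mean treats a 2x speedup and a 2x slowdown as cancelling out, which is why it tends to be preferred for speedup ratios.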

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 5.6732  |  11.1812  | 298.096  |        277.4582        |
|            hrnet_w18            | 128 | 9.6771  |  36.1068  | 245.3984 |        246.8611        |
|          ghostnet_100           | 128 | 7.6094  |  15.0671  | 232.9816 |        241.7617        |
|            fbnetv3_b            | 128 | 8.5019  |  16.7722  | 173.4663 |        175.3756        |
|           resnest101e           | 64  | 11.2167 |   24.3    | 166.1121 |        165.7984        |
|          pnasnet5large          | 16  | 8.2672  |  26.047   | 164.7778 |        159.1506        |
|            tinynet_a            | 128 |  5.994  |  12.2753  | 162.2366 |        163.014         |
|           mobilevit_s           | 64  | 5.3645  |  11.3762  | 161.1674 |        158.7045        |
|       gluon_inception_v3        | 128 | 5.6815  |  12.4855  | 159.8668 |        159.2567        |
|          inception_v3           | 128 | 5.6702  |  12.5557  | 159.3256 |        160.0972        |
|            mixnet_l             | 128 | 8.3352  |  16.1158  | 157.2724 |        159.8059        |
|      mobilenetv3_large_100      | 128 | 4.2265  |  8.3316   | 157.134  |        154.4084        |
|        adv_inception_v3         | 128 | 5.6817  |  12.5679  | 156.2602 |        159.5745        |
|       tf_efficientnet_b0        | 128 | 5.1081  |  10.2549  | 154.115  |        147.0983        |
|        res2net101_26w_4s        | 64  | 10.5771 |  24.5555  | 150.8055 |        148.9207        |
|           tf_mixnet_l           | 128 | 9.0116  |  16.9663  | 150.4952 |        160.1388        |
|        twins_pcpvt_base         | 64  | 10.4114 |  23.6483  | 146.2708 |        147.2427        |
|           fbnetc_100            | 128 | 5.1185  |  9.2265   | 139.1197 |        134.727         |
|          spnasnet_100           | 128 | 4.9891  |  9.2774   | 136.5761 |        134.8972        |
|      xcit_large_24_p8_224       |  5  | 12.4568 |  28.0799  | 129.0758 |        131.1981        |
|         mobilenetv2_100         | 128 | 4.0168  |  7.9808   | 128.7111 |        127.9593        |
|           mnasnet_100           | 128 | 3.9907  |  7.5723   | 126.3795 |        119.7307        |
|        res2net50_14w_8s         | 128 | 8.9968  |  22.3546  | 122.1861 |        123.4464        |
|          cait_m36_384           |  4  | 13.657  |  30.4376  | 113.9863 |        113.4558        |
|        sebotnet33ts_256         | 64  | 4.2369  |  8.7978   | 107.1643 |        106.4453        |
|  swin_base_patch4_window7_224   | 64  |  8.484  |  19.2542  | 106.4718 |        105.1625        |
|           regnety_002           | 128 | 4.8505  |  8.6816   | 105.5029 |        102.9592        |
|         poolformer_m36          | 64  | 7.5534  |  13.6265  | 100.7767 |        99.1759         |
|            lcnet_050            | 128 | 2.5358  |   4.999   | 99.4203  |        98.8536         |
|          cspdarknet53           | 64  | 5.9045  |  10.9003  | 99.1059  |        98.5324         |
|             dpn107              | 32  | 9.5924  |  19.6269  | 98.2193  |        98.6852         |
|             dla102              | 128 | 6.2652  |  14.0837  | 97.1696  |         96.027         |
|        gluon_xception65         | 32  | 7.8542  |  17.0376  | 94.8665  |        95.8324         |
|       eca_botnext26ts_256       | 128 | 3.1178  |  6.7781   | 93.8433  |        93.3662         |
|           selecsls42b           | 128 |  2.483  |  5.3097   | 92.1884  |        87.7552         |
|          botnet26t_256          | 128 | 2.9933  |  6.0494   | 91.6183  |        91.0272         |
|         coat_lite_mini          | 128 | 3.2429  |  7.9026   | 89.1004  |        88.5904         |
|         crossvit_9_240          | 128 | 5.7737  |  13.3314  | 86.0591  |         86.03          |
|           res2next50            | 128 | 5.0214  |  11.9675  | 85.8711  |         84.749         |
|          jx_nest_base           | 32  | 6.6413  |  14.6343  | 84.6803  |        82.8938         |
|            gernet_l             | 128 | 5.0526  |  8.8422   | 81.7305  |        79.6548         |
|            nfnet_l0             | 128 | 5.3329  |  10.8422  | 77.3089  |        77.0859         |
|        ese_vovnet19b_dw         | 128 | 2.5416  |  4.5353   | 76.8853  |        76.0118         |
|           dm_nfnet_f0           | 128 | 5.9808  |  11.4912  | 73.8427  |        70.5974         |
|           volo_d1_224           | 64  | 5.0084  |  11.6424  |  72.559  |        72.6661         |
|        tnt_s_patch16_224        | 128 | 6.5284  |  15.9731  | 67.7744  |        67.2014         |
|         visformer_small         | 128 | 2.5731  |  5.9491   | 66.5406  |        65.0591         |
|     swsl_resnext101_32x16d      | 32  |  6.183  |  13.6516  | 61.6238  |        60.7949         |
|          gmlp_s16_224           | 128 | 5.6343  |  11.971   | 59.5927  |        57.5152         |
|            repvgg_a2            | 128 | 4.9514  |   8.675   | 59.3119  |        58.4453         |
|          convnext_base          | 64  | 7.3616  |  12.3789  | 58.5814  |         58.03          |
|          gmixer_24_224          | 128 |  5.804  |  12.852   | 51.4533  |        49.9019         |
|           convit_base           | 64  | 3.5096  |  8.5733   | 45.5288  |         46.782         |
|            pit_b_224            | 64  | 3.6676  |  8.0382   | 44.3109  |        43.6488         |
| deit_base_distilled_patch16_224 | 64  | 3.1218  |  7.0981   | 41.0819  |        41.4334         |
|          resmlp_12_224          | 128 | 2.8099  |  5.3715   | 38.2656  |        38.2605         |
|      vit_base_patch16_224       | 64  | 3.0405  |  6.9604   | 38.0055  |        37.9131         |
|        convmixer_768_32         | 32  | 1.6906  |  6.8854   | 37.7089  |        36.6506         |
|      beit_base_patch16_224      | 64  |  3.862  |  8.7231   | 36.2484  |        35.3138         |
|          mixer_b16_224          | 128 | 2.7022  |  5.8221   | 32.4219  |         31.735         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.1858  |         1.2049         |
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.1712  |         1.2836         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1129  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.0948  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.0266  |         1.1179         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.013   |         1.0845         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0099  |         1.0351         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  0.9984  |         1.0721         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  0.9972  |         1.0876         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  0.9872  |         1.0728         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  0.9834  |         1.0506         |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9762  |         0.9854         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  0.9747  |         1.0457         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  0.9729  |         1.0539         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  0.9557  |         1.0057         |
|             dla102              | 128 | 0.9634 |  0.9151   |  0.9536  |         1.0326         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.951   |         0.9948         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9362  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9353  |         0.9528         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  0.9348  |         1.0194         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  0.9346  |         1.0675         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9285  |         0.989          |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9228  |         0.9793         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9215  |         0.9952         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9131  |         0.9727         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  0.9097  |         1.0328         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9047  |         0.9903         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.8918  |          0.99          |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  0.891   |         1.1115         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.8833  |         0.9642         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.8831  |         0.9705         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.8807  |         0.977          |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.8786  |         0.9451         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.877   |         0.9361         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  0.8768  |         1.1871         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  0.8738  |         1.0072         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.8721  |         0.9732         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.8712  |         0.9607         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.871   |         0.9483         |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.8687  |         0.9902         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.8683  |         0.9403         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.866   |         0.9547         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8632  |         0.989          |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.8596  |         0.9535         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  0.8578  |         1.0242         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.8576  |         0.9664         |
|          convnext_base          | 64  | 1.001  |   0.924   |  0.8505  |         1.0338         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.8499  |         0.9706         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.8461  |         0.9786         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  0.8402  |         1.0202         |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.8239  |         0.9779         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.805   |         0.884          |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.7738  |         0.9611         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.7602  |         0.8966         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.7526  |         0.9898         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.7214  |         0.9045         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.6693  |         0.9604         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.4256 |  311.557  | 300.0862 |        299.9383        |
|            hrnet_w18            | 128 | 281.2279 | 441.5232  | 207.358  |        206.9119        |
|          pnasnet5large          | 16  | 199.2119 | 214.9838  | 176.3123 |        174.0971        |
|           tf_mixnet_l           | 128 | 193.8375 |  228.795  | 160.0863 |        159.033         |
|            mixnet_l             | 128 | 185.3428 | 220.4322  | 154.0928 |        153.1313        |
|          cait_m36_384           |  4  | 168.2102 |  167.757  | 123.6316 |        123.7572        |
|           resnest101e           | 64  | 165.5363 |  188.552  | 115.1361 |        120.4433        |
|             dla102              | 128 | 172.358  | 210.6913  | 112.812  |        112.8646        |
|     swsl_resnext101_32x16d      | 32  | 118.6671 | 140.5217  | 111.9311 |        115.4713        |
|         poolformer_m36          | 64  | 146.7595 | 147.1212  | 108.9741 |        109.8066        |
|        tnt_s_patch16_224        | 128 | 323.6315 | 323.8425  | 107.2582 |        108.7573        |
|       gluon_inception_v3        | 128 | 160.7171 | 185.1465  | 104.8951 |        105.5333        |
|          inception_v3           | 128 | 161.0217 | 185.3002  | 104.8513 |        105.5696        |
|        adv_inception_v3         | 128 | 160.7518 | 186.0528  | 104.8013 |        105.3749        |
|        res2net50_14w_8s         | 128 | 140.9133 | 178.1508  | 101.7573 |        103.5965        |
|           convit_base           | 64  |  163.19  | 163.1075  | 100.9791 |        100.9463        |
|             dpn107              | 32  | 113.9096 | 131.4352  | 97.2871  |        93.1528         |
|        gluon_xception65         | 32  | 99.6462  | 118.0105  | 91.9727  |        91.6621         |
|           res2next50            | 128 | 125.9188 |  152.752  |  91.757  |        92.4747         |
|  swin_base_patch4_window7_224   | 64  | 147.7909 | 153.1246  | 90.3268  |        90.8919         |
|          mixer_b16_224          | 128 | 116.5625 | 114.1752  | 85.7432  |        85.3706         |
|           dm_nfnet_f0           | 128 | 128.2494 | 128.4786  | 85.6587  |        88.8372         |
|        res2net101_26w_4s        | 64  | 99.8603  | 125.1598  | 84.9988  |        90.2648         |
|            fbnetv3_b            | 128 | 115.3747 |  142.078  | 83.4328  |        82.2934         |
|            pit_b_224            | 64  | 118.7671 | 118.9212  | 82.3286  |        82.6769         |
|          convnext_base          | 64  | 124.5889 | 124.0946  |  82.198  |        83.0801         |
|         visformer_small         | 128 | 91.2759  |  96.0794  | 77.5772  |         77.837         |
|            nfnet_l0             | 128 | 113.1759 | 137.3369  | 75.1463  |        78.0774         |
|      beit_base_patch16_224      | 64  | 101.5457 | 104.6874  | 75.0329  |        74.8422         |
|          gmlp_s16_224           | 128 | 137.5476 | 126.3641  | 74.3377  |        74.6788         |
|          jx_nest_base           | 32  | 101.9809 |  101.634  | 73.6178  |        73.7904         |
|       eca_botnext26ts_256       | 128 | 108.9034 | 147.2464  | 73.3232  |        74.2039         |
|          cspdarknet53           | 64  |  95.127  |  112.462  | 72.3569  |        70.2068         |
|           volo_d1_224           | 64  | 121.1456 | 123.5536  | 71.2867  |        72.1744         |
|          botnet26t_256          | 128 | 102.1146 |  116.555  | 70.4302  |        69.8015         |
|      vit_base_patch16_224       | 64  | 86.8538  |  87.0979  | 70.0679  |        70.1263         |
|            gernet_l             | 128 | 77.6212  |  91.6608  | 70.0478  |        68.1321         |
| deit_base_distilled_patch16_224 | 64  | 84.8844  |  85.0205  | 67.4089  |        67.1897         |
|          gmixer_24_224          | 128 | 118.281  | 132.3082  | 67.0198  |        67.1503         |
|            repvgg_a2            | 128 | 77.7462  |  96.2509  | 66.7392  |        64.8866         |
|      xcit_large_24_p8_224       |  5  | 121.664  | 138.2681  | 62.4653  |        77.4349         |
|        twins_pcpvt_base         | 64  | 127.5747 | 129.9484  | 60.1536  |        68.0617         |
|       tf_efficientnet_b0        | 128 | 84.8382  | 119.5788  | 60.1157  |        58.8553         |
|           rexnet_100            | 128 | 80.2635  | 108.7896  | 58.5928  |        57.3355         |
|           fbnetc_100            | 128 | 82.8114  | 106.4631  | 58.0829  |        55.9784         |
|         coat_lite_mini          | 128 | 113.1805 | 113.2038  | 57.9922  |        58.8506         |
|           mobilevit_s           | 64  | 84.7591  | 112.1775  | 56.9535  |        56.5117         |
|            tinynet_a            | 128 | 73.7949  | 102.7632  | 56.6444  |        55.1491         |
|        sebotnet33ts_256         | 64  | 80.6667  | 100.5506  | 51.1062  |        50.2347         |
|         crossvit_9_240          | 128 | 82.5189  |  104.458  | 49.7934  |        50.5198         |
|          spnasnet_100           | 128 | 70.3219  |  89.6853  |  48.899  |        46.7705         |
|          ghostnet_100           | 128 | 90.6779  |  119.275  | 48.6651  |        55.6949         |
|        ese_vovnet19b_dw         | 128 | 64.5939  |  74.4799  | 45.6857  |        45.1808         |
|         mobilenetv2_100         | 128 | 65.4113  |  84.4255  | 44.8639  |        43.0139         |
|           selecsls42b           | 128 | 60.0618  |  73.8429  | 42.5158  |        42.5057         |
|           mnasnet_100           | 128 | 64.3916  |  82.4213  | 42.3994  |         40.731         |
|          resmlp_12_224          | 128 | 53.5035  |  59.8463  | 42.1821  |        42.2518         |
|      mobilenetv3_large_100      | 128 | 61.3903  |  76.7415  | 40.6437  |        40.9734         |
|           regnety_002           | 128 | 42.8266  |  53.8018  | 26.4855  |        30.5461         |
|            lcnet_050            | 128 | 31.7876  |  40.5996  | 17.6719  |        20.2013         |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Performance graphs

bench_logs/timm_models_amp.png :

bench_logs/huggingface_amp.png :

bench_logs/torchbench_amp.png :

Build Summary

Run name

day_087_28_03_23_performance_amp_574

Commit hashes

pytorch commit: f754be8
pytorch commit date: 2023-03-29 01:27:31+00:00
torchbench commit: 4b0a89a81c808bcfe7576c874c3ab2accc7ba378
torchbench commit date: 2023-03-28 08:46:42-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+gitf754be8

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
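
As a rough illustration of the filtering step described above, the sketch below drops models that fail the accuracy check and then takes the geometric mean of the remaining speedups. The row layout (`accuracy`, `speedup` keys), the function name `geomean_speedup`, and the sample values are hypothetical and are not taken from the benchmark scripts. Skipping non-positive speedups is also an assumption, made because a 0.0 speedup usually marks a failed performance run.

```python
import math

def geomean_speedup(rows):
    """Geometric mean of speedups over models that pass accuracy.

    `rows` is a hypothetical list of dicts with "accuracy" and "speedup" keys.
    """
    speedups = [
        r["speedup"]
        for r in rows
        # Keep only models that pass accuracy; also skip non-positive
        # speedups (assumption: 0.0 marks a failed performance run).
        if r["accuracy"] == "pass" and r["speedup"] > 0
    ]
    if not speedups:
        return float("nan")
    # exp(mean(log(x))) is the geometric mean.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))

# Hypothetical sample data, for illustration only.
rows = [
    {"name": "model_a", "accuracy": "pass", "speedup": 2.0},
    {"name": "model_b", "accuracy": "pass", "speedup": 0.5},
    {"name": "model_c", "accuracy": "fail_accuracy", "speedup": 3.0},
]
print(f"{geomean_speedup(rows):.2f}x")  # prints 1.00x
```

The geometric mean is a natural choice for ratios such as speedups, since a 2x gain and a 0.5x loss cancel out to 1.00x.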

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 87%, 52/60 | 93%, 42/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 88%, 53/60 | 98%, 44/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
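
The cells in the pass-rate table combine a rounded percentage with the raw pass count. A minimal sketch of how such a cell could be produced is shown below. The function name `passrate` and the status strings other than `pass` are illustrative assumptions, not the dashboard's actual code.

```python
def passrate(statuses):
    """Format a pass rate as "<percent>, <passed>/<total>".

    `statuses` is a hypothetical list of per-model accuracy results,
    where the string "pass" marks a passing model.
    """
    total = len(statuses)
    if total == 0:
        return "n/a"
    passed = sum(1 for s in statuses if s == "pass")
    # ".0%" multiplies by 100 and rounds to a whole percent.
    return f"{passed / total:.0%}, {passed}/{total}"

# 53 passing models out of 60, as in the torchbench/eager cell.
print(passrate(["pass"] * 53 + ["fail_to_run"] * 7))  # prints 88%, 53/60
```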

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.58x    |    1.57x    |    1.41x    |
| inductor_no_cudagraphs |   1.27x    |    1.48x    |    1.39x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.86    |    7.61     |    5.90     |
|       aot_eager        |    9.44    |    16.41    |    13.21    |
|        inductor        |   63.85    |    64.00    |   109.24    |
| inductor_no_cudagraphs |   63.65    |    59.43    |   108.48    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    0.97x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.89x    |    0.88x    |
|        inductor        |   0.79x    |    0.89x    |    0.91x    |
| inductor_no_cudagraphs |   0.94x    |    1.03x    |    1.01x    |
+------------------------+------------+-------------+-------------+
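
To make the "higher is better" reading concrete, the sketch below assumes the compression ratio is the eager peak memory divided by the compiled peak memory. That definition is an assumption inferred from the table header and is not confirmed in this report. The function name and the byte counts are hypothetical; in practice the peaks might come from something like `torch.cuda.max_memory_allocated()`.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Assumed definition: eager peak memory / compiled peak memory.

    A value above 1.0 means the compiled model used less peak memory
    than eager; a value below 1.0 means it used more.
    """
    return eager_peak_bytes / compiled_peak_bytes

# Hypothetical peaks: eager used 10 GB, the compiled model used 12.5 GB.
GB = 1e9
ratio = compression_ratio(10 * GB, 12.5 * GB)
print(f"{ratio:.2f}x")  # prints 0.80x
```

Under this reading, a ratio such as 0.79x means the compiled run had a larger peak memory footprint than the eager baseline.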

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics of the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Previous report name: /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 85%, 51/60  | 87%, 52/60  |
|        inductor        | huggingface | 93%, 42/45  | 93%, 42/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60  | 88%, 53/60  |
| inductor_no_cudagraphs | huggingface | 98%, 44/45  | 98%, 44/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.56x    |   1.58x   |
|        inductor        | huggingface |   1.59x    |   1.57x   |
|        inductor        | timm_models |   1.41x    |   1.41x   |
| inductor_no_cudagraphs | torchbench  |   1.27x    |   1.27x   |
| inductor_no_cudagraphs | huggingface |   1.48x    |   1.48x   |
| inductor_no_cudagraphs | timm_models |   1.39x    |   1.39x   |
+------------------------+-------------+------------+-----------+

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
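
The flagging rules above can be sketched as a small function. The thresholds come from the list. The row layout, the key names, the function name `warnings_for`, and the sample values are hypothetical and only illustrate the checks.

```python
# Thresholds taken from the flagging rules listed above.
SPEEDUP_MIN = 0.95
COMPILE_LATENCY_MAX_SEC = 120.0
COMPRESSION_MIN = 0.9

def warnings_for(row):
    """Return the names of the warning categories a model falls into.

    `row` is a hypothetical dict with "accuracy", "speedup",
    "compilation_latency" (seconds), and "compression_ratio" keys.
    """
    flags = []
    if row["accuracy"] != "pass":
        flags.append("accuracy")
    if row["speedup"] < SPEEDUP_MIN:
        flags.append("speedup")
    if row["compilation_latency"] > COMPILE_LATENCY_MAX_SEC:
        flags.append("compilation_latency")
    if row["compression_ratio"] < COMPRESSION_MIN:
        flags.append("compression_ratio")
    return flags

# Hypothetical model: passes accuracy but is slow to run and to compile.
row = {
    "accuracy": "pass",
    "speedup": 0.84,
    "compilation_latency": 134.7,
    "compression_ratio": 0.95,
}
print(warnings_for(row))  # prints ['speedup', 'compilation_latency']
```

A model can appear in several warning tables at once, which is why the function returns a list, not a single flag.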

Accuracy warnings

+-------------+-------------------------------+------------------------+-----------------+
|    suite    |             name              | inductor_no_cudagraphs |    inductor     |
+-------------+-------------------------------+------------------------+-----------------+
| torchbench  |             moco              |      fail_to_run       |   fail_to_run   |
| torchbench  |      Background_Matting       |    eager_variation     | eager_variation |
| torchbench  |        vision_maskrcnn        |    eager_variation     | eager_variation |
| torchbench  |           tacotron2           |         0.0000         |     0.0000      |
| torchbench  |              gat              |         0.0000         |     0.0000      |
| torchbench  |              gcn              |         0.0000         |     0.0000      |
| torchbench  |             llama             |         0.0000         |     0.0000      |
| torchbench  |             sage              |         0.0000         |     0.0000      |
| torchbench  |         torchrec_dlrm         |         0.0000         |     0.0000      |
| huggingface | DebertaV2ForQuestionAnswering |          pass          |   fail_to_run   |
| huggingface |  AlbertForQuestionAnswering   |     fail_accuracy      |  fail_accuracy  |
+-------------+-------------------------------+------------------------+-----------------+

Performance speedup warnings

+-------------+-------------------------------+------------------------+----------+
|    suite    |             name              | inductor_no_cudagraphs | inductor |
+-------------+-------------------------------+------------------------+----------+
| torchbench  |             dcgan             |         0.8364         |  1.4618  |
| torchbench  |         lennard_jones         |         0.8847         |  1.4021  |
| torchbench  |       soft_actor_critic       |         0.6943         |  1.2102  |
| torchbench  |          timm_vovnet          |         0.9327         |  0.9433  |
| torchbench  |    nvidia_deeprecommender     |         1.0198         |  0.8732  |
| torchbench  | timm_vision_transformer_large |         1.0804         |   0.0    |
| torchbench  |             moco              |          0.0           |   0.0    |
| torchbench  |              gat              |          0.0           |   0.0    |
| torchbench  |              gcn              |          0.0           |   0.0    |
| torchbench  |             sage              |          0.0           |   0.0    |
| torchbench  |           tacotron2           |          0.0           |   0.0    |
| torchbench  |         torchrec_dlrm         |          0.0           |   0.0    |
| huggingface |  DebertaForQuestionAnswering  |         0.9108         |  1.0112  |
| huggingface |      DebertaForMaskedLM       |         0.7893         |  0.9351  |
| huggingface |     DebertaV2ForMaskedLM      |         0.624          |  0.8425  |
| huggingface | DebertaV2ForQuestionAnswering |         0.633          |  0.795   |
| huggingface |     BlenderbotForCausalLM     |         1.0827         |   0.0    |
+-------------+-------------------------------+------------------------+----------+

Compilation latency (sec) warnings

+-------------+--------------------------------+------------------------+----------+
|    suite    |              name              | inductor_no_cudagraphs | inductor |
+-------------+--------------------------------+------------------------+----------+
| torchbench  |          hf_T5_large           |        170.6535        | 172.7065 |
| torchbench  |        phlippe_densenet        |        166.4526        | 163.1861 |
| torchbench  |         hf_Longformer          |        117.3756        | 151.4083 |
| torchbench  |       timm_efficientnet        |        146.5963        | 144.5115 |
| torchbench  |           hf_BigBird           |        123.6819        | 143.1175 |
| torchbench  |          densenet121           |        135.3702        | 133.3355 |
| torchbench  |          mobilenet_v2          |        125.9691        | 130.3389 |
| torchbench  |       mobilenet_v3_large       |        133.6329        | 129.3707 |
| torchbench  | timm_vision_transformer_large  |        123.5278        |   nan    |
| huggingface |     AllenaiLongformerBase      |        117.2945        | 152.1659 |
| huggingface |     MobileBertForMaskedLM      |        143.3702        | 143.8201 |
| huggingface | MobileBertForQuestionAnswering |        142.0905        | 136.9777 |
| huggingface |      DebertaV2ForMaskedLM      |        70.1598         | 134.7087 |
| huggingface | DebertaV2ForQuestionAnswering  |        67.4951         | 133.5676 |
| huggingface |  MT5ForConditionalGeneration   |        129.5675        | 132.4141 |
| timm_models |           rexnet_100           |        292.771         | 279.3221 |
| timm_models |           hrnet_w18            |        244.4327        | 248.2146 |
| timm_models |          ghostnet_100          |        234.8726        | 234.1809 |
| timm_models |           fbnetv3_b            |        168.8537        | 170.7563 |
| timm_models |          resnest101e           |        167.7274        | 166.3095 |
| timm_models |         pnasnet5large          |        162.246         | 165.1175 |
| timm_models |          mobilevit_s           |        156.2809        | 164.1183 |
| timm_models |     mobilenetv3_large_100      |        157.1416        | 163.0647 |
| timm_models |            mixnet_l            |        158.9558        | 160.9964 |
| timm_models |       gluon_inception_v3       |        154.5762        | 160.4095 |
| timm_models |          tf_mixnet_l           |        160.0615        | 160.0163 |
| timm_models |          inception_v3          |        161.5416        | 158.9818 |
| timm_models |           tinynet_a            |        162.5123        | 156.7662 |
| timm_models |        adv_inception_v3        |        156.1085        | 156.1849 |
| timm_models |       tf_efficientnet_b0       |        149.2906        | 154.9265 |
| timm_models |       res2net101_26w_4s        |        153.3665        | 154.3769 |
| timm_models |        twins_pcpvt_base        |        148.3567        | 149.0719 |
| timm_models |           fbnetc_100           |        136.4138        | 139.8576 |
| timm_models |          spnasnet_100          |        133.2336        | 135.6297 |
| timm_models |        mobilenetv2_100         |        125.0794        | 135.5622 |
| timm_models |      xcit_large_24_p8_224      |        132.8271        | 132.1645 |
| timm_models |        res2net50_14w_8s        |        122.7576        | 122.8699 |
| timm_models |          mnasnet_100           |        125.1403        | 120.651  |
+-------------+--------------------------------+------------------------+----------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+------------------------+----------+
|    suite    |                  name                   | inductor_no_cudagraphs | inductor |
+-------------+-----------------------------------------+------------------------+----------+
| torchbench  |              hf_GPT2_large              |         1.1284         |  0.8906  |
| torchbench  |            timm_efficientnet            |         0.9412         |  0.8706  |
| torchbench  |                 yolov3                  |         1.0372         |   0.87   |
| torchbench  |           speech_transformer            |         0.869          |  0.8651  |
| torchbench  |              timm_resnest               |         0.9665         |  0.8624  |
| torchbench  |           shufflenet_v2_x1_0            |         0.9594         |  0.8618  |
| torchbench  |         timm_vision_transformer         |         0.8835         |  0.8593  |
| torchbench  |               timm_regnet               |         0.9502         |  0.8508  |
| torchbench  |                resnet152                |         0.9412         |  0.8499  |
| torchbench  |           Background_Matting            |         1.0403         |  0.8484  |
| torchbench  |              hf_DistilBert              |         0.9479         |  0.8476  |
| torchbench  |               hf_T5_large               |         1.168          |  0.8201  |
| torchbench  |              pytorch_unet               |         0.9308         |  0.8134  |
| torchbench  |            phlippe_densenet             |         0.8659         |  0.8058  |
| torchbench  |           mobilenet_v3_large            |         0.8709         |  0.7843  |
| torchbench  |                  dcgan                  |         0.9645         |  0.7821  |
| torchbench  |                resnet50                 |         0.8866         |  0.7815  |
| torchbench  |                 demucs                  |         0.9662         |  0.7734  |
| torchbench  |              squeezenet1_1              |         0.9087         |  0.773   |
| torchbench  |             pytorch_stargan             |         0.8893         |  0.7715  |
| torchbench  |                 hf_Bart                 |         0.9285         |  0.7535  |
| torchbench  |               timm_vovnet               |         0.8869         |  0.7529  |
| torchbench  |             pytorch_struct              |         0.7358         |  0.7274  |
| torchbench  |                  vgg16                  |         0.9805         |  0.7227  |
| torchbench  |               mnasnet1_0                |         0.8067         |  0.7159  |
| torchbench  |                 alexnet                 |         0.9385         |  0.7088  |
| torchbench  |               densenet121               |         0.803          |  0.7085  |
| torchbench  |               hf_BigBird                |         1.1068         |  0.6971  |
| torchbench  |             resnext50_32x4d             |         0.7703         |  0.6667  |
| torchbench  |         nvidia_deeprecommender          |         0.8931         |  0.6585  |
| torchbench  |                   drq                   |         0.9573         |  0.6379  |
| torchbench  |            soft_actor_critic            |         0.9973         |  0.6066  |
| torchbench  |             LearningToPaint             |         0.7463         |  0.5925  |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |         0.6004         |  0.5904  |
| torchbench  |                resnet18                 |         0.6127         |  0.5423  |
| torchbench  |              lennard_jones              |         0.9997         |  0.5317  |
| torchbench  |               hf_Reformer               |         0.8022         |  0.4538  |
| torchbench  |              hf_Longformer              |         0.8947         |  0.417   |
| torchbench  |          functorch_dp_cifar10           |         0.4424         |  0.3991  |
| torchbench  |             phlippe_resnet              |         0.3395         |  0.3169  |
| huggingface |            PLBartForCausalLM            |         0.9249         |  0.8907  |
| huggingface |     PegasusForConditionalGeneration     |         1.0074         |  0.8901  |
| huggingface |           ElectraForCausalLM            |         0.8941         |  0.889   |
| huggingface |          DistilBertForMaskedLM          |         0.9624         |  0.8849  |
| huggingface | BlenderbotSmallForConditionalGeneration |         0.9803         |  0.8729  |
| huggingface |      MBartForConditionalGeneration      |         1.0307         |  0.8672  |
| huggingface |            TrOCRForCausalLM             |         0.9075         |  0.8619  |
| huggingface |            MBartForCausalLM             |         0.9507         |  0.8491  |
| huggingface |      BartForConditionalGeneration       |         1.0139         |  0.8456  |
| huggingface |         MegatronBertForCausalLM         |         1.0962         |  0.845   |
| huggingface |             BartForCausalLM             |         0.943          |  0.8301  |
| huggingface |       BlenderbotSmallForCausalLM        |         0.8318         |  0.8065  |
| huggingface |           PegasusForCausalLM            |         0.9252         |  0.7952  |
| huggingface |         Speech2Text2ForCausalLM         |         0.808          |  0.7566  |
| huggingface |          MobileBertForMaskedLM          |         1.016          |  0.7473  |
| huggingface |     M2M100ForConditionalGeneration      |         0.9535         |  0.7188  |
| huggingface |             XGLMForCausalLM             |         0.9287         |  0.6744  |
| huggingface |     MobileBertForQuestionAnswering      |         0.8392         |  0.6569  |
| huggingface |           DebertaForMaskedLM            |         0.9978         |  0.5501  |
| huggingface |          DebertaV2ForMaskedLM           |         0.9665         |  0.5197  |
| huggingface |      DebertaV2ForQuestionAnswering      |         0.9802         |  0.487   |
| huggingface |          AllenaiLongformerBase          |         0.8742         |  0.4688  |
| huggingface |       DebertaForQuestionAnswering       |         1.1527         |  0.4601  |
| timm_models |                hrnet_w18                |          0.99          |  0.8918  |
| timm_models |            sebotnet33ts_256             |         1.1115         |  0.891   |
| timm_models |              inception_v3               |         1.0171         |  0.8904  |
| timm_models |           gluon_inception_v3            |         1.0171         |  0.8904  |
| timm_models |            adv_inception_v3             |         1.0171         |  0.8904  |
| timm_models |                 dpn107                  |         0.9642         |  0.8833  |
| timm_models |            gluon_xception65             |         0.9705         |  0.8831  |
| timm_models |              ghostnet_100               |         0.977          |  0.8807  |
| timm_models |              spnasnet_100               |         0.9451         |  0.8786  |
| timm_models |          mobilenetv3_large_100          |         0.9361         |  0.877   |
| timm_models |             poolformer_m36              |         1.1871         |  0.8768  |
| timm_models |           eca_botnext26ts_256           |         1.0072         |  0.8738  |
| timm_models |          xcit_large_24_p8_224           |         0.9732         |  0.8721  |
| timm_models |            res2net50_14w_8s             |         0.9607         |  0.8712  |
| timm_models |            res2net101_26w_4s            |         0.9483         |  0.871   |
| timm_models |                mixnet_l                 |         0.9902         |  0.8687  |
| timm_models |               mnasnet_100               |         0.9403         |  0.8683  |
| timm_models |               res2next50                |         0.9547         |  0.866   |
| timm_models |              cait_m36_384               |         0.989          |  0.8632  |
| timm_models |               fbnetc_100                |         0.9535         |  0.8596  |
| timm_models |                pit_b_224                |         1.0242         |  0.8578  |
| timm_models |               selecsls42b               |         0.9664         |  0.8576  |
| timm_models |              convnext_base              |         1.0338         |  0.8505  |
| timm_models |                gernet_l                 |         0.9706         |  0.8499  |
| timm_models |         swsl_resnext101_32x16d          |         0.9786         |  0.8461  |
| timm_models |             coat_lite_mini              |         1.0202         |  0.8402  |
| timm_models |              botnet26t_256              |         0.9779         |  0.8239  |
| timm_models |                lcnet_050                |         0.884          |  0.805   |
| timm_models |                repvgg_a2                |         0.9611         |  0.7738  |
| timm_models |               regnety_002               |         0.8966         |  0.7602  |
| timm_models |             crossvit_9_240              |         0.9898         |  0.7526  |
| timm_models |      swin_base_patch4_window7_224       |         0.9045         |  0.7214  |
| timm_models |              jx_nest_base               |         0.9604         |  0.6693  |
+-------------+-----------------------------------------+------------------------+----------+

Metrics over time

Plot: bench_logs/geomean_over_time.png

Plot: bench_logs/comp_time_over_time.png

Plot: bench_logs/passrate_over_time.png

Plot: bench_logs/memory_over_time.png

Recent Regressions

For each relevant compiler, we compare the two most recent reports (that actually ran the compiler) to find models that were not flagged in the previous report but are now flagged as problematic (according to the 'Warnings' section).
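As a rough illustration of the comparison described above, here is a minimal sketch of how a "previously unflagged, now flagged" check could work. The thresholds (speedup below 0.95, compilation latency above 120 seconds, memory compression ratio below 0.9) are assumptions chosen to agree with the rows shown in this report, and the function names are hypothetical; this is not the dashboard's actual script.

```python
from typing import Callable, Dict, List, Tuple

# Assumed thresholds (hypothetical; picked to be consistent with the
# warning tables in this report, not confirmed by the source).
SPEEDUP_MIN = 0.95               # flag when speedup drops below this
COMPILE_LATENCY_MAX_SEC = 120.0  # flag when compilation takes longer than this
MEMORY_RATIO_MIN = 0.9           # flag when compression ratio drops below this


def speedup_flagged(value: float) -> bool:
    return value < SPEEDUP_MIN


def latency_flagged(value: float) -> bool:
    return value > COMPILE_LATENCY_MAX_SEC


def memory_flagged(value: float) -> bool:
    return value < MEMORY_RATIO_MIN


def find_regressions(
    prev: Dict[str, float],
    cur: Dict[str, float],
    is_flagged: Callable[[float], bool],
) -> List[Tuple[str, float, float]]:
    """Return (name, prev_status, cur_status) for models that were not
    flagged in the previous report but are flagged in the current one."""
    regressions = []
    for name, cur_value in cur.items():
        if name not in prev:
            continue  # no previous data point to compare against
        prev_value = prev[name]
        if not is_flagged(prev_value) and is_flagged(cur_value):
            regressions.append((name, prev_value, cur_value))
    return regressions


# Values taken from the regression tables in this report.
speedup_regressions = find_regressions(
    {"DebertaForMaskedLM": 0.9725},
    {"DebertaForMaskedLM": 0.9351},
    speedup_flagged,
)
latency_regressions = find_regressions(
    {"mnasnet_100": 119.7307},
    {"mnasnet_100": 125.1403},
    latency_flagged,
)
print(speedup_regressions)
print(latency_regressions)
```

A model that was already flagged in the previous report is deliberately not reported again, which matches the "previously unflagged" wording above.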

Regressions for torchbench

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

No regressions found.

Regressions for huggingface

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Performance speedup regressions

+----------+--------------------+-------------+------------+
| compiler |        name        | prev_status | cur_status |
+----------+--------------------+-------------+------------+
| inductor | DebertaForMaskedLM |   0.9725    |   0.9351   |
+----------+--------------------+-------------+------------+

Regressions for timm_models

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Compilation latency (sec) regressions

+------------------------+-------------+-------------+------------+
|        compiler        |    name     | prev_status | cur_status |
+------------------------+-------------+-------------+------------+
| inductor_no_cudagraphs | mnasnet_100 |  119.7307   |  125.1403  |
+------------------------+-------------+-------------+------------+

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9665 |  0.9137   |  3.5785  |         1.3729         |
|           BERT_pytorch            |  16  | 0.9909 |  0.8217   |  3.0437  |         2.0824         |
|            densenet121            |  4   | 0.985  |  0.7153   |  2.7963  |         1.0476         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9872 |  0.9358   |  2.5856  |         1.7663         |
|            hf_BigBird             |  2   | 0.9615 |  0.7779   |  2.4758  |         1.6012         |
|             hf_Albert             |  8   | 0.9958 |  0.9606   |  2.3262  |         2.2848         |
|            hf_T5_large            |  2   | 0.9749 |  0.8014   |  2.2138  |         1.8546         |
|              hf_Bart              |  4   | 0.9711 |  0.7709   |  2.2049  |         1.4213         |
|        mobilenet_v3_large         |  32  | 0.9963 |  0.7862   |  2.0697  |         1.1878         |
|         phlippe_densenet          | 128  | 0.9866 |   0.771   |  2.0516  |         1.017          |
|           squeezenet1_1           |  32  | 0.981  |  0.9357   |  1.988   |         1.2971         |
|               dlrm                | 1024 | 0.9327 |  0.8443   |  1.9481  |         1.1668         |
|               hf_T5               |  8   | 0.985  |  0.8489   |  1.8968  |         1.9469         |
|          phlippe_resnet           | 128  | 0.9832 |  0.7666   |  1.8458  |         1.0074         |
|              hf_Bert              |  4   | 0.998  |  0.8458   |  1.7947  |         1.5818         |
|              hf_GPT2              |  4   | 0.9944 |  0.9567   |  1.7533  |         1.7644         |
|          resnext50_32x4d          |  8   | 0.9825 |  0.7208   |  1.7188  |         0.9836         |
|            mnasnet1_0             |  32  | 0.9871 |   0.736   |  1.7075  |         1.0818         |
|           hf_GPT2_large           |  4   | 0.9826 |  0.9717   |  1.6705  |         1.7318         |
|        shufflenet_v2_x1_0         | 128  | 0.9943 |   0.752   |  1.6261  |         1.1988         |
|             resnet18              |  16  | 0.9919 |  0.7618   |  1.6092  |         0.9557         |
|        speech_transformer         |  32  | 0.9765 |  0.8263   |  1.608   |         1.6025         |
|           hf_Bert_large           |  4   | 0.9949 |  0.8725   |  1.5913  |         1.5476         |
|           timm_resnest            |  32  | 0.9916 |  0.8482   |  1.5626  |         1.5228         |
|                drq                |  1   | 0.9636 |   0.713   |  1.5508  |         1.0326         |
|           fastNLP_Bert            |  6   | 0.9927 |  0.8407   |  1.5479  |         1.4995         |
|      timm_vision_transformer      |  32  | 0.9804 |  0.8922   |  1.5392  |         1.375          |
|            timm_nfnet             | 128  | 0.9863 |   0.985   |  1.5333  |         1.4722         |
|           mobilenet_v2            |  96  | 0.9967 |  0.7759   |  1.518   |         1.5081         |
| attention_is_all_you_need_pytorch | 256  | 0.9902 |  0.9093   |  1.4725  |         1.4404         |
|               dcgan               |  32  | 0.8807 |  0.6903   |  1.4618  |         0.8364         |
|          pytorch_struct           | 200  | 0.9293 |  0.7702   |  1.4603  |         1.0886         |
|         timm_efficientnet         |  32  | 0.9381 |  0.6259   |  1.4422  |         1.0827         |
|           hf_Longformer           |  2   | 0.8271 |  0.5679   |  1.4374  |         1.2597         |
|           hf_DistilBert           |  8   | 0.9817 |  0.9593   |  1.4208  |         1.4567         |
|           lennard_jones           | 1000 | 0.8585 |  0.7216   |  1.4021  |         0.8847         |
|           pytorch_unet            |  1   | 0.9964 |  0.2039   |  1.3555  |         1.3502         |
|          LearningToPaint          |  96  | 0.9876 |  0.7755   |  1.3044  |         1.0649         |
|          pytorch_stargan          |  16  | 0.9939 |  0.8028   |  1.2819  |         1.2478         |
|               vgg16               |  64  | 0.9992 |  0.9986   |  1.2402  |         1.2526         |
|            Super_SloMo            |  6   | 0.996  |  0.1781   |  1.2326  |         1.2324         |
|        Background_Matting         |  4   | 0.999  |   0.136   |  1.2113  |         1.2084         |
|             resnet152             |  32  | 0.9941 |  0.7588   |  1.2112  |         1.0108         |
|         soft_actor_critic         | 256  | 0.8515 |  0.6168   |  1.2102  |         0.6943         |
|              yolov3               |  16  | 0.9962 |  0.8049   |  1.195   |         1.1968         |
|             resnet50              |  32  | 0.9935 |  0.7754   |  1.1826  |         1.0623         |
|            hf_Reformer            |  4   | 0.9866 |  0.9661   |  1.1423  |         1.0684         |
|              alexnet              | 128  | 0.9985 |  0.9976   |  1.0884  |         1.1351         |
|              demucs               |  4   | 0.9989 |  1.0003   |  1.035   |          1.04          |
|            timm_regnet            |  32  | 0.9189 |   0.769   |  0.9984  |         0.9679         |
|            tts_angular            |  64  | 0.9253 |   0.908   |  0.9654  |         0.9666         |
|            timm_vovnet            |  32  | 0.853  |  0.7118   |  0.9433  |         0.9327         |
|      nvidia_deeprecommender       | 256  | 0.9991 |  0.9978   |  0.8732  |         1.0198         |
|   timm_vision_transformer_large   |  32  | 0.998  |    0.0    |   0.0    |         1.0804         |
|               moco                |  32  | 0.9791 |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 27.3531 |  56.4834  | 172.7065 |        170.6535        |
|         phlippe_densenet          | 128  | 3.2808  |  7.1356   | 163.1861 |        166.4526        |
|           hf_Longformer           |  2   | 11.6432 |  31.1766  | 151.4083 |        117.3756        |
|         timm_efficientnet         |  32  | 4.9633  |  10.1704  | 144.5115 |        146.5963        |
|            hf_BigBird             |  2   | 12.9344 |  37.5874  | 143.1175 |        123.6819        |
|            densenet121            |  4   | 7.7502  |  18.2364  | 133.3355 |        135.3702        |
|           mobilenet_v2            |  96  | 3.1611  |  7.1051   | 130.3389 |        125.9691        |
|        mobilenet_v3_large         |  32  | 3.4718  |  7.8067   | 129.3707 |        133.6329        |
|              yolov3               |  16  | 4.9954  |  10.7713  | 118.7303 |        119.7147        |
|            mnasnet1_0             |  32  | 3.1957  |  6.9167   | 107.7969 |        105.8011        |
|           hf_GPT2_large           |  4   | 15.0441 |  30.3311  | 105.8745 |        104.5749        |
|             resnet152             |  32  | 9.2109  |  20.3698  | 105.7132 |        104.9064        |
|           timm_resnest            |  32  | 1.8328  |  3.9031   | 100.6488 |        100.5622        |
|        shufflenet_v2_x1_0         | 128  | 3.5392  |  7.7711   |  79.839  |        81.3835         |
|        speech_transformer         |  32  | 6.2135  |  13.9099  | 76.0756  |        77.8851         |
| attention_is_all_you_need_pytorch | 256  | 4.5414  |  10.848   | 74.2674  |        73.5216         |
|            timm_nfnet             | 128  | 5.8821  |  11.1444  | 73.3395  |        72.7614         |
|            timm_regnet            |  32  | 6.7548  |  12.4855  | 73.3313  |        73.8514         |
|           BERT_pytorch            |  16  | 4.9944  |  11.7386  | 68.1768  |        67.7643         |
|        Background_Matting         |  4   | 3.0519  |  11.385   |  67.532  |        66.4917         |
|             resnet50              |  32  | 3.2619  |  7.1101   | 64.9864  |        64.4818         |
|           hf_Bert_large           |  4   | 10.3026 |  21.3935  | 63.0919  |        62.8211         |
|            timm_vovnet            |  32  | 3.6736  |  6.2949   |  62.398  |        62.9031         |
|              hf_Bart              |  4   | 10.6157 |  18.3347  | 61.7989  |         59.602         |
|           pytorch_unet            |  1   | 1.5508  |  4.4563   | 60.0716  |        58.4345         |
|       functorch_dp_cifar10        |  64  | 1.2194  |  2.4446   | 56.2904  |         56.472         |
|          resnext50_32x4d          |  8   | 3.2675  |  6.9852   | 52.7945  |        51.9382         |
|      timm_vision_transformer      |  32  | 3.3709  |  7.2321   |  51.005  |        50.5161         |
|               hf_T5               |  8   | 5.6941  |  12.7604  | 49.1196  |        48.5289         |
|           fastNLP_Bert            |  6   | 5.2302  |  11.2996  | 49.0024  |        48.1915         |
|          pytorch_stargan          |  16  |  1.223  |  3.2833   | 45.5421  |        45.1309         |
|          LearningToPaint          |  96  | 1.4029  |  2.9423   | 45.1795  |         43.714         |
|            hf_Reformer            |  4   | 4.2034  |  6.0534   | 44.3884  |        39.9608         |
|             resnet18              |  16  |  1.37   |  2.8956   | 43.1682  |        44.0766         |
|            Super_SloMo            |  6   |  2.76   |  9.8371   | 42.8724  |        41.2629         |
|             hf_Albert             |  8   |  2.527  |  8.2234   | 41.3268  |        38.8487         |
|              hf_GPT2              |  4   | 4.8067  |  9.7721   | 40.7122  |        40.8359         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2628  |  2.9725   | 37.2472  |        34.9036         |
|              hf_Bert              |  4   | 5.1135  |  10.5508  | 36.9524  |        37.7295         |
|          phlippe_resnet           | 128  |  1.365  |  2.8569   | 33.0322  |        32.5137         |
|           hf_DistilBert           |  8   | 2.3856  |  5.3059   | 30.9019  |        30.1744         |
|              demucs               |  4   | 1.4126  |   2.18    | 29.9179  |        29.9346         |
|           squeezenet1_1           |  32  | 1.0555  |  1.7669   | 23.3345  |        23.4247         |
|          pytorch_struct           | 200  | 0.7567  |  1.3311   | 19.2973  |        18.8854         |
|               vgg16               |  64  | 0.6317  |  1.1142   |  16.363  |        15.3628         |
|              alexnet              | 128  | 0.4963  |  0.7788   | 14.8842  |        14.7146         |
|                drq                |  1   | 0.6646  |  1.0216   |  9.7992  |         8.8698         |
|      nvidia_deeprecommender       | 256  | 0.4862  |  0.7648   |  8.7227  |         9.2114         |
|               dcgan               |  32  | 0.4304  |  0.7048   |  7.9288  |         7.7121         |
|               dlrm                | 1024 | 0.3731  |  0.7884   |  7.6309  |         7.2932         |
|         soft_actor_critic         | 256  | 0.4281  |  0.6098   |  7.2867  |         7.0127         |
|            tts_angular            |  64  | 0.4519  |  0.5085   |  5.9077  |         5.669          |
|           lennard_jones           | 1000 | 0.3984  |  0.6053   |  5.3242  |         6.2023         |
|   timm_vision_transformer_large   |  32  | 9.5134  |    nan    |   nan    |        123.5278        |
|               moco                |  32  | 28.1659 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.208   |         1.208          |
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0863  |         1.2037         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.0496  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.0163  |         1.1719         |
|           mobilenet_v2            |  96  | 0.9858 |  0.7651   |  1.0104  |         1.103          |
|            tts_angular            |  64  | 0.9957 |  0.9957   |  0.9852  |         0.9957         |
|            timm_nfnet             | 128  | 0.9071 |  0.8749   |  0.9691  |         1.0708         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9689  |         1.1266         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |  0.952   |         1.0009         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         1.1717         |
|              hf_Bert              |  4   | 0.963  |  0.8353   |  0.9425  |         1.0258         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.9402  |         1.0725         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9319  |         1.0718         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  0.8906  |         1.1284         |
|         timm_efficientnet         |  32  | 0.9861 |   0.767   |  0.8706  |         0.9412         |
|              yolov3               |  16  | 0.9879 |  0.8286   |   0.87   |         1.0372         |
|        speech_transformer         |  32  | 0.9915 |   0.901   |  0.8651  |         0.869          |
|           timm_resnest            |  32  | 0.9887 |  0.8833   |  0.8624  |         0.9665         |
|        shufflenet_v2_x1_0         | 128  | 0.9549 |  0.8383   |  0.8618  |         0.9594         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8593  |         0.8835         |
|            timm_regnet            |  32  | 0.9949 |  0.8508   |  0.8508  |         0.9502         |
|             resnet152             |  32  | 0.9948 |  0.8934   |  0.8499  |         0.9412         |
|        Background_Matting         |  4   | 1.0132 |  0.6487   |  0.8484  |         1.0403         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8476  |         0.9479         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         1.168          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.8134  |         0.9308         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8058  |         0.8659         |
|        mobilenet_v3_large         |  32  | 0.977  |  0.8745   |  0.7843  |         0.8709         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.7821  |         0.9645         |
|             resnet50              |  32  | 0.9933 |  0.8634   |  0.7815  |         0.8866         |
|              demucs               |  4   | 0.9663 |  0.9659   |  0.7734  |         0.9662         |
|           squeezenet1_1           |  32  | 0.966  |  0.9291   |  0.773   |         0.9087         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.8893         |
|              hf_Bart              |  4   | 0.9084 |   0.843   |  0.7535  |         0.9285         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7529  |         0.8869         |
|          pytorch_struct           | 200  | 0.9992 |  0.5168   |  0.7274  |         0.7358         |
|               vgg16               |  64  | 0.9919 |  0.7243   |  0.7227  |         0.9805         |
|            mnasnet1_0             |  32  | 0.9754 |  0.8972   |  0.7159  |         0.8067         |
|              alexnet              | 128  | 0.9455 |   0.793   |  0.7088  |         0.9385         |
|            densenet121            |  4   | 0.9963 |  0.9808   |  0.7085  |         0.803          |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |  0.6971  |         1.1068         |
|          resnext50_32x4d          |  8   | 0.9955 |  0.8457   |  0.6667  |         0.7703         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.8931         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.6379  |         0.9573         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  0.6066  |         0.9973         |
|          LearningToPaint          |  96  | 0.9192 |  0.7116   |  0.5925  |         0.7463         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9966 |  0.8796   |  0.5904  |         0.6004         |
|             resnet18              |  16  | 0.9796 |  0.7996   |  0.5423  |         0.6127         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.5317  |         0.9997         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.4538  |         0.8022         |
|           hf_Longformer           |  2   | 0.8565 |  0.8296   |  0.417   |         0.8947         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3169  |         0.3395         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|               moco                |  32  | 0.9912 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_GPT2_large           |  4   | 213.2799 | 215.4528  | 125.4616 |        121.0736        |
|        Background_Matting         |  4   | 126.0392 | 924.1047  | 104.0838 |        104.2233        |
|            hf_T5_large            |  2   | 235.0781 | 276.5868  | 101.9435 |        118.6605        |
|               hf_T5               |  8   | 181.7601 | 211.0222  | 94.5935  |         92.235         |
|           hf_Longformer           |  2   | 137.1601 | 196.8453  | 78.9497  |        90.7763         |
|            hf_BigBird             |  2   | 230.0934 | 252.0461  | 77.6608  |        119.3207        |
|            timm_nfnet             | 128  | 120.1376 | 120.0791  | 77.1482  |         80.292         |
|            hf_Reformer            |  4   | 82.1612  |  84.0493  | 70.9625  |        75.8331         |
|            Super_SloMo            |  6   | 79.6878  | 445.9522  | 64.4741  |        64.4746         |
|              yolov3               |  16  | 68.9709  |  85.1907  | 57.6778  |        57.4296         |
|            timm_regnet            |  32  | 61.2464  |  73.0351  | 56.1798  |        57.8499         |
|               vgg16               |  64  | 66.2681  |  66.2876  |  53.467  |        52.8595         |
|             resnet152             |  32  | 65.1642  |  84.5551  |  53.278  |        63.6134         |
|           hf_Bert_large           |  4   | 83.1358  |  94.5777  | 52.3903  |        52.9751         |
|              demucs               |  4   | 53.8915  |  53.6843  | 51.7846  |        51.7837         |
|        speech_transformer         |  32  | 61.6346  |  76.9771  | 37.2588  |         35.346         |
| attention_is_all_you_need_pytorch | 256  | 55.4506  |  60.1011  | 37.1352  |        37.6234         |
|              hf_Bart              |  4   | 64.4715  |  77.0755  | 35.2344  |        41.1938         |
|           fastNLP_Bert            |  6   | 54.2604  |  63.5286  | 33.7778  |        34.7805         |
|           mobilenet_v2            |  96  | 47.2184  |  60.6003  | 31.0631  |        31.2961         |
|             hf_Albert             |  8   | 68.7915  |  72.2711  | 29.8286  |        30.3821         |
|           pytorch_unet            |  1   | 40.0582  | 195.5399  |  29.419  |        29.5573         |
|              hf_GPT2              |  4   | 49.2943  |  51.2867  | 27.7173  |        27.5717         |
|            timm_vovnet            |  32  | 28.9653  |  34.3962  | 26.3074  |        26.6661         |
|              hf_Bert              |  4   | 40.6898  |  47.4511  | 22.8776  |        25.3164         |
|         timm_efficientnet         |  32  |  34.368  |  51.266   | 22.3787  |        29.7885         |
|             resnet50              |  32  | 26.7275  |  34.1333  | 22.1767  |        24.9711         |
|           hf_DistilBert           |  8   | 32.1373  |  32.7278  | 22.0684  |        22.0909         |
|            densenet121            |  4   | 55.3729  |  76.1363  | 19.4911  |        52.2178         |
|        shufflenet_v2_x1_0         | 128  | 31.1021  |  40.9181  | 18.9225  |        25.7413         |
|      timm_vision_transformer      |  32  | 29.7434  |  34.3841  |  18.345  |        20.5672         |
|           BERT_pytorch            |  16  | 54.8315  |  70.4611  | 17.7468  |        27.2948         |
|           timm_resnest            |  32  | 24.3422  |  28.3699  | 15.4198  |        15.8957         |
|        mobilenet_v3_large         |  32  |  27.616  |  34.625   | 14.0364  |         22.275         |
|            mnasnet1_0             |  32  | 23.0055  |  31.5681  | 12.8575  |        20.6158         |
|          resnext50_32x4d          |  8   | 21.0987  |  27.9761  | 11.8511  |        20.9633         |
|      nvidia_deeprecommender       | 256  | 10.2459  |  10.2512  | 11.7241  |        10.0433         |
|          pytorch_stargan          |  16  | 15.0938  |  18.5558  | 11.6883  |        11.8321         |
|         phlippe_densenet          | 128  | 23.8242  |  30.1664  | 11.4253  |        23.2198         |
|              alexnet              | 128  |  9.863   |  9.8577   |  9.0287  |         8.6426         |
|          LearningToPaint          |  96  | 11.5376  |  14.7613  |  8.628   |        10.6107         |
|            tts_angular            |  64  |  6.6885  |  6.7505   |  6.477   |         6.412          |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 17.5548  |  18.4204  |  5.8836  |         7.8664         |
|             resnet18              |  16  |  9.4991  |  12.2191  |  5.8244  |          9.72          |
|           squeezenet1_1           |  32  | 11.0939  |  11.578   |  5.5022  |         7.9766         |
|          phlippe_resnet           | 128  |  9.1802  |  11.6903  |  4.9331  |         9.0081         |
|          pytorch_struct           | 200  |  5.0656  |  5.9888   |  3.2137  |         4.266          |
|       functorch_dp_cifar10        |  64  | 10.6356  |  11.0382  |  2.8873  |         7.3431         |
|                drq                |  1   |  3.3877  |  4.8983   |  2.1644  |         3.1132         |
|               dlrm                | 1024 |  4.4599  |  4.9461   |  2.1429  |         3.573          |
|               dcgan               |  32  |  2.4163  |  3.0195   |  1.4604  |         2.5051         |
|         soft_actor_critic         | 256  |  1.7224  |  2.7496   |  1.3513  |         2.7895         |
|           lennard_jones           | 1000 |  1.8374  |  2.1432   |  1.1201  |         1.7142         |
|   timm_vision_transformer_large   |  32  | 464.7903 |    nan    |   nan    |        429.8698        |
|               moco                |  32  | 52.4256  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|             OPTForCausalLM              |  2  | 0.9919 |  0.9038   |  2.4257  |         2.4329         |
|          MobileBertForMaskedLM          | 64  | 0.9489 |   0.812   |  2.3068  |         1.0738         |
|      GPT2ForSequenceClassification      |  4  | 0.9771 |  0.9476   |  2.2556  |         2.2821         |
|       MT5ForConditionalGeneration       | 16  | 0.9896 |  0.8404   |  2.1177  |         1.9109         |
|       ElectraForQuestionAnswering       | 64  | 0.9876 |  0.9767   |  2.1166  |         2.1125         |
|     MobileBertForQuestionAnswering      | 128 | 0.9545 |  0.8037   |  2.0604  |         1.1294         |
|             XGLMForCausalLM             |  8  | 0.9515 |  0.7334   |  2.0059  |         1.1724         |
|            XLNetLMHeadModel             |  8  | 0.9925 |   0.967   |  1.8126  |         1.8178         |
|    LayoutLMForSequenceClassification    | 16  | 0.9847 |  0.9707   |  1.799   |         1.7866         |
|       RobertaForQuestionAnswering       | 16  | 0.9845 |  0.9698   |  1.7846  |         1.7665         |
|           ElectraForCausalLM            | 32  | 0.982  |  0.9338   |  1.7835  |         1.8168         |
|        BertForQuestionAnswering         | 16  | 0.9851 |  0.9701   |  1.7746  |         1.7605         |
|           RobertaForCausalLM            | 16  | 0.9869 |  0.9621   |  1.6787  |         1.6656         |
|               DistillGPT2               | 16  | 0.9881 |  0.9555   |  1.6568  |         1.7013         |
|       AlbertForQuestionAnswering        |  4  | 0.9999 |  0.8849   |  1.6552  |         1.652          |
|            AlbertForMaskedLM            |  4  | 0.9996 |  0.8842   |  1.642   |         1.6435         |
|                 T5Small                 |  4  | 0.9812 |  0.8486   |  1.6296  |         1.7339         |
|       T5ForConditionalGeneration        |  4  | 0.9818 |  0.8521   |  1.6279  |         1.7399         |
|            PLBartForCausalLM            |  8  | 0.986  |  0.9616   |  1.6159  |         1.6299         |
|    MegatronBertForQuestionAnswering     |  8  | 0.981  |  0.9611   |  1.603   |         1.6257         |
|             BertForMaskedLM             | 16  | 0.9867 |  0.9606   |  1.595   |         1.5947         |
|          AllenaiLongformerBase          |  4  | 0.8851 |  0.6314   |  1.5928  |         1.4968         |
|     PLBartForConditionalGeneration      |  4  | 0.9891 |  0.9397   |  1.5913  |         1.6256         |
|           LayoutLMForMaskedLM           | 16  | 0.9865 |  0.9615   |  1.5842  |         1.5832         |
|                CamemBert                | 16  | 0.987  |  0.9639   |  1.546   |         1.5333         |
|     M2M100ForConditionalGeneration      | 16  | 1.0259 |  0.8137   |  1.5384  |         1.4854         |
|            MBartForCausalLM             |  4  | 0.9843 |  0.9637   |  1.4908  |         1.5331         |
|             BartForCausalLM             |  4  | 0.9885 |  0.9605   |  1.4897  |         1.5374         |
|            YituTechConvBert             | 16  | 0.9862 |   0.951   |  1.4883  |         1.4902         |
|         MegatronBertForCausalLM         |  4  | 0.985  |  0.9205   |  1.4742  |         1.5052         |
|     DistilBertForQuestionAnswering      | 256 | 0.9939 |  0.9849   |  1.4539  |         1.4476         |
|         Speech2Text2ForCausalLM         | 256 | 0.9839 |  0.9287   |  1.4537  |         1.5139         |
|      BartForConditionalGeneration       |  2  | 0.9956 |   0.874   |  1.4526  |         1.4788         |
|      MBartForConditionalGeneration      |  2  | 0.9999 |  0.9498   |  1.4443  |         1.4428         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9992 |  0.9162   |  1.365   |         1.4286         |
|     PegasusForConditionalGeneration     | 32  | 0.9957 |  0.9279   |  1.3078  |         1.2987         |
|            TrOCRForCausalLM             | 32  | 0.9881 |  0.9616   |  1.242   |         1.2855         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9731 |  0.8904   |  1.2152  |         1.2019         |
|          DistilBertForMaskedLM          | 128 | 0.9928 |  0.9515   |  1.2102  |         1.2396         |
|           PegasusForCausalLM            | 32  | 0.9578 |  0.8918   |  1.1606  |         1.1452         |
|       DebertaForQuestionAnswering       |  8  | 0.8077 |  0.7057   |  1.0112  |         0.9108         |
|           DebertaForMaskedLM            |  4  | 0.7179 |  0.5794   |  0.9351  |         0.7893         |
|          DebertaV2ForMaskedLM           |  1  |  0.69  |  0.5229   |  0.8425  |         0.624          |
|      DebertaV2ForQuestionAnswering      |  2  | 0.6931 |  0.5253   |  0.795   |         0.633          |
|          BlenderbotForCausalLM          |  4  | 0.9467 |  0.7413   |   0.0    |         1.0827         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
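For readers who want to summarize a speedup table like the one above, here is a minimal sketch that parses the ASCII layout and computes a geometric mean for one backend column. The helper names `parse_table` and `geomean` are hypothetical and not part of the benchmark scripts. Treating `nan` and `0.0` entries as failed runs and excluding them is an assumption; this comment does not show how the dashboard itself aggregates results.

```python
import math


def parse_table(text):
    """Parse a '+---+' / '| a | b |' style ASCII table into a list of dicts."""
    rows = []
    header = None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines such as +-----+-----+
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geomean(values):
    """Geometric mean over usable entries.

    Assumption: nan and non-positive values (e.g. 0.0) mark failed runs
    and are dropped rather than counted.
    """
    vals = [v for v in values if not math.isnan(v) and v > 0.0]
    if not vals:
        return float("nan")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


# Three rows copied from the table above, re-padded to a narrower width.
SAMPLE = """
+-----------------------+----+--------+-----------+----------+------------------------+
|         name          | bs | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------+----+--------+-----------+----------+------------------------+
|    OPTForCausalLM     | 2  | 0.9919 |  0.9038   |  2.4257  |         2.4329         |
| DebertaV2ForMaskedLM  | 1  |  0.69  |  0.5229   |  0.8425  |         0.624          |
| BlenderbotForCausalLM | 4  | 0.9467 |  0.7413   |   0.0    |         1.0827         |
+-----------------------+----+--------+-----------+----------+------------------------+
"""

rows = parse_table(SAMPLE)
inductor = [float(r["inductor"]) for r in rows]
print("inductor geomean over usable rows:", geomean(inductor))
```

A geometric mean is used here because the entries are ratios; an arithmetic mean would overweight the largest speedups.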

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          AllenaiLongformerBase          |  4  | 11.5304 |  31.6727  | 152.1659 |        117.2945        |
|          MobileBertForMaskedLM          | 64  | 17.3448 |  41.0896  | 143.8201 |        143.3702        |
|     MobileBertForQuestionAnswering      | 128 | 17.4873 |  42.7787  | 136.9777 |        142.0905        |
|          DebertaV2ForMaskedLM           |  1  | 15.4625 |  26.8594  | 134.7087 |        70.1598         |
|      DebertaV2ForQuestionAnswering      |  2  | 15.3586 |  26.8634  | 133.5676 |        67.4951         |
|       MT5ForConditionalGeneration       | 16  | 8.2233  |  19.2449  | 132.4141 |        129.5675        |
|     M2M100ForConditionalGeneration      | 16  | 12.1223 |  25.8214  | 111.9143 |        106.3697        |
|            XLNetLMHeadModel             |  8  | 10.6212 |  27.6024  | 91.6197  |        91.7331         |
|       DebertaForQuestionAnswering       |  8  | 7.1753  |  13.3886  | 81.9674  |        51.9217         |
|           DebertaForMaskedLM            |  4  | 7.2817  |  13.5838  | 80.3868  |        49.8476         |
|             XGLMForCausalLM             |  8  | 9.4858  |  21.2615  | 78.9918  |        71.3529         |
|      MBartForConditionalGeneration      |  2  | 11.7048 |  26.514   | 78.7427  |        77.6243         |
|            YituTechConvBert             | 16  | 10.9363 |  19.8988  | 76.5504  |        74.9041         |
|     PegasusForConditionalGeneration     | 32  | 5.4976  |  19.6836  | 75.9532  |        73.6406         |
|      BartForConditionalGeneration       |  2  | 11.6408 |  27.8216  | 74.7139  |        73.2231         |
|    MegatronBertForQuestionAnswering     |  8  | 10.6556 |  21.7253  | 65.3827  |        64.7129         |
|           ElectraForCausalLM            | 32  | 7.5963  |  14.3609  | 65.1283  |        63.9858         |
|         MegatronBertForCausalLM         |  4  | 10.5673 |  22.1073  | 64.9943  |        64.8842         |
|     PLBartForConditionalGeneration      |  4  | 9.2688  |  17.194   | 62.5627  |        57.0971         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.7241  |  17.2097  | 54.2763  |        53.9719         |
|       T5ForConditionalGeneration        |  4  | 5.6604  |  12.8119  | 50.0774  |        49.6283         |
|                 T5Small                 |  4  | 5.6812  |  12.811   | 50.0422  |        49.7448         |
|            MBartForCausalLM             |  4  | 6.6503  |  12.0885  | 47.9587  |         42.819         |
|           PegasusForCausalLM            | 32  | 6.0173  |  11.5258  | 47.8386  |        41.9892         |
|             BartForCausalLM             |  4  | 6.1857  |  11.9728  |  47.402  |        44.3576         |
|            TrOCRForCausalLM             | 32  | 6.5243  |  11.7395  | 46.2294  |        42.6583         |
|    LayoutLMForSequenceClassification    | 16  |  5.567  |  11.832   | 45.7574  |        45.3475         |
|       ElectraForQuestionAnswering       | 64  | 5.1754  |  11.5188  | 43.8875  |        43.5869         |
|             OPTForCausalLM              |  2  | 5.4397  |  11.0731  | 43.7014  |        42.9474         |
|           LayoutLMForMaskedLM           | 16  | 5.5563  |  12.0401  | 39.7174  |        38.4213         |
|             BertForMaskedLM             | 16  | 5.1971  |  11.4754  | 38.9339  |        39.6988         |
|        BertForQuestionAnswering         | 16  | 5.1155  |  11.387   | 38.3464  |        38.5832         |
|       BlenderbotSmallForCausalLM        | 64  |  4.535  |  8.2957   | 38.3455  |        35.6499         |
|                CamemBert                | 16  | 5.2781  |  11.5436  | 36.8799  |        38.1615         |
|            AlbertForMaskedLM            |  4  | 2.2609  |   8.209   | 36.6028  |        37.2128         |
|           RobertaForCausalLM            | 16  | 5.4568  |  10.9399  | 36.2812  |        36.0738         |
|     DistilBertForQuestionAnswering      | 256 | 2.4996  |  5.3822   |  35.876  |        34.2055         |
|       RobertaForQuestionAnswering       | 16  | 5.2564  |  10.7417  | 35.0962  |        34.6868         |
|      GPT2ForSequenceClassification      |  4  | 4.8874  |  9.9426   | 35.0497  |        33.5413         |
|         Speech2Text2ForCausalLM         | 256 |  3.326  |  6.1028   | 34.9758  |        34.2104         |
|            PLBartForCausalLM            |  8  | 3.6653  |  6.8171   | 34.0903  |        34.1444         |
|       AlbertForQuestionAnswering        |  4  |  2.229  |  8.1183   |  33.87   |        33.2928         |
|          DistilBertForMaskedLM          | 128 | 2.5002  |  5.3684   | 33.8452  |         33.762         |
|               DistillGPT2               | 16  | 2.5429  |  5.1024   | 27.9167  |        27.7878         |
|          BlenderbotForCausalLM          |  4  | 11.645  |  22.7163  |   nan    |        70.4431         |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1387  |         1.195          |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1139  |         1.2307         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0865  |         1.1724         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0818  |         1.1729         |
|             OPTForCausalLM              |  2  | 0.9682 |  0.9246   |  1.062   |         1.1099         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0583  |         1.1368         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0541  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.0539  |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0511  |         1.0491         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.1479         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0078  |         1.0518         |
|            YituTechConvBert             | 16  | 0.953  |  0.8749   |  0.9793  |         0.9905         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9734  |         1.3147         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         1.0642         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9574  |         1.268          |
|     PLBartForConditionalGeneration      |  4  |  0.93  |  0.8787   |  0.9563  |         0.9847         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         1.1152         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9089  |         1.0019         |
|            PLBartForCausalLM            |  8  | 0.9237 |  0.8168   |  0.8907  |         0.9249         |
|     PegasusForConditionalGeneration     | 32  | 0.945  |  0.8957   |  0.8901  |         1.0074         |
|           ElectraForCausalLM            | 32  | 0.9161 |  0.7864   |  0.889   |         0.8941         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8849  |         0.9624         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8729  |         0.9803         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.8672  |         1.0307         |
|            TrOCRForCausalLM             | 32  | 0.918  |   0.829   |  0.8619  |         0.9075         |
|            MBartForCausalLM             |  4  | 0.951  |  0.8913   |  0.8491  |         0.9507         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         1.0139         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         1.0962         |
|             BartForCausalLM             |  4  | 0.9497 |  0.8911   |  0.8301  |         0.943          |
|       BlenderbotSmallForCausalLM        | 64  | 0.8906 |  0.7493   |  0.8065  |         0.8318         |
|           PegasusForCausalLM            | 32  | 0.9238 |  0.8405   |  0.7952  |         0.9252         |
|         Speech2Text2ForCausalLM         | 256 | 0.8865 |  0.7545   |  0.7566  |         0.808          |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7473  |         1.016          |
|     M2M100ForConditionalGeneration      | 16  | 0.955  |  0.8772   |  0.7188  |         0.9535         |
|             XGLMForCausalLM             |  8  | 0.9431 |  0.8612   |  0.6744  |         0.9287         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6569  |         0.8392         |
|           DebertaForMaskedLM            |  4  | 0.9326 |  0.9156   |  0.5501  |         0.9978         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.5197  |         0.9665         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9762 |  0.9764   |  0.487   |         0.9802         |
|          AllenaiLongformerBase          |  4  | 0.8568 |  0.7887   |  0.4688  |         0.8742         |
|       DebertaForQuestionAnswering       |  8  | 0.9524 |  1.0537   |  0.4601  |         1.1527         |
|          BlenderbotForCausalLM          |  4  | 0.9932 |  0.9937   |   nan    |         0.9941         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 266.8965 | 301.1351  | 162.5609 |        162.2618        |
|       AlbertForQuestionAnswering        |  4  | 264.8193 | 298.2441  | 159.8517 |        160.1511        |
|            XLNetLMHeadModel             |  8  | 283.332  | 289.2332  | 155.3396 |        153.671         |
|      DebertaV2ForQuestionAnswering      |  2  | 151.8438 | 202.7029  | 135.6367 |        168.9343        |
|          DebertaV2ForMaskedLM           |  1  | 149.7336 | 197.6626  | 124.0962 |        167.9118        |
|          AllenaiLongformerBase          |  4  | 205.4046 | 286.3274  | 114.7427 |        121.6917        |
|            TrOCRForCausalLM             | 32  | 139.7567 | 142.5723  | 111.2928 |        107.0841        |
|     PegasusForConditionalGeneration     | 32  | 156.7153 | 152.0392  | 110.9484 |        116.1681        |
|      MBartForConditionalGeneration      |  2  | 149.4115 | 158.3522  | 95.5949  |        97.5109         |
|      BartForConditionalGeneration       |  2  | 138.6057 | 169.9033  | 94.6617  |        92.9127         |
|    MegatronBertForQuestionAnswering     |  8  | 144.7884 | 147.7018  | 88.6022  |        87.7255         |
|            YituTechConvBert             | 16  | 127.2904 |  131.941  | 84.5467  |        84.2136         |
| BlenderbotSmallForConditionalGeneration | 64  | 113.4694 | 135.3916  | 81.3811  |         80.181         |
|     MobileBertForQuestionAnswering      | 128 | 192.8495 | 256.5055  | 81.1771  |        180.8669        |
|             BartForCausalLM             |  4  | 115.136  | 119.1687  | 76.7571  |        73.8457         |
|                CamemBert                | 16  | 120.0519 | 123.2119  | 76.7365  |        77.3359         |
|            MBartForCausalLM             |  4  | 116.4407 | 117.8078  | 76.3692  |        74.9165         |
|       DebertaForQuestionAnswering       |  8  | 93.7177  | 107.1264  | 75.3146  |        83.5204         |
|     M2M100ForConditionalGeneration      | 16  | 119.4945 | 187.9072  | 73.7413  |        106.1427        |
|     PLBartForConditionalGeneration      |  4  | 117.8397 | 128.1525  | 73.4842  |        73.3306         |
|          MobileBertForMaskedLM          | 64  | 184.408  | 218.2138  | 72.9376  |        164.4617        |
|            PLBartForCausalLM            |  8  | 114.2785 | 117.6547  | 71.6697  |        70.0866         |
|     DistilBertForQuestionAnswering      | 256 | 103.8544 | 104.9046  | 71.5547  |        71.4328         |
|           LayoutLMForMaskedLM           | 16  | 114.3279 | 117.3627  | 71.4236  |        71.2644         |
|          DistilBertForMaskedLM          | 128 | 85.2514  |  88.9747  | 70.0389  |        68.9435         |
|             OPTForCausalLM              |  2  | 170.8159 | 180.2574  | 69.8096  |        68.4145         |
|             BertForMaskedLM             | 16  | 111.6897 | 114.9743  | 68.9404  |        69.8696         |
|           DebertaForMaskedLM            |  4  | 83.3381  | 120.2089  | 68.8396  |        86.1229         |
|           RobertaForCausalLM            | 16  | 116.7835 | 119.6261  | 68.7105  |        69.1271         |
|                 T5Small                 |  4  | 108.5604 |  122.664  | 64.3375  |        60.1138         |
|       T5ForConditionalGeneration        |  4  | 108.4078 | 122.4491  | 64.2625  |        60.3414         |
|               DistillGPT2               | 16  | 107.6841 | 110.7889  | 63.9172  |        62.5213         |
|           PegasusForCausalLM            | 32  | 73.2998  |  77.514   | 59.6955  |        65.6868         |
|         MegatronBertForCausalLM         |  4  | 88.7993  |  95.1092  | 59.3799  |        58.3585         |
|             XGLMForCausalLM             |  8  | 103.5167 | 126.2106  | 55.0465  |         91.495         |
|    LayoutLMForSequenceClassification    | 16  | 99.4509  | 100.9328  | 54.3639  |        54.7388         |
|       ElectraForQuestionAnswering       | 64  | 116.1703 | 117.7955  | 54.2397  |        55.3264         |
|        BertForQuestionAnswering         | 16  | 96.8143  |  98.2209  | 53.7597  |        54.1701         |
|       RobertaForQuestionAnswering       | 16  | 97.1539  |  98.6824  |  53.615  |        54.1795         |
|           ElectraForCausalLM            | 32  | 89.8906  |  94.5791  |  49.451  |        48.5736         |
|       BlenderbotSmallForCausalLM        | 64  | 61.5262  |  65.0681  | 47.8161  |        48.1562         |
|       MT5ForConditionalGeneration       | 16  | 94.6045  | 122.2081  | 44.1624  |        54.2264         |
|      GPT2ForSequenceClassification      |  4  | 93.6707  |  96.7083  | 40.6407  |        40.1758         |
|         Speech2Text2ForCausalLM         | 256 |  53.229  |  56.1833  | 36.6259  |        35.0265         |
|          BlenderbotForCausalLM          |  4  | 115.4877 | 158.2403  |   nan    |        100.4666        |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
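
For readers who want to derive per-model ratios from the absolute latency table, here is a minimal sketch. It assumes the ratio of interest is the `eager` column divided by a backend column; that is an assumption about how to read this table, not the dashboard's own definition of speedup. The helpers `parse_table` and `latency_ratio` are hypothetical names introduced for illustration and are not part of the benchmark scripts.

```python
def parse_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip '+---+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def latency_ratio(row, baseline="eager", backend="inductor"):
    """Return baseline latency / backend latency.

    Assumption: a higher ratio means the backend is faster than the
    baseline column. A 'nan' entry (failed run) propagates as nan.
    """
    return float(row[baseline]) / float(row[backend])


# Two rows copied from the absolute latency (ms) table above.
sample = """
|         name          | bs |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
|   AlbertForMaskedLM   | 4  | 266.8965 | 301.1351  | 162.5609 |        162.2618        |
| BlenderbotForCausalLM | 4  | 115.4877 | 158.2403  |   nan    |        100.4666        |
"""

for row in parse_table(sample):
    print(row["name"], round(latency_ratio(row), 4))
```

Passing `backend="inductor_no_cudagraphs"` gives the same ratio for the run without CUDA graphs, which is useful for rows where the `inductor` column is `nan`.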

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9983 |  0.9965   |  3.0201  |         2.9814         |
|      xcit_large_24_p8_224       |  5  | 0.9891 |  0.8689   |  1.9817  |         1.5748         |
|         coat_lite_mini          | 128 | 0.9973 |  0.9957   |  1.9445  |         1.9225         |
|        twins_pcpvt_base         | 64  | 0.9963 |  0.9155   |  1.9353  |         1.6772         |
|          gmlp_s16_224           | 128 | 0.9944 |  1.0832   |  1.8433  |         1.8319         |
|          ghostnet_100           | 128 | 0.9921 |  0.7475   |  1.834   |         1.598          |
|          gmixer_24_224          | 128 | 0.9947 |  0.8891   |  1.7622  |         1.7445         |
|           volo_d1_224           | 64  | 0.9944 |  0.9727   |  1.6895  |         1.6668         |
|            lcnet_050            | 128 | 0.9392 |  0.7348   |  1.6772  |         1.4502         |
|         crossvit_9_240          | 128 | 0.9904 |  0.7831   |  1.6424  |         1.6143         |
|  swin_base_patch4_window7_224   | 64  | 0.9908 |  0.9546   |  1.6165  |         1.6082         |
|           convit_base           | 64  | 0.9982 |  0.9977   |  1.6102  |         1.607          |
|          inception_v3           | 128 | 0.9964 |  0.8624   |  1.528   |         1.5198         |
|        adv_inception_v3         | 128 | 0.9961 |  0.8587   |  1.5239  |         1.5209         |
|       gluon_inception_v3        | 128 | 0.9967 |  0.8633   |  1.5233  |         1.522          |
|             dla102              | 128 | 0.9951 |  0.8139   |  1.5216  |         1.5235         |
|        sebotnet33ts_256         | 64  | 0.957  |  0.7644   |  1.5031  |         1.5353         |
|          convnext_base          | 64  | 0.9838 |  0.9852   |   1.49   |         1.4701         |
|            nfnet_l0             | 128 | 0.9891 |  0.8151   |  1.4868  |         1.435          |
|           dm_nfnet_f0           | 128 | 0.9873 |  0.9845   |  1.4732  |         1.4305         |
|       eca_botnext26ts_256       | 128 | 0.9723 |  0.7191   |  1.4427  |         1.4248         |
|            pit_b_224            | 64  | 0.9946 |  0.9927   |  1.4342  |         1.4284         |
|      mobilenetv3_large_100      | 128 | 0.9491 |  0.7594   |  1.4336  |          1.41          |
|           mnasnet_100           | 128 | 0.9463 |  0.7396   |  1.4304  |         1.4997         |
|           mobilevit_s           | 64  | 0.9622 |   0.726   |  1.4274  |         1.4435         |
|           resnest101e           | 64  | 0.9941 |  0.8636   |  1.4185  |         1.3498         |
|           regnety_002           | 128 | 0.9486 |  0.7234   |  1.4071  |         1.2152         |
|           selecsls42b           | 128 | 0.998  |  0.8112   |  1.4065  |         1.4087         |
|          botnet26t_256          | 128 | 0.9735 |  0.8509   |  1.4063  |         1.4225         |
|         mobilenetv2_100         | 128 | 0.9491 |  0.7358   |  1.3898  |         1.4468         |
|        res2net50_14w_8s         | 128 | 0.999  |  0.7878   |  1.3778  |         1.3569         |
|           res2next50            | 128 | 0.9985 |   0.823   |  1.3687  |         1.3628         |
|          jx_nest_base           | 32  | 0.9876 |  0.9847   |  1.3653  |         1.3584         |
|          mixer_b16_224          | 128 | 0.9972 |  1.0183   |  1.3642  |         1.3601         |
|       tf_efficientnet_b0        | 128 | 0.9604 |  0.6812   |  1.3551  |         1.3817         |
|        ese_vovnet19b_dw         | 128 | 0.9592 |  0.8317   |  1.3542  |         1.3698         |
|          spnasnet_100           | 128 | 0.9411 |  0.7374   |  1.3533  |         1.4176         |
|           fbnetc_100            | 128 | 0.9491 |  0.7366   |  1.3514  |         1.4006         |
|      beit_base_patch16_224      | 64  | 0.9964 |  0.9677   |  1.3513  |         1.3517         |
|          cait_m36_384           |  4  | 0.9949 |  0.9931   |  1.3508  |         1.3513         |
|            hrnet_w18            | 128 | 0.9922 |  0.6441   |  1.3505  |         1.3454         |
|         poolformer_m36          | 64  | 0.9873 |  0.9837   |  1.3305  |         1.3195         |
|            fbnetv3_b            | 128 | 0.9492 |  0.7665   |  1.3119  |         1.3208         |
|           rexnet_100            | 128 | 0.9519 |  0.7025   |  1.2935  |         1.3347         |
|          resmlp_12_224          | 128 | 0.9929 |  0.8889   |  1.2622  |         1.2574         |
| deit_base_distilled_patch16_224 | 64  | 0.9964 |  0.9936   |  1.2552  |         1.255          |
|      vit_base_patch16_224       | 64  | 0.9955 |  0.9932   |  1.235   |         1.2352         |
|            tinynet_a            | 128 | 0.9468 |  0.6777   |  1.2195  |         1.265          |
|          cspdarknet53           | 64  | 0.933  |  0.7846   |  1.2193  |         1.2616         |
|           tf_mixnet_l           | 128 | 0.9776 |  0.8263   |  1.1874  |         1.1904         |
|            mixnet_l             | 128 | 0.9767 |  0.8207   |  1.1759  |         1.1819         |
|         visformer_small         | 128 | 0.9962 |  0.9435   |  1.1735  |         1.1665         |
|        res2net101_26w_4s        | 64  | 0.9984 |  0.7957   |  1.1498  |         1.0951         |
|          pnasnet5large          | 16  | 0.9859 |  0.9142   |  1.1147  |         1.1264         |
|             dpn107              | 32  | 0.9322 |  0.8062   |  1.0932  |         1.1358         |
|            repvgg_a2            | 128 | 0.9359 |  0.7548   |  1.0826  |         1.1187         |
|        gluon_xception65         | 32  | 0.9924 |   0.841   |  1.0755  |         1.0785         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8418   |  1.0573  |         1.0252         |
|            gernet_l             | 128 | 0.9362 |  0.7912   |  1.0361  |         1.0651         |
|        convmixer_768_32         | 32  | 0.9987 |  0.9639   |  1.0023  |         1.003          |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 5.7197  |  11.0204  | 279.3221 |        292.771         |
|            hrnet_w18            | 128 | 9.5726  |  36.2978  | 248.2146 |        244.4327        |
|          ghostnet_100           | 128 | 7.4768  |  15.1551  | 234.1809 |        234.8726        |
|            fbnetv3_b            | 128 | 8.4396  |  17.1022  | 170.7563 |        168.8537        |
|           resnest101e           | 64  | 11.1281 |  24.3616  | 166.3095 |        167.7274        |
|          pnasnet5large          | 16  |  8.128  |  25.9126  | 165.1175 |        162.246         |
|           mobilevit_s           | 64  |  5.376  |  11.3925  | 164.1183 |        156.2809        |
|      mobilenetv3_large_100      | 128 | 4.2548  |   8.447   | 163.0647 |        157.1416        |
|            mixnet_l             | 128 | 8.3843  |  16.4257  | 160.9964 |        158.9558        |
|       gluon_inception_v3        | 128 |  5.602  |  12.6087  | 160.4095 |        154.5762        |
|           tf_mixnet_l           | 128 | 8.9348  |  17.4133  | 160.0163 |        160.0615        |
|          inception_v3           | 128 | 5.6952  |  12.6275  | 158.9818 |        161.5416        |
|            tinynet_a            | 128 | 5.9604  |  12.3742  | 156.7662 |        162.5123        |
|        adv_inception_v3         | 128 | 5.6925  |  12.5013  | 156.1849 |        156.1085        |
|       tf_efficientnet_b0        | 128 | 5.0619  |  10.5823  | 154.9265 |        149.2906        |
|        res2net101_26w_4s        | 64  | 10.6618 |  24.6119  | 154.3769 |        153.3665        |
|        twins_pcpvt_base         | 64  | 10.5197 |  23.4424  | 149.0719 |        148.3567        |
|           fbnetc_100            | 128 |  5.107  |  9.4963   | 139.8576 |        136.4138        |
|          spnasnet_100           | 128 | 5.0494  |  9.3132   | 135.6297 |        133.2336        |
|         mobilenetv2_100         | 128 | 4.0392  |  7.9546   | 135.5622 |        125.0794        |
|      xcit_large_24_p8_224       |  5  | 12.7738 |  28.676   | 132.1645 |        132.8271        |
|        res2net50_14w_8s         | 128 | 9.0702  |  22.3265  | 122.8699 |        122.7576        |
|           mnasnet_100           | 128 |  4.085  |  7.6542   | 120.651  |        125.1403        |
|          cait_m36_384           |  4  | 13.7524 |  30.9073  | 114.2653 |        113.4063        |
|        sebotnet33ts_256         | 64  | 4.1701  |  8.9322   | 108.7764 |        108.6942        |
|  swin_base_patch4_window7_224   | 64  | 8.3472  |  19.0749  | 107.0103 |        105.3263        |
|           regnety_002           | 128 | 5.0595  |  9.1093   | 105.4383 |        104.5097        |
|         poolformer_m36          | 64  | 7.6867  |  13.9387  | 100.6629 |        100.6118        |
|            lcnet_050            | 128 | 2.5276  |  5.0348   | 100.4642 |         96.846         |
|          cspdarknet53           | 64  |  5.755  |  11.2354  |  98.977  |        95.1311         |
|             dpn107              | 32  | 9.7701  |  19.8378  | 98.6151  |        98.2194         |
|             dla102              | 128 | 6.2885  |  14.3319  | 95.4631  |        95.8089         |
|       eca_botnext26ts_256       | 128 | 3.1483  |  6.9173   |  95.357  |        97.1007         |
|        gluon_xception65         | 32  | 7.7456  |  16.9865  | 94.6969  |         93.65          |
|           res2next50            | 128 |  5.061  |   12.15   | 90.5987  |        86.1513         |
|           selecsls42b           | 128 | 2.4844  |  5.5019   | 89.7057  |        93.2311         |
|          botnet26t_256          | 128 | 2.9141  |  6.0593   |  89.025  |        88.6668         |
|         coat_lite_mini          | 128 | 3.3281  |  7.9194   | 88.0729  |        87.4267         |
|         crossvit_9_240          | 128 | 5.8079  |  13.5521  |  84.784  |         84.018         |
|          jx_nest_base           | 32  | 6.8547  |  14.5389  | 83.6536  |        83.2951         |
|            gernet_l             | 128 | 5.0082  |  9.0379   | 81.4441  |        80.1114         |
|            nfnet_l0             | 128 | 5.3286  |  10.8859  | 77.9624  |        77.9876         |
|        ese_vovnet19b_dw         | 128 | 2.4879  |  4.6444   | 74.7532  |        75.8335         |
|           volo_d1_224           | 64  | 5.1489  |  12.1219  | 73.2533  |        74.5311         |
|           dm_nfnet_f0           | 128 | 5.9262  |  11.5576  | 70.6105  |        68.1892         |
|        tnt_s_patch16_224        | 128 | 6.5097  |  16.0384  | 68.7354  |        68.2883         |
|         visformer_small         | 128 | 2.5922  |  6.0896   | 66.5226  |         67.775         |
|     swsl_resnext101_32x16d      | 32  | 6.2584  |  13.6002  | 62.9583  |        62.2921         |
|          gmlp_s16_224           | 128 | 5.4929  |  12.0559  | 59.7892  |        59.0778         |
|            repvgg_a2            | 128 | 4.8745  |  8.7339   | 59.2296  |        59.7563         |
|          convnext_base          | 64  | 6.7806  |  12.7043  | 57.2065  |        56.9324         |
|          gmixer_24_224          | 128 | 5.7733  |  12.9349  | 50.5998  |        50.9249         |
|           convit_base           | 64  | 3.6474  |  9.0976   |  47.667  |        46.3382         |
|            pit_b_224            | 64  | 3.6794  |   8.544   | 44.6087  |        45.2436         |
| deit_base_distilled_patch16_224 | 64  | 3.1054  |  7.4881   | 41.0717  |         39.171         |
|      vit_base_patch16_224       | 64  |  3.088  |  7.1137   | 39.2063  |        38.5583         |
|          resmlp_12_224          | 128 | 2.8298  |  5.4582   | 38.9257  |        39.5638         |
|        convmixer_768_32         | 32  | 1.6915  |  6.9675   | 37.2955  |        35.1101         |
|      beit_base_patch16_224      | 64  | 3.8513  |  8.6832   | 35.0943  |        34.2315         |
|          mixer_b16_224          | 128 | 2.6983  |  5.9814   | 32.2008  |        32.0196         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.1858  |         1.2049         |
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.1712  |         1.2836         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1129  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.0948  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.0266  |         1.1179         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.013   |         1.0845         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0099  |         1.0351         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  0.9984  |         1.0721         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  0.9972  |         1.0876         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  0.9872  |         1.0728         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  0.9834  |         1.0506         |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9762  |         0.9854         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  0.9747  |         1.0457         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  0.9729  |         1.0539         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  0.9557  |         1.0057         |
|             dla102              | 128 | 0.9635 |  0.9155   |  0.9536  |         1.0326         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.951   |         0.9948         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9362  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9353  |         0.9528         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  0.9348  |         1.0194         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  0.9346  |         1.0675         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9285  |         0.989          |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9228  |         0.9793         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9215  |         0.9952         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9131  |         0.9727         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  0.9097  |         1.0328         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9047  |         0.9903         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.8918  |          0.99          |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  0.891   |         1.1115         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.8833  |         0.9642         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.8831  |         0.9705         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.8807  |         0.977          |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.8786  |         0.9451         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.877   |         0.9361         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  0.8768  |         1.1871         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  0.8738  |         1.0072         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.8721  |         0.9732         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.8712  |         0.9607         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.871   |         0.9483         |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.8687  |         0.9902         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.8683  |         0.9403         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.866   |         0.9547         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8632  |         0.989          |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.8596  |         0.9535         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  0.8578  |         1.0242         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.8576  |         0.9664         |
|          convnext_base          | 64  | 1.001  |   0.924   |  0.8505  |         1.0338         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.8499  |         0.9706         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.8461  |         0.9786         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  0.8402  |         1.0202         |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.8239  |         0.9779         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.805   |         0.884          |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.7738  |         0.9611         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.7602  |         0.8966         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.7526  |         0.9898         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.7214  |         0.9045         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.6693  |         0.9604         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 301.1524 | 311.7388  | 299.8444 |        299.9403        |
|            hrnet_w18            | 128 | 282.4812 | 435.4727  | 206.989  |        208.792         |
|          pnasnet5large          | 16  | 198.6411 | 214.8612  | 176.0408 |        174.6785        |
|           tf_mixnet_l           | 128 | 194.392  | 229.4647  | 159.9279 |        159.5648        |
|            mixnet_l             | 128 | 185.6241 | 221.5935  | 153.9939 |        153.3498        |
|          cait_m36_384           |  4  | 168.7799 | 168.4652  | 124.1198 |        126.992         |
|           resnest101e           | 64  | 165.3591 | 190.8197  | 115.9164 |        121.9837        |
|             dla102              | 128 | 173.0761 | 211.5047  | 113.2236 |        113.0086        |
|     swsl_resnext101_32x16d      | 32  | 119.1186 | 140.9038  | 112.1762 |        115.6093        |
|         poolformer_m36          | 64  | 147.2361 | 147.5164  | 109.226  |        110.2105        |
|        tnt_s_patch16_224        | 128 | 325.0616 | 324.8795  | 107.4945 |        108.8388        |
|        adv_inception_v3         | 128 | 161.6054 | 187.0314  | 105.6211 |        105.4758        |
|       gluon_inception_v3        | 128 | 160.9103 | 186.1344  | 105.2938 |        105.6035        |
|          inception_v3           | 128 | 160.8648 | 186.2313  | 105.0343 |        105.6265        |
|        res2net50_14w_8s         | 128 | 141.4328 | 178.9692  | 102.6901 |        103.8773        |
|           convit_base           | 64  | 163.7265 | 163.6779  | 101.4412 |        101.5451        |
|             dpn107              | 32  | 113.9962 | 131.8177  | 97.5131  |        93.7832         |
|        gluon_xception65         | 32  |  99.986  | 118.0291  | 92.4185  |        92.1261         |
|           res2next50            | 128 | 126.305  | 153.3302  | 92.3012  |        92.5003         |
|  swin_base_patch4_window7_224   | 64  | 147.8744 | 153.3624  | 90.6213  |        91.1871         |
|           dm_nfnet_f0           | 128 | 128.304  |  128.757  | 85.9887  |         88.85          |
|          mixer_b16_224          | 128 | 116.9134 | 114.2591  | 85.3178  |        86.0031         |
|        res2net101_26w_4s        | 64  | 100.7655 | 125.6151  | 85.2289  |        90.5193         |
|            fbnetv3_b            | 128 | 115.4667 | 143.2407  | 83.7127  |        83.1162         |
|          convnext_base          | 64  | 124.7568 | 124.3729  | 82.5273  |        83.3594         |
|            pit_b_224            | 64  | 119.0754 | 119.4263  | 82.5201  |        82.8317         |
|         visformer_small         | 128 | 91.4328  |  96.6931  | 77.7141  |        78.0853         |
|            nfnet_l0             | 128 | 113.1328 |  137.445  | 75.4544  |        78.0489         |
|      beit_base_patch16_224      | 64  | 101.8969 | 104.7769  | 75.1779  |        74.9155         |
|          gmlp_s16_224           | 128 |  138.03  | 126.4677  | 74.5011  |        74.7881         |
|       eca_botnext26ts_256       | 128 | 109.2464 | 147.5403  | 73.5642  |        74.4221         |
|          jx_nest_base           | 32  | 102.0602 | 101.6741  | 73.5213  |        73.8255         |
|          cspdarknet53           | 64  | 95.2332  | 113.2705  | 72.9577  |         70.295         |
|           volo_d1_224           | 64  | 121.2618 | 124.0598  | 71.5478  |        72.4968         |
|          botnet26t_256          | 128 | 102.0416 | 116.8292  | 70.7593  |        69.8814         |
|            gernet_l             | 128 | 77.7891  |  92.1867  | 70.3085  |        68.4601         |
|      vit_base_patch16_224       | 64  | 87.0019  |  87.1929  | 70.2929  |         70.226         |
| deit_base_distilled_patch16_224 | 64  | 85.1573  |  85.4584  | 67.5373  |        67.4842         |
|            repvgg_a2            | 128 |  77.862  |  96.5444  | 67.4313  |        65.1824         |
|          gmixer_24_224          | 128 | 118.5009 | 132.5967  | 66.9528  |        67.6906         |
|      xcit_large_24_p8_224       |  5  | 124.4223 | 149.6763  | 62.6818  |        78.4005         |
|       tf_efficientnet_b0        | 128 | 85.1557  | 120.3355  | 60.3567  |        59.2295         |
|        twins_pcpvt_base         | 64  | 117.7516 | 144.0602  | 60.3213  |        68.9306         |
|           rexnet_100            | 128 | 80.5569  | 108.8355  | 59.1181  |        57.2903         |
|           fbnetc_100            | 128 | 82.9795  | 106.9377  | 58.3716  |        56.4046         |
|         coat_lite_mini          | 128 | 113.6434 | 113.7829  | 58.2618  |        58.8991         |
|            tinynet_a            | 128 | 74.0669  | 103.2141  | 57.2186  |        55.3662         |
|           mobilevit_s           | 64  | 84.8618  | 112.6429  | 57.1289  |        56.5637         |
|        sebotnet33ts_256         | 64  | 80.6858  | 101.0508  | 51.3646  |         50.327         |
|         crossvit_9_240          | 128 | 82.6597  | 104.5137  | 49.9689  |        50.7421         |
|          ghostnet_100           | 128 | 90.9536  | 120.8688  |  49.133  |        56.4322         |
|          spnasnet_100           | 128 | 70.5287  |  90.0914  | 49.0711  |        46.9621         |
|        ese_vovnet19b_dw         | 128 | 64.7139  |  74.7355  | 45.8011  |         45.311         |
|         mobilenetv2_100         | 128 | 65.8315  |  84.8965  | 44.8482  |        43.0309         |
|           selecsls42b           | 128 | 60.2266  |  73.9902  | 42.7946  |        42.7719         |
|           mnasnet_100           | 128 | 64.7945  |  82.7968  | 42.6907  |         40.675         |
|          resmlp_12_224          | 128 | 53.5809  |  59.9616  | 42.1981  |        42.3582         |
|      mobilenetv3_large_100      | 128 | 61.4011  |  76.7398  | 40.6938  |        41.4179         |
|           regnety_002           | 128 | 40.1887  |  57.4975  | 26.7037  |         31.603         |
|            lcnet_050            | 128 |  31.912  |  40.7199  | 17.8179  |        20.6247         |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Performance graphs


bench_logs/huggingface_amp.png :

bench_logs/timm_models_amp.png :

bench_logs/torchbench_amp.png :

Build Summary


Run name

day_088_29_03_23_performance_amp_652

Commit hashes

pytorch commit: 7fc100a
pytorch commit date: 2023-03-30 02:12:52+00:00
torchbench commit: 0faa0142100f5fb7f3b86255515a6dee6b3d7cd5
torchbench commit date: 2023-03-29 17:33:55-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git7fc100a

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

Models that fail accuracy checks are excluded from the performance, compilation latency, and memory footprint measurements.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 82%, 49/60 | 84%, 38/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 88%, 53/60 | 98%, 44/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
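
The pass-rate cells above pair a rounded percentage with the raw counts. A minimal sketch of how such a cell could be rendered follows; `format_passrate` is a hypothetical helper for illustration, not a function from the benchmark scripts.

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a pass rate in the dashboard's 'NN%, passed/total' style."""
    if total <= 0:
        raise ValueError("total must be positive")
    # ':.0%' multiplies by 100 and rounds to a whole percent.
    return f"{passed / total:.0%}, {passed}/{total}"


# Counts taken from the torchbench column of the table above.
print(format_passrate(53, 60))
print(format_passrate(49, 60))
```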

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.00x    |    1.40x    |    1.00x    |
| inductor_no_cudagraphs |   1.27x    |    1.48x    |    1.39x    |
+------------------------+------------+-------------+-------------+
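
As a rough illustration of how a geometric mean speedup like the ones above could be computed, here is a minimal sketch. It assumes the per-model speedup is eager latency divided by compiled latency, and that non-positive speedups (failed runs) are dropped before averaging. The function names are hypothetical and are not taken from the benchmark scripts.

```python
import math


def speedup(eager_ms: float, compiled_ms: float) -> float:
    """Per-model speedup, assumed to be eager latency over compiled latency."""
    return eager_ms / compiled_ms


def geomean_speedup(speedups: list[float]) -> float:
    """Geometric mean of the positive speedups; 0.0 entries are treated as failures."""
    valid = [s for s in speedups if s > 0]
    if not valid:
        return 0.0
    # exp(mean(log(x))) avoids overflow from multiplying many ratios together.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# Eager and inductor latencies (ms) for lcnet_050 and regnety_002,
# copied from an absolute latency table earlier in this thread.
per_model = [speedup(31.912, 17.8179), speedup(40.1887, 26.7037)]
print(f"{geomean_speedup(per_model):.2f}x")
```

A geometric mean is the usual choice for averaging ratios because a 2x speedup and a 0.5x slowdown cancel out to 1x, which an arithmetic mean would not show.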

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.81    |    7.76     |    5.97     |
|       aot_eager        |    9.48    |    16.30    |    13.38    |
|        inductor        |   61.02    |    76.84    |   101.92    |
| inductor_no_cudagraphs |   64.02    |    59.45    |   109.47    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    0.97x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.89x    |    0.88x    |
|        inductor        |   0.94x    |    0.99x    |    1.02x    |
| inductor_no_cudagraphs |   0.94x    |    1.03x    |    1.01x    |
+------------------------+------------+-------------+-------------+
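
The report does not spell out how the compression ratio is defined. The sketch below assumes it is the eager peak memory divided by the compiled peak memory, which is consistent with "higher is better" in the heading above. The helper name is hypothetical.

```python
def compression_ratio(eager_peak_gb: float, compiled_peak_gb: float) -> float:
    """Assumed definition: eager peak memory over compiled peak memory.

    A value above 1.0 means the compiled run used less peak memory than eager.
    """
    if compiled_peak_gb <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_gb / compiled_peak_gb


# Hypothetical measurements, not taken from the dashboard.
print(f"{compression_ratio(10.0, 8.0):.2f}x")
```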

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Previous report name: /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 87%, 52/60  | 82%, 49/60  |
|        inductor        | huggingface | 93%, 42/45  | 84%, 38/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 88%, 53/60  | 88%, 53/60  |
| inductor_no_cudagraphs | huggingface | 98%, 44/45  | 98%, 44/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.58x    |   1.00x   |
|        inductor        | huggingface |   1.57x    |   1.40x   |
|        inductor        | timm_models |   1.41x    |   1.00x   |
| inductor_no_cudagraphs | torchbench  |   1.27x    |   1.27x   |
| inductor_no_cudagraphs | huggingface |   1.48x    |   1.48x   |
| inductor_no_cudagraphs | timm_models |   1.39x    |   1.39x   |
+------------------------+-------------+------------+-----------+
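
To make regressions in a diff table like the one above easier to spot, the change between the two reports can be computed directly from the cells. This is a minimal sketch; the parsing helpers are hypothetical, and the rows are copied from the table above.

```python
def parse_speedup(cell: str) -> float:
    """Convert a table cell such as '1.58x' to a float."""
    return float(cell.strip().rstrip("x"))


def speedup_change(prev: str, cur: str) -> float:
    """Signed change in speedup from the previous report to the current one."""
    return parse_speedup(cur) - parse_speedup(prev)


rows = [
    ("inductor", "torchbench", "1.58x", "1.00x"),
    ("inductor", "huggingface", "1.57x", "1.40x"),
    ("inductor_no_cudagraphs", "torchbench", "1.27x", "1.27x"),
]
for compiler, suite, prev, cur in rows:
    print(f"{compiler:24s} {suite:12s} {speedup_change(prev, cur):+.2f}x")
```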

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
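
The flagging rules above can be sketched as a small check. The thresholds come from the list; the `ModelResult` record and its field names are assumptions made for illustration and do not reflect the dashboard's actual data model.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    # Hypothetical record; the field names are not from the benchmark scripts.
    name: str
    accuracy_ok: bool
    speedup: float
    compile_sec: float
    compression_ratio: float


def warnings_for(r: ModelResult) -> list[str]:
    """Return the warning categories a model falls into, in list order."""
    flags = []
    if not r.accuracy_ok:
        flags.append("accuracy")
    if r.speedup < 0.95:  # a 0.0 speedup usually means the perf test failed
        flags.append("speedup")
    if r.compile_sec > 120:
        flags.append("compilation_latency")
    if r.compression_ratio < 0.9:
        flags.append("compression_ratio")
    return flags


# Hypothetical values for a single model.
example = ModelResult("example_model", True, 0.83, 45.0, 0.95)
print(warnings_for(example))
```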

Accuracy warnings

+-------------+----------------------------+-----------------+------------------------+
|    suite    |            name            |    inductor     | inductor_no_cudagraphs |
+-------------+----------------------------+-----------------+------------------------+
| torchbench  |            moco            |   fail_to_run   |      fail_to_run       |
| torchbench  |     Background_Matting     | eager_variation |    eager_variation     |
| torchbench  |      vision_maskrcnn       | eager_variation |    eager_variation     |
| torchbench  |         tacotron2          |     0.0000      |         0.0000         |
| torchbench  |            gat             |     0.0000      |         0.0000         |
| torchbench  |            gcn             |     0.0000      |         0.0000         |
| torchbench  |           llama            |     0.0000      |         0.0000         |
| torchbench  |            sage            |     0.0000      |         0.0000         |
| torchbench  |       torchrec_dlrm        |     0.0000      |         0.0000         |
| huggingface | AlbertForQuestionAnswering |  fail_accuracy  |     fail_accuracy      |
+-------------+----------------------------+-----------------+------------------------+

Performance speedup warnings

+-------------+-----------------------------------+----------+------------------------+
|    suite    |               name                | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------+----------+------------------------+
| torchbench  |               hf_T5               |  0.1782  |         1.9468         |
| torchbench  |             hf_Albert             |  0.1716  |         2.2521         |
| torchbench  |               vgg16               |  0.1509  |         1.2559         |
| torchbench  |        Background_Matting         |  0.125   |         1.2052         |
| torchbench  |            timm_nfnet             |  0.1019  |         1.4719         |
| torchbench  |           hf_GPT2_large           |  0.0971  |         1.7318         |
| torchbench  |           hf_Bert_large           |  0.0968  |         1.545          |
| torchbench  |              hf_Bert              |  0.0853  |         1.5697         |
| torchbench  |           pytorch_unet            |  0.0713  |         1.3518         |
| torchbench  |            hf_T5_large            |  0.0701  |         1.8776         |
| torchbench  |           BERT_pytorch            |  0.0642  |         2.1226         |
| torchbench  |              yolov3               |  0.057   |         1.1963         |
| torchbench  |           mobilenet_v2            |  0.0553  |         1.4048         |
| torchbench  | attention_is_all_you_need_pytorch |  0.055   |         1.4406         |
| torchbench  |              hf_GPT2              |  0.0543  |         1.7699         |
| torchbench  |           hf_DistilBert           |  0.0474  |         1.4553         |
| torchbench  |            timm_regnet            |  0.0469  |         0.9653         |
| torchbench  |              demucs               |  0.0409  |         1.037          |
| torchbench  |      timm_vision_transformer      |  0.0363  |         1.3786         |
| torchbench  |        shufflenet_v2_x1_0         |  0.0356  |         1.1985         |
| torchbench  |           timm_resnest            |  0.0352  |         1.5021         |
| torchbench  |            densenet121            |  0.0344  |         1.0521         |
| torchbench  |             resnet152             |  0.0342  |         1.0303         |
| torchbench  |             resnet50              |  0.0332  |         1.0749         |
| torchbench  |          pytorch_stargan          |  0.0317  |         1.2303         |
| torchbench  |            timm_vovnet            |  0.0315  |         0.9222         |
| torchbench  |         timm_efficientnet         |  0.0311  |         1.0899         |
| torchbench  |         phlippe_densenet          |  0.0306  |         1.0207         |
| torchbench  |        mobilenet_v3_large         |  0.0305  |         1.1944         |
| torchbench  |            mnasnet1_0             |  0.0285  |         1.0384         |
| torchbench  |          resnext50_32x4d          |  0.0275  |         0.993          |
| torchbench  |      nvidia_deeprecommender       |  0.0268  |         1.0178         |
| torchbench  |              alexnet              |  0.0241  |         1.1363         |
| torchbench  |           squeezenet1_1           |  0.0237  |         1.326          |
| torchbench  |          pytorch_struct           |  0.0236  |         1.104          |
| torchbench  |   pytorch_CycleGAN_and_pix2pix    |  0.0236  |         1.7792         |
| torchbench  |          phlippe_resnet           |  0.0212  |         1.001          |
| torchbench  |       functorch_dp_cifar10        |  0.0212  |         1.374          |
| torchbench  |            tts_angular            |  0.0212  |         0.9552         |
| torchbench  |             resnet18              |  0.0198  |         0.9759         |
| torchbench  |        speech_transformer         |  0.0187  |          1.6           |
| torchbench  |           fastNLP_Bert            |  0.0171  |         1.5158         |
| torchbench  |          LearningToPaint          |  0.0169  |         1.0611         |
| torchbench  |              hf_Bart              |  0.0081  |         1.427          |
| torchbench  |               dcgan               |  0.0079  |         0.8273         |
| torchbench  |            hf_Reformer            |  0.0077  |         1.0585         |
| torchbench  |           lennard_jones           |  0.0069  |         0.8859         |
| torchbench  |                drq                |  0.004   |         1.0688         |
| torchbench  |         soft_actor_critic         |  0.0033  |         0.8243         |
| torchbench  |   timm_vision_transformer_large   |   0.0    |         1.0806         |
| torchbench  |               sage                |   0.0    |          0.0           |
| torchbench  |                gat                |   0.0    |          0.0           |
| torchbench  |             tacotron2             |   0.0    |          0.0           |
| torchbench  |                gcn                |   0.0    |          0.0           |
| torchbench  |            hf_BigBird             |   0.0    |         1.6246         |
| torchbench  |               moco                |   0.0    |          0.0           |
| torchbench  |               dlrm                |   0.0    |         0.9089         |
| torchbench  |           hf_Longformer           |   0.0    |         1.2684         |
| torchbench  |           torchrec_dlrm           |   0.0    |          0.0           |
| huggingface |         YituTechConvBert          |  0.0275  |         1.4722         |
| huggingface |        ElectraForCausalLM         |  0.0262  |         1.8147         |
| huggingface |         PLBartForCausalLM         |  0.0238  |         1.6708         |
| huggingface |  PegasusForConditionalGeneration  |  0.0215  |         1.2992         |
| huggingface |          OPTForCausalLM           |  0.017   |         2.4796         |
| huggingface |         TrOCRForCausalLM          |  0.0133  |         1.2866         |
| huggingface |         MBartForCausalLM          |  0.0108  |         1.535          |
| huggingface |          BartForCausalLM          |  0.0107  |         1.5417         |
| huggingface |      Speech2Text2ForCausalLM      |  0.0095  |         1.5251         |
| huggingface |    BlenderbotSmallForCausalLM     |  0.0094  |         1.2051         |
| huggingface |  M2M100ForConditionalGeneration   |  0.0072  |         1.3753         |
| huggingface |        PegasusForCausalLM         |  0.0063  |         1.1519         |
| huggingface |          XGLMForCausalLM          |  0.0044  |         1.2264         |
| huggingface |       BlenderbotForCausalLM       |  0.0042  |         1.1045         |
| huggingface |       AllenaiLongformerBase       |   0.0    |         1.5002         |
| huggingface |  PLBartForConditionalGeneration   |   0.0    |         1.6282         |
| huggingface |    DebertaForQuestionAnswering    |   0.0    |         0.9179         |
| huggingface |        DebertaForMaskedLM         |   0.0    |         0.777          |
| huggingface |   DebertaV2ForQuestionAnswering   |   0.0    |         0.6372         |
| huggingface |       DebertaV2ForMaskedLM        |   0.0    |         0.6246         |
| timm_models |           mixer_b16_224           |  0.2712  |         1.3626         |
| timm_models |         convmixer_768_32          |  0.2497  |         1.0023         |
| timm_models |             pit_b_224             |  0.2306  |         1.4277         |
| timm_models |         tnt_s_patch16_224         |  0.2152  |         2.9814         |
| timm_models |           gmlp_s16_224            |  0.212   |         1.828          |
| timm_models |           gmixer_24_224           |  0.1889  |         1.7498         |
| timm_models |            convit_base            |  0.169   |         1.6085         |
| timm_models |           resmlp_12_224           |  0.1366  |         1.2589         |
| timm_models |            tf_mixnet_l            |  0.1241  |         1.1915         |
| timm_models |        eca_botnext26ts_256        |  0.1225  |         1.4234         |
| timm_models |             mixnet_l              |  0.1212  |         1.1823         |
| timm_models |           botnet26t_256           |  0.1194  |         1.4204         |
| timm_models |              dla102               |  0.1139  |          1.52          |
| timm_models |          coat_lite_mini           |  0.113   |         1.9251         |
| timm_models |       beit_base_patch16_224       |  0.1103  |         1.3511         |
| timm_models |          visformer_small          |  0.1084  |         1.1662         |
| timm_models |            dm_nfnet_f0            |  0.108   |         1.4276         |
| timm_models |           inception_v3            |  0.1057  |         1.5186         |
| timm_models |             nfnet_l0              |  0.1029  |         1.4387         |
| timm_models |       vit_base_patch16_224        |  0.1014  |         1.2357         |
| timm_models |  deit_base_distilled_patch16_224  |  0.0994  |         1.2556         |
| timm_models |            res2next50             |  0.0949  |         1.3631         |
| timm_models |            volo_d1_224            |  0.0948  |         1.6673         |
| timm_models |        gluon_inception_v3         |  0.0919  |         1.5168         |
| timm_models |       xcit_large_24_p8_224        |  0.091   |         1.5812         |
| timm_models |           convnext_base           |  0.0902  |         1.4711         |
| timm_models |   swin_base_patch4_window7_224    |  0.087   |         1.606          |
| timm_models |      swsl_resnext101_32x16d       |  0.0845  |         1.0251         |
| timm_models |         adv_inception_v3          |  0.0807  |         1.5189         |
| timm_models |           cspdarknet53            |  0.0773  |         1.2624         |
| timm_models |             repvgg_a2             |  0.0762  |         1.1189         |
| timm_models |             gernet_l              |  0.076   |         1.067          |
| timm_models |          poolformer_m36           |  0.0759  |         1.3201         |
| timm_models |            resnest101e            |  0.073   |         1.3512         |
| timm_models |            selecsls42b            |  0.0729  |         1.4108         |
| timm_models |        tf_efficientnet_b0         |  0.0722  |         1.3849         |
| timm_models |             hrnet_w18             |  0.0709  |         1.3482         |
| timm_models |           pnasnet5large           |  0.0699  |         1.1282         |
| timm_models |           jx_nest_base            |  0.0673  |         1.3587         |
| timm_models |          mobilenetv2_100          |  0.0661  |         1.4437         |
| timm_models |         res2net50_14w_8s          |  0.0655  |         1.354          |
| timm_models |          crossvit_9_240           |  0.0654  |         1.6151         |
| timm_models |           cait_m36_384            |  0.0648  |         1.3471         |
| timm_models |           ghostnet_100            |  0.0635  |         1.5898         |
| timm_models |             fbnetv3_b             |  0.063   |         1.3128         |
| timm_models |              dpn107               |  0.0622  |         1.1318         |
| timm_models |           spnasnet_100            |  0.0614  |         1.4168         |
| timm_models |            rexnet_100             |  0.0611  |         1.3319         |
| timm_models |         gluon_xception65          |  0.0592  |         1.0782         |
| timm_models |         twins_pcpvt_base          |  0.0574  |         1.6608         |
| timm_models |       mobilenetv3_large_100       |  0.057   |         1.4225         |
| timm_models |             tinynet_a             |  0.0546  |         1.2622         |
| timm_models |         res2net101_26w_4s         |  0.0416  |         1.0892         |
| timm_models |         ese_vovnet19b_dw          |  0.0407  |         1.3736         |
| timm_models |            regnety_002            |  0.0397  |         1.2309         |
| timm_models |             lcnet_050             |  0.0397  |         1.4654         |
| timm_models |         sebotnet33ts_256          |  0.0392  |         1.5347         |
| timm_models |            mobilevit_s            |  0.0343  |         1.4438         |
| timm_models |            mnasnet_100            |  0.0328  |         1.4958         |
| timm_models |            fbnetc_100             |  0.0282  |         1.4049         |
+-------------+-----------------------------------+----------+------------------------+

Compilation latency (sec) warnings

+-------------+--------------------------------+----------+------------------------+
|    suite    |              name              | inductor | inductor_no_cudagraphs |
+-------------+--------------------------------+----------+------------------------+
| torchbench  |          hf_T5_large           | 185.444  |        172.6575        |
| torchbench  |        phlippe_densenet        | 131.4983 |        166.721         |
| torchbench  |          densenet121           | 127.3627 |        136.5742        |
| torchbench  |       timm_efficientnet        | 123.0083 |        144.7696        |
| torchbench  |       mobilenet_v3_large       | 115.9145 |        138.3228        |
| torchbench  |             yolov3             | 106.1824 |        120.2883        |
| torchbench  |          mobilenet_v2          | 105.6331 |        123.9663        |
| torchbench  |           hf_BigBird           |   nan    |        125.7931        |
| torchbench  | timm_vision_transformer_large  |   nan    |        125.1634        |
| huggingface | M2M100ForConditionalGeneration | 185.8243 |        106.222         |
| huggingface |        XGLMForCausalLM         | 183.5628 |        71.2363         |
| huggingface |     BlenderbotForCausalLM      | 179.5094 |        70.6044         |
| huggingface |     MobileBertForMaskedLM      | 149.183  |        143.9504        |
| huggingface | MobileBertForQuestionAnswering | 142.1606 |        137.1101        |
| huggingface |  MT5ForConditionalGeneration   | 128.5388 |        132.3612        |
| timm_models |           hrnet_w18            | 241.9436 |        240.276         |
| timm_models |           rexnet_100           | 230.6732 |        295.3849        |
| timm_models |          ghostnet_100          | 195.6391 |        240.9628        |
| timm_models |         pnasnet5large          | 164.4357 |        160.6157        |
| timm_models |          resnest101e           | 157.2108 |        167.0321        |
| timm_models |           fbnetv3_b            | 151.9685 |        170.9544        |
| timm_models |          mobilevit_s           | 148.4481 |        160.2287        |
| timm_models |       res2net101_26w_4s        | 146.2659 |        149.7906        |
| timm_models |        twins_pcpvt_base        | 144.9804 |        146.6821        |
| timm_models |            mixnet_l            | 140.2277 |        163.528         |
| timm_models |        adv_inception_v3        | 139.7505 |        155.502         |
| timm_models |       gluon_inception_v3       | 139.1661 |        162.2305        |
| timm_models |           tinynet_a            | 139.1365 |        161.3443        |
| timm_models |          inception_v3          | 137.8222 |        155.6003        |
| timm_models |          tf_mixnet_l           | 137.3263 |        162.6732        |
| timm_models |      xcit_large_24_p8_224      | 136.0792 |        131.2448        |
| timm_models |     mobilenetv3_large_100      | 135.2729 |        157.0403        |
| timm_models |       tf_efficientnet_b0       | 131.8955 |        157.4531        |
| timm_models |           fbnetc_100           | 124.2615 |        136.6052        |
| timm_models |        res2net50_14w_8s        | 122.3688 |        124.6806        |
| timm_models |          cait_m36_384          | 120.8796 |        112.9109        |
| timm_models |          spnasnet_100          | 112.4383 |        136.3509        |
| timm_models |        mobilenetv2_100         | 107.5509 |        131.5354        |
+-------------+--------------------------------+----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+----------+------------------------+
|    suite    |                  name                   | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------------+----------+------------------------+
| torchbench  |         nvidia_deeprecommender          |  0.8951  |         0.8931         |
| torchbench  |             pytorch_stargan             |  0.8934  |         0.8893         |
| torchbench  |                resnet50                 |  0.8898  |         0.8844         |
| torchbench  |               timm_vovnet               |  0.889   |         0.8869         |
| torchbench  |         timm_vision_transformer         |  0.8873  |         0.8835         |
| torchbench  |            phlippe_densenet             |  0.8834  |         0.8659         |
| torchbench  |           speech_transformer            |  0.8694  |         0.869          |
| torchbench  |               densenet121               |  0.8268  |         0.8034         |
| torchbench  |               mnasnet1_0                |  0.813   |         0.779          |
| torchbench  |               hf_Reformer               |  0.8064  |         0.8022         |
| torchbench  |           mobilenet_v3_large            |  0.7825  |         0.8709         |
| torchbench  |             resnext50_32x4d             |  0.7779  |         0.7707         |
| torchbench  |             LearningToPaint             |  0.7552  |         0.7463         |
| torchbench  |             pytorch_struct              |  0.7424  |         0.7358         |
| torchbench  |                resnet18                 |  0.6222  |         0.6127         |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |  0.6035  |         0.6004         |
| torchbench  |          functorch_dp_cifar10           |  0.451   |         0.4424         |
| torchbench  |             phlippe_resnet              |  0.3554  |         0.3395         |
| torchbench  |              hf_Longformer              |   nan    |         0.8951         |
| huggingface |           ElectraForCausalLM            |  0.8953  |         0.8941         |
| huggingface |          DistilBertForMaskedLM          |  0.8872  |         0.9624         |
| huggingface | BlenderbotSmallForConditionalGeneration |  0.8749  |         0.9803         |
| huggingface |       BlenderbotSmallForCausalLM        |  0.8333  |         0.8318         |
| huggingface |          MobileBertForMaskedLM          |  0.8112  |         1.016          |
| huggingface |         Speech2Text2ForCausalLM         |  0.8097  |         0.808          |
| huggingface |     MobileBertForQuestionAnswering      |  0.6659  |         0.8392         |
| huggingface |          AllenaiLongformerBase          |   nan    |         0.8742         |
| timm_models |               regnety_002               |  0.901   |         0.8966         |
| timm_models |                lcnet_050                |  0.8898  |         0.884          |
+-------------+-----------------------------------------+----------+------------------------+

Metrics over time

bench_logs/geomean_over_time.png :

bench_logs/comp_time_over_time.png :

bench_logs/passrate_over_time.png :

bench_logs/memory_over_time.png :

Recent Regressions

For each relevant compiler, we compare the two most recent reports (that actually ran the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).
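
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual script: the function names, the dictionary layout, and the 0.95 speedup threshold are all assumptions (the threshold was picked only because it is consistent with the rows shown in this report), and treating `nan` as flagged is likewise a guess.

```python
import math

# Hypothetical threshold: the real value used by the dashboard is not stated here.
SPEEDUP_THRESHOLD = 0.95


def is_flagged(speedup, threshold=SPEEDUP_THRESHOLD):
    """Return True if a speedup value would count as a warning.

    Assumption: a missing measurement (NaN) is treated as flagged.
    """
    return math.isnan(speedup) or speedup < threshold


def find_regressions(prev, cur, threshold=SPEEDUP_THRESHOLD):
    """Return {name: (prev_status, cur_status)} for newly flagged models.

    A model is reported only if it was NOT flagged in the previous report
    and IS flagged in the current one; models missing from the previous
    report are skipped because there is nothing to compare against.
    """
    regressions = {}
    for name, cur_val in cur.items():
        if name not in prev:
            continue
        prev_val = prev[name]
        if not is_flagged(prev_val, threshold) and is_flagged(cur_val, threshold):
            regressions[name] = (prev_val, cur_val)
    return regressions


# hf_T5 and timm_regnet use values from the torchbench table in this report;
# "already_slow_model" and "healthy_model" are made-up entries for contrast.
prev_report = {
    "hf_T5": 1.8968,
    "timm_regnet": 0.9984,
    "already_slow_model": 0.80,
    "healthy_model": 1.30,
}
cur_report = {
    "hf_T5": 0.1782,
    "timm_regnet": 0.0469,
    "already_slow_model": 0.10,
    "healthy_model": 1.25,
}

result = find_regressions(prev_report, cur_report)
for name, (before, after) in result.items():
    print(f"{name}: {before} -> {after}")
```

Under these assumptions, `already_slow_model` is not reported because it was already below the threshold in the previous report, which matches the "previously unflagged" wording. The same shape of check would apply to compilation latency or memory compression ratio with the comparison direction and threshold adjusted.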

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Performance speedup regressions

+------------------------+-----------------------------------+-------------+------------+
|        compiler        |               name                | prev_status | cur_status |
+------------------------+-----------------------------------+-------------+------------+
|        inductor        |               hf_T5               |   1.8968    |   0.1782   |
|        inductor        |             hf_Albert             |   2.3262    |   0.1716   |
|        inductor        |               vgg16               |   1.2402    |   0.1509   |
|        inductor        |        Background_Matting         |   1.2113    |   0.125    |
|        inductor        |            timm_nfnet             |   1.5333    |   0.1019   |
|        inductor        |           hf_GPT2_large           |   1.6705    |   0.0971   |
|        inductor        |           hf_Bert_large           |   1.5913    |   0.0968   |
|        inductor        |              hf_Bert              |   1.7947    |   0.0853   |
|        inductor        |           pytorch_unet            |   1.3555    |   0.0713   |
|        inductor        |            hf_T5_large            |   2.2138    |   0.0701   |
|        inductor        |           BERT_pytorch            |   3.0437    |   0.0642   |
|        inductor        |              yolov3               |    1.195    |   0.057    |
|        inductor        |           mobilenet_v2            |    1.518    |   0.0553   |
|        inductor        | attention_is_all_you_need_pytorch |   1.4725    |   0.055    |
|        inductor        |              hf_GPT2              |   1.7533    |   0.0543   |
|        inductor        |           hf_DistilBert           |   1.4208    |   0.0474   |
|        inductor        |            timm_regnet            |   0.9984    |   0.0469   |
|        inductor        |              demucs               |    1.035    |   0.0409   |
|        inductor        |      timm_vision_transformer      |   1.5392    |   0.0363   |
|        inductor        |        shufflenet_v2_x1_0         |   1.6261    |   0.0356   |
|        inductor        |           timm_resnest            |   1.5626    |   0.0352   |
|        inductor        |            densenet121            |   2.7963    |   0.0344   |
|        inductor        |             resnet152             |   1.2112    |   0.0342   |
|        inductor        |             resnet50              |   1.1826    |   0.0332   |
|        inductor        |          pytorch_stargan          |   1.2819    |   0.0317   |
|        inductor        |         timm_efficientnet         |   1.4422    |   0.0311   |
|        inductor        |         phlippe_densenet          |   2.0516    |   0.0306   |
|        inductor        |        mobilenet_v3_large         |   2.0697    |   0.0305   |
|        inductor        |            mnasnet1_0             |   1.7075    |   0.0285   |
|        inductor        |          resnext50_32x4d          |   1.7188    |   0.0275   |
|        inductor        |              alexnet              |   1.0884    |   0.0241   |
|        inductor        |           squeezenet1_1           |    1.988    |   0.0237   |
|        inductor        |   pytorch_CycleGAN_and_pix2pix    |   2.5856    |   0.0236   |
|        inductor        |          pytorch_struct           |   1.4603    |   0.0236   |
|        inductor        |          phlippe_resnet           |   1.8458    |   0.0212   |
|        inductor        |            tts_angular            |   0.9654    |   0.0212   |
|        inductor        |       functorch_dp_cifar10        |   3.5785    |   0.0212   |
|        inductor        |             resnet18              |   1.6092    |   0.0198   |
|        inductor        |        speech_transformer         |    1.608    |   0.0187   |
|        inductor        |           fastNLP_Bert            |   1.5479    |   0.0171   |
|        inductor        |          LearningToPaint          |   1.3044    |   0.0169   |
|        inductor        |              hf_Bart              |   2.2049    |   0.0081   |
|        inductor        |               dcgan               |   1.4618    |   0.0079   |
|        inductor        |            hf_Reformer            |   1.1423    |   0.0077   |
|        inductor        |           lennard_jones           |   1.4021    |   0.0069   |
|        inductor        |                drq                |   1.5508    |   0.004    |
|        inductor        |         soft_actor_critic         |   1.2102    |   0.0033   |
|        inductor        |            hf_BigBird             |   2.4758    |    0.0     |
|        inductor        |               dlrm                |   1.9481    |    0.0     |
|        inductor        |           hf_Longformer           |   1.4374    |    0.0     |
| inductor_no_cudagraphs |               dlrm                |   1.1668    |   0.9089   |
+------------------------+-----------------------------------+-------------+------------+

Compilation latency (sec) regressions

+------------------------+--------+-------------+------------+
|        compiler        |  name  | prev_status | cur_status |
+------------------------+--------+-------------+------------+
| inductor_no_cudagraphs | yolov3 |  119.7147   |  120.2883  |
+------------------------+--------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Performance speedup regressions

+----------+---------------------------------+-------------+------------+
| compiler |              name               | prev_status | cur_status |
+----------+---------------------------------+-------------+------------+
| inductor |        YituTechConvBert         |   1.4883    |   0.0275   |
| inductor |       ElectraForCausalLM        |   1.7835    |   0.0262   |
| inductor |        PLBartForCausalLM        |   1.6159    |   0.0238   |
| inductor | PegasusForConditionalGeneration |   1.3078    |   0.0215   |
| inductor |         OPTForCausalLM          |   2.4257    |   0.017    |
| inductor |        TrOCRForCausalLM         |    1.242    |   0.0133   |
| inductor |        MBartForCausalLM         |   1.4908    |   0.0108   |
| inductor |         BartForCausalLM         |   1.4897    |   0.0107   |
| inductor |     Speech2Text2ForCausalLM     |   1.4537    |   0.0095   |
| inductor |   BlenderbotSmallForCausalLM    |   1.2152    |   0.0094   |
| inductor | M2M100ForConditionalGeneration  |   1.5384    |   0.0072   |
| inductor |       PegasusForCausalLM        |   1.1606    |   0.0063   |
| inductor |         XGLMForCausalLM         |   2.0059    |   0.0044   |
| inductor | PLBartForConditionalGeneration  |   1.5913    |    0.0     |
| inductor |   DebertaForQuestionAnswering   |   1.0112    |    0.0     |
| inductor |      AllenaiLongformerBase      |   1.5928    |    0.0     |
+----------+---------------------------------+-------------+------------+

Compilation latency (sec) regressions

+----------+--------------------------------+-------------+------------+
| compiler |              name              | prev_status | cur_status |
+----------+--------------------------------+-------------+------------+
| inductor | M2M100ForConditionalGeneration |  111.9143   |  185.8243  |
| inductor |        XGLMForCausalLM         |   78.9918   |  183.5628  |
+----------+--------------------------------+-------------+------------+

Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Performance speedup regressions

+----------+---------------------------------+-------------+------------+
| compiler |              name               | prev_status | cur_status |
+----------+---------------------------------+-------------+------------+
| inductor |          mixer_b16_224          |   1.3642    |   0.2712   |
| inductor |        convmixer_768_32         |   1.0023    |   0.2497   |
| inductor |            pit_b_224            |   1.4342    |   0.2306   |
| inductor |        tnt_s_patch16_224        |   3.0201    |   0.2152   |
| inductor |          gmlp_s16_224           |   1.8433    |   0.212    |
| inductor |          gmixer_24_224          |   1.7622    |   0.1889   |
| inductor |           convit_base           |   1.6102    |   0.169    |
| inductor |          resmlp_12_224          |   1.2622    |   0.1366   |
| inductor |           tf_mixnet_l           |   1.1874    |   0.1241   |
| inductor |       eca_botnext26ts_256       |   1.4427    |   0.1225   |
| inductor |            mixnet_l             |   1.1759    |   0.1212   |
| inductor |          botnet26t_256          |   1.4063    |   0.1194   |
| inductor |             dla102              |   1.5216    |   0.1139   |
| inductor |         coat_lite_mini          |   1.9445    |   0.113    |
| inductor |      beit_base_patch16_224      |   1.3513    |   0.1103   |
| inductor |         visformer_small         |   1.1735    |   0.1084   |
| inductor |           dm_nfnet_f0           |   1.4732    |   0.108    |
| inductor |          inception_v3           |    1.528    |   0.1057   |
| inductor |            nfnet_l0             |   1.4868    |   0.1029   |
| inductor |      vit_base_patch16_224       |    1.235    |   0.1014   |
| inductor | deit_base_distilled_patch16_224 |   1.2552    |   0.0994   |
| inductor |           res2next50            |   1.3687    |   0.0949   |
| inductor |           volo_d1_224           |   1.6895    |   0.0948   |
| inductor |       gluon_inception_v3        |   1.5233    |   0.0919   |
| inductor |      xcit_large_24_p8_224       |   1.9817    |   0.091    |
| inductor |          convnext_base          |    1.49     |   0.0902   |
| inductor |  swin_base_patch4_window7_224   |   1.6165    |   0.087    |
| inductor |     swsl_resnext101_32x16d      |   1.0573    |   0.0845   |
| inductor |        adv_inception_v3         |   1.5239    |   0.0807   |
| inductor |          cspdarknet53           |   1.2193    |   0.0773   |
| inductor |            repvgg_a2            |   1.0826    |   0.0762   |
| inductor |            gernet_l             |   1.0361    |   0.076    |
| inductor |         poolformer_m36          |   1.3305    |   0.0759   |
| inductor |           resnest101e           |   1.4185    |   0.073    |
| inductor |           selecsls42b           |   1.4065    |   0.0729   |
| inductor |       tf_efficientnet_b0        |   1.3551    |   0.0722   |
| inductor |            hrnet_w18            |   1.3505    |   0.0709   |
| inductor |          pnasnet5large          |   1.1147    |   0.0699   |
| inductor |          jx_nest_base           |   1.3653    |   0.0673   |
| inductor |         mobilenetv2_100         |   1.3898    |   0.0661   |
| inductor |        res2net50_14w_8s         |   1.3778    |   0.0655   |
| inductor |         crossvit_9_240          |   1.6424    |   0.0654   |
| inductor |          cait_m36_384           |   1.3508    |   0.0648   |
| inductor |          ghostnet_100           |    1.834    |   0.0635   |
| inductor |            fbnetv3_b            |   1.3119    |   0.063    |
| inductor |             dpn107              |   1.0932    |   0.0622   |
| inductor |          spnasnet_100           |   1.3533    |   0.0614   |
| inductor |           rexnet_100            |   1.2935    |   0.0611   |
| inductor |        gluon_xception65         |   1.0755    |   0.0592   |
| inductor |        twins_pcpvt_base         |   1.9353    |   0.0574   |
| inductor |      mobilenetv3_large_100      |   1.4336    |   0.057    |
| inductor |            tinynet_a            |   1.2195    |   0.0546   |
| inductor |        res2net101_26w_4s        |   1.1498    |   0.0416   |
| inductor |        ese_vovnet19b_dw         |   1.3542    |   0.0407   |
| inductor |            lcnet_050            |   1.6772    |   0.0397   |
| inductor |           regnety_002           |   1.4071    |   0.0397   |
| inductor |        sebotnet33ts_256         |   1.5031    |   0.0392   |
| inductor |           mobilevit_s           |   1.4274    |   0.0343   |
| inductor |           mnasnet_100           |   1.4304    |   0.0328   |
| inductor |           fbnetc_100            |   1.3514    |   0.0282   |
+----------+---------------------------------+-------------+------------+

Compilation latency (sec) regressions

+----------+--------------+-------------+------------+
| compiler |     name     | prev_status | cur_status |
+----------+--------------+-------------+------------+
| inductor | cait_m36_384 |  114.2653   |  120.8796  |
+----------+--------------+-------------+------------+

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 0.9969 |  0.1781   |  1.2328  |         1.232          |
|               hf_T5               |  8   | 0.9854 |  0.8532   |  0.1782  |         1.9468         |
|             hf_Albert             |  8   | 0.9958 |  0.9597   |  0.1716  |         2.2521         |
|               vgg16               |  64  | 0.9991 |  0.9988   |  0.1509  |         1.2559         |
|        Background_Matting         |  4   | 0.9982 |  0.1362   |  0.125   |         1.2052         |
|            timm_nfnet             | 128  | 0.9859 |  0.9854   |  0.1019  |         1.4719         |
|           hf_GPT2_large           |  4   | 0.9829 |  0.9718   |  0.0971  |         1.7318         |
|           hf_Bert_large           |  4   | 0.9997 |   0.873   |  0.0968  |         1.545          |
|              hf_Bert              |  4   | 0.9971 |  0.8394   |  0.0853  |         1.5697         |
|           pytorch_unet            |  1   | 0.9971 |  0.2037   |  0.0713  |         1.3518         |
|            hf_T5_large            |  2   | 0.9762 |  0.8066   |  0.0701  |         1.8776         |
|           BERT_pytorch            |  16  | 0.991  |  0.8241   |  0.0642  |         2.1226         |
|              yolov3               |  16  | 0.997  |   0.804   |  0.057   |         1.1963         |
|           mobilenet_v2            |  96  | 0.997  |   0.776   |  0.0553  |         1.4048         |
| attention_is_all_you_need_pytorch | 256  | 0.991  |  0.9264   |  0.055   |         1.4406         |
|              hf_GPT2              |  4   | 0.9937 |  0.9591   |  0.0543  |         1.7699         |
|           hf_DistilBert           |  8   | 0.9821 |  0.9453   |  0.0474  |         1.4553         |
|            timm_regnet            |  32  | 0.9179 |  0.7743   |  0.0469  |         0.9653         |
|              demucs               |  4   | 0.9997 |   1.001   |  0.0409  |         1.037          |
|      timm_vision_transformer      |  32  | 0.9922 |  0.8827   |  0.0363  |         1.3786         |
|        shufflenet_v2_x1_0         | 128  | 0.9942 |  0.7513   |  0.0356  |         1.1985         |
|           timm_resnest            |  32  | 0.9938 |  0.8518   |  0.0352  |         1.5021         |
|            densenet121            |  4   | 0.9853 |  0.7051   |  0.0344  |         1.0521         |
|             resnet152             |  32  | 0.9948 |  0.7514   |  0.0342  |         1.0303         |
|             resnet50              |  32  | 0.9984 |  0.7834   |  0.0332  |         1.0749         |
|          pytorch_stargan          |  16  | 0.9928 |  0.8011   |  0.0317  |         1.2303         |
|            timm_vovnet            |  32  | 0.8568 |  0.7114   |  0.0315  |         0.9222         |
|         timm_efficientnet         |  32  | 0.9387 |  0.6254   |  0.0311  |         1.0899         |
|         phlippe_densenet          | 128  | 0.9846 |  0.7712   |  0.0306  |         1.0207         |
|        mobilenet_v3_large         |  32  | 0.9971 |  0.7784   |  0.0305  |         1.1944         |
|            mnasnet1_0             |  32  | 0.9889 |   0.734   |  0.0285  |         1.0384         |
|          resnext50_32x4d          |  8   | 0.9833 |  0.7189   |  0.0275  |         0.993          |
|      nvidia_deeprecommender       | 256  | 0.999  |  0.9989   |  0.0268  |         1.0178         |
|              alexnet              | 128  | 0.9987 |  0.9975   |  0.0241  |         1.1363         |
|           squeezenet1_1           |  32  | 0.9836 |  0.9375   |  0.0237  |         1.326          |
|          pytorch_struct           | 200  | 0.9276 |  0.7689   |  0.0236  |         1.104          |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9804 |  0.9018   |  0.0236  |         1.7792         |
|          phlippe_resnet           | 128  | 0.9858 |  0.7653   |  0.0212  |         1.001          |
|       functorch_dp_cifar10        |  64  | 0.9608 |  0.9164   |  0.0212  |         1.374          |
|            tts_angular            |  64  | 0.9248 |  0.8973   |  0.0212  |         0.9552         |
|             resnet18              |  16  | 0.9864 |  0.7533   |  0.0198  |         0.9759         |
|        speech_transformer         |  32  | 0.9808 |  0.7975   |  0.0187  |          1.6           |
|           fastNLP_Bert            |  6   | 0.9951 |  0.8038   |  0.0171  |         1.5158         |
|          LearningToPaint          |  96  | 0.9893 |  0.7806   |  0.0169  |         1.0611         |
|              hf_Bart              |  4   | 0.9741 |  0.7768   |  0.0081  |         1.427          |
|               dcgan               |  32  | 0.8598 |  0.6975   |  0.0079  |         0.8273         |
|            hf_Reformer            |  4   | 0.987  |   0.964   |  0.0077  |         1.0585         |
|           lennard_jones           | 1000 | 0.8397 |  0.7563   |  0.0069  |         0.8859         |
|                drq                |  1   | 0.9593 |  0.7486   |  0.004   |         1.0688         |
|         soft_actor_critic         | 256  | 0.8577 |  0.6208   |  0.0033  |         0.8243         |
|   timm_vision_transformer_large   |  32  | 0.9978 |    0.0    |   0.0    |         1.0806         |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|            hf_BigBird             |  2   | 0.9525 |  0.7792   |   0.0    |         1.6246         |
|               moco                |  32  | 0.9792 |    0.0    |   0.0    |          0.0           |
|               dlrm                | 1024 | 0.9331 |  0.8524   |   0.0    |         0.9089         |
|           hf_Longformer           |  2   | 0.828  |  0.5698   |   0.0    |         1.2684         |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 26.8656 |  56.2575  | 185.444  |        172.6575        |
|         phlippe_densenet          | 128  | 3.2897  |  7.0093   | 131.4983 |        166.721         |
|            densenet121            |  4   | 7.7634  |  19.1984  | 127.3627 |        136.5742        |
|         timm_efficientnet         |  32  | 5.0164  |  10.2296  | 123.0083 |        144.7696        |
|        mobilenet_v3_large         |  32  | 3.4623  |  8.1338   | 115.9145 |        138.3228        |
|           hf_GPT2_large           |  4   | 14.8622 |   30.26   | 113.9952 |        105.2613        |
|              yolov3               |  16  | 4.8837  |  10.9762  | 106.1824 |        120.2883        |
|           mobilenet_v2            |  96  | 3.1101  |  6.9927   | 105.6331 |        123.9663        |
|             resnet152             |  32  | 9.1929  |  20.1438  | 104.8346 |        106.6239        |
|              hf_Bart              |  4   | 10.4845 |  18.0953  | 94.9982  |        60.5299         |
|            mnasnet1_0             |  32  | 3.1765  |  6.7429   | 91.5663  |        103.5322        |
|            hf_Reformer            |  4   | 4.1079  |   6.002   | 90.7179  |         39.748         |
|        speech_transformer         |  32  | 6.0365  |  14.5661  | 90.5651  |        78.1963         |
|           timm_resnest            |  32  | 1.8609  |  3.9501   | 82.7733  |        99.4372         |
| attention_is_all_you_need_pytorch | 256  | 4.3978  |  10.8437  | 77.2843  |        73.7612         |
|        shufflenet_v2_x1_0         | 128  | 3.4637  |  7.7363   | 72.3908  |         80.935         |
|            timm_regnet            |  32  | 6.7283  |  12.4844  |  71.872  |        73.2747         |
|           BERT_pytorch            |  16  | 4.9527  |  11.6569  | 70.6407  |        67.8655         |
|            timm_nfnet             | 128  | 5.8825  |  11.096   | 70.3388  |        72.2029         |
|           hf_Bert_large           |  4   | 10.323  |  21.4066  | 67.4346  |        63.0485         |
|        Background_Matting         |  4   | 3.1808  |  11.4793  | 63.3945  |        67.5314         |
|             resnet50              |  32  | 3.2272  |  6.9769   | 61.3379  |         66.38          |
|           fastNLP_Bert            |  6   | 5.2004  |  11.0234  | 60.9826  |        49.3485         |
|            timm_vovnet            |  32  | 3.6145  |  6.3657   | 57.2036  |        63.1074         |
|               hf_T5               |  8   | 5.6077  |  13.5818  | 52.7004  |        50.7205         |
|           pytorch_unet            |  1   | 1.5183  |  4.4323   |  52.266  |         60.26          |
|      timm_vision_transformer      |  32  | 3.3147  |  7.6801   | 51.6939  |        49.7888         |
|          resnext50_32x4d          |  8   | 3.2335  |  6.9474   | 51.4273  |        51.5949         |
|       functorch_dp_cifar10        |  64  |  1.199  |  2.3958   | 45.4067  |        55.5902         |
|              hf_GPT2              |  4   | 4.8209  |   9.615   | 44.2912  |        41.0971         |
|            Super_SloMo            |  6   | 2.7523  |  9.7552   | 43.0325  |        42.0304         |
|          pytorch_stargan          |  16  | 1.1983  |  3.2106   | 41.8016  |        46.2186         |
|          LearningToPaint          |  96  | 1.4034  |  2.8969   | 41.5345  |        44.8185         |
|             resnet18              |  16  | 1.3463  |  2.8533   | 40.2368  |        43.7652         |
|              hf_Bert              |  4   | 5.0854  |  10.5881  | 39.8591  |        38.0455         |
|             hf_Albert             |  8   | 2.4878  |  8.6454   | 39.6829  |        38.0564         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2321  |  2.9344   | 35.9846  |        35.5051         |
|              demucs               |  4   | 1.4298  |  2.1811   | 34.2076  |        29.5824         |
|           hf_DistilBert           |  8   | 2.3716  |  5.6399   | 31.7708  |        31.8413         |
|          phlippe_resnet           | 128  | 1.4109  |  2.8369   | 30.0658  |        32.1938         |
|           squeezenet1_1           |  32  |  1.049  |   1.757   | 24.2426  |        23.9354         |
|          pytorch_struct           | 200  | 0.7446  |   1.328   |  20.595  |        18.8508         |
|               vgg16               |  64  | 0.6232  |  1.1078   | 16.4732  |        15.4457         |
|              alexnet              | 128  |  0.481  |  0.7793   |  16.303  |        15.7158         |
|                drq                |  1   | 0.6536  |  1.0277   | 15.9861  |         9.1019         |
|      nvidia_deeprecommender       | 256  | 0.4828  |   0.775   | 11.5007  |         9.3492         |
|         soft_actor_critic         | 256  | 0.4179  |  0.6072   | 10.4633  |         6.7939         |
|               dcgan               |  32  | 0.4304  |  0.7114   |  9.5905  |         7.583          |
|           lennard_jones           | 1000 | 0.3914  |  0.6021   |  7.7451  |         5.7657         |
|            tts_angular            |  64  | 0.4481  |  0.5222   |  7.3641  |         5.8642         |
|            hf_BigBird             |  2   | 12.9756 |  37.1937  |   nan    |        125.7931        |
|   timm_vision_transformer_large   |  32  | 9.4362  |    nan    |   nan    |        125.1634        |
|           hf_Longformer           |  2   | 11.3458 |  31.445   |   nan    |        118.5972        |
|               dlrm                | 1024 | 0.3737  |  0.7873   |   nan    |         7.3889         |
|               moco                |  32  | 27.7132 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.2086  |         1.2037         |
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.208   |         1.208          |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  1.193   |         1.1717         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.1751  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.1728  |         1.1719         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  1.1687  |         1.168          |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  1.1296  |         1.1266         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  1.1286  |         1.1284         |
|           mobilenet_v2            |  96  | 0.9857 |  0.7651   |  1.108   |         1.1019         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  1.1053  |         0.9973         |
|            timm_nfnet             | 128  | 0.9071 |  0.8752   |  1.077   |         1.073          |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  1.0737  |         1.0725         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  1.0736  |         1.0718         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  1.0687  |         0.9997         |
|                drq                |  1   | 0.9877 |  0.8852   |  1.0607  |         0.9573         |
|        Background_Matting         |  4   | 1.0127 |  0.6486   |  1.042   |         1.0403         |
|              yolov3               |  16  | 0.9838 |  0.8252   |  1.038   |         1.012          |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  1.0344  |         1.0258         |
|            tts_angular            |  64  | 0.9957 |  0.9957   |  1.0249  |         0.9957         |
|         timm_efficientnet         |  32  | 0.985  |  0.8179   |  1.0123  |         0.9411         |
|               vgg16               |  64  | 0.9919 |  0.7243   |  0.9823  |         0.9805         |
|        shufflenet_v2_x1_0         | 128  | 0.9539 |  0.8383   |  0.9691  |         0.9658         |
|           timm_resnest            |  32  | 0.9885 |  0.8972   |  0.9686  |         0.9617         |
|              demucs               |  4   | 0.9663 |  0.9664   |  0.9678  |         0.9662         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.9645  |         0.9645         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.9564  |         0.9479         |
|            timm_regnet            |  32  | 0.995  |  0.8499   |  0.953   |         0.9529         |
|             resnet152             |  32  | 0.995  |  0.8935   |  0.9439  |         0.9411         |
|              alexnet              | 128  | 0.9455 |   0.793   |  0.9432  |         0.9385         |
|              hf_Bart              |  4   | 0.9084 |   0.843   |  0.9358  |         0.9285         |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.9306  |         0.9308         |
|           squeezenet1_1           |  32  | 0.9674 |  0.9291   |   0.91   |         0.9087         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.8951  |         0.8931         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.8934  |         0.8893         |
|             resnet50              |  32  | 0.9916 |  0.8637   |  0.8898  |         0.8844         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.889   |         0.8869         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8873  |         0.8835         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8834  |         0.8659         |
|        speech_transformer         |  32  | 0.9915 |   0.901   |  0.8694  |         0.869          |
|            densenet121            |  4   | 0.9959 |  0.9833   |  0.8268  |         0.8034         |
|            mnasnet1_0             |  32  | 0.9757 |  0.8618   |  0.813   |         0.779          |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.8064  |         0.8022         |
|        mobilenet_v3_large         |  32  | 0.9792 |  0.9436   |  0.7825  |         0.8709         |
|          resnext50_32x4d          |  8   | 0.9934 |  0.8422   |  0.7779  |         0.7707         |
|          LearningToPaint          |  96  | 0.9192 |  0.7116   |  0.7552  |         0.7463         |
|          pytorch_struct           | 200  | 0.9992 |  0.5168   |  0.7424  |         0.7358         |
|             resnet18              |  16  | 0.9751 |  0.7996   |  0.6222  |         0.6127         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9965 |  0.8568   |  0.6035  |         0.6004         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.451   |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3554  |         0.3395         |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |   nan    |         1.1013         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |   nan    |         1.0009         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.8565 |  0.8296   |   nan    |         0.8951         |
|               moco                |  32  | 0.9954 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+------------+------------------------+
|               name                |  bs  |  eager   | aot_eager |  inductor  | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+------------+------------------------+
|            hf_Reformer            |  4   | 81.9793  |  84.1305  | 10645.2694 |        76.6409         |
|              hf_Bart              |  4   |  58.983  |  98.2367  | 8215.7201  |        41.4683         |
|            hf_T5_large            |  2   | 225.8202 | 273.1034  | 3271.9146  |        119.7839        |
|           fastNLP_Bert            |  6   | 53.0306  |  69.5066  | 3193.4304  |        34.7647         |
|        speech_transformer         |  32  | 67.7967  |  79.8511  | 3167.7298  |        35.4311         |
|           hf_GPT2_large           |  4   | 213.1056 | 215.1045  | 2165.6649  |        120.9718        |
|             resnet152             |  32  | 64.7505  |  87.3526  | 1956.0883  |        61.1097         |
|            densenet121            |  4   | 55.2979  |  80.2153  | 1702.2063  |        52.5088         |
|              demucs               |  4   | 53.7008  |  54.0141  | 1310.1686  |        51.8592         |
|                drq                |  1   |  3.4511  |  4.4857   |  1309.572  |         2.9678         |
|            timm_regnet            |  32  | 61.2326  |  72.0004  | 1214.0437  |        57.9942         |
|              yolov3               |  16  | 69.0418  |  85.3605  | 1213.1718  |        57.5334         |
|            timm_nfnet             | 128  |  119.98  | 120.5511  | 1168.8574  |        80.4417         |
|         timm_efficientnet         |  32  | 33.7823  |  51.5374  | 1112.9952  |         29.623         |
| attention_is_all_you_need_pytorch | 256  | 55.7056  |  58.4279  | 1027.0515  |        37.4309         |
|               hf_T5               |  8   | 181.5801 |  212.394  | 1013.1437  |        92.2724         |
|        Background_Matting         |  4   | 126.0541 | 925.2006  | 1011.8538  |        104.6293        |
|        mobilenet_v3_large         |  32  | 27.1309  |  36.4298  |  954.0521  |        22.2399         |
|              hf_GPT2              |  4   | 49.3752  |  50.8724  |  919.2748  |        27.5998         |
|        shufflenet_v2_x1_0         | 128  | 30.9129  |  42.1902  |  906.858   |        25.9221         |
|           BERT_pytorch            |  16  | 55.0973  |  81.5113  |  893.6226  |        24.8841         |
|           hf_Bert_large           |  4   | 82.7451  |  94.4653  |  876.4682  |        53.9107         |
|           mobilenet_v2            |  96  | 47.2121  |  60.6022  |  856.9697  |        33.5737         |
|            mnasnet1_0             |  32  | 22.4553  |  31.6363  |  850.0939  |        22.4974         |
|      timm_vision_transformer      |  32  | 31.4992  |  37.9694  |  842.9689  |        20.5033         |
|         phlippe_densenet          | 128  | 24.1399  |  30.2344  |  842.8085  |        23.1723         |
|             resnet50              |  32  | 26.4958  |  33.2444  |  841.2372  |        24.2124         |
|          resnext50_32x4d          |  8   | 22.2757  |  27.1484  |  836.6934  |        19.6706         |
|            timm_vovnet            |  32  |  28.929  |  34.8621  |  836.6727  |        26.8927         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 15.6795  |  14.7272  |  783.529   |         7.5412         |
|          LearningToPaint          |  96  | 11.4581  |  14.2898  |  730.4538  |        10.7154         |
|         soft_actor_critic         | 256  |  1.6796  |   2.412   |  719.5636  |         1.9273         |
|           timm_resnest            |  32  | 24.2399  |  28.2971  |  691.5819  |        16.0495         |
|           hf_DistilBert           |  8   | 32.0868  |  35.4917  |  678.076   |        22.0573         |
|           pytorch_unet            |  1   | 40.0328  | 195.4996  |  563.3342  |        29.5103         |
|       functorch_dp_cifar10        |  64  | 10.4152  |  11.0746  |  556.3794  |         7.4911         |
|             resnet18              |  16  |  9.326   |  12.8379  |  541.3552  |         9.509          |
|              hf_Bert              |  4   | 40.7313  |  47.9412  |  504.8922  |        25.9005         |
|          pytorch_stargan          |  16  | 14.8628  |  18.3439  |  495.9537  |        11.8983         |
|          phlippe_resnet           | 128  |  9.5225  |  11.7663  |  487.3425  |         8.9714         |
|           squeezenet1_1           |  32  | 10.5296  |  11.684   |  472.4366  |         7.6989         |
|               vgg16               |  64  | 66.3726  |  66.4514  |  441.1606  |        52.7991         |
|              alexnet              | 128  |  9.8266  |  9.8615   |  411.9276  |         8.6489         |
|             hf_Albert             |  8   | 68.6561  |  72.7009  |  401.1282  |        30.3082         |
|      nvidia_deeprecommender       | 256  | 10.2504  |  10.2533  |  384.4942  |        10.0522         |
|               dcgan               |  32  |  2.3576  |  3.0397   |  369.806   |         2.5296         |
|           lennard_jones           | 1000 |  1.7878  |   2.165   |  325.3069  |         1.7648         |
|            tts_angular            |  64  |  6.747   |  6.9684   |  321.3089  |         6.5603         |
|          pytorch_struct           | 200  |  5.0267  |  6.0392   |  221.4068  |         4.2608         |
|            Super_SloMo            |  6   | 79.7566  | 446.5508  |  64.4588   |        64.4434         |
|   timm_vision_transformer_large   |  32  | 465.6706 |    nan    |    nan     |        430.0889        |
|            hf_BigBird             |  2   | 204.9226 |  244.869  |    nan     |        118.6609        |
|           hf_Longformer           |  2   | 137.709  | 196.6524  |    nan     |        89.1435         |
|               dlrm                | 1024 |  4.4098  |  4.9253   |    nan     |         4.6254         |
|               moco                |  32  | 51.4606  |    nan    |    nan     |          nan           |
|                gat                |  0   |   nan    |    nan    |    nan     |          nan           |
|                gcn                |  0   |   nan    |    nan    |    nan     |          nan           |
|               sage                |  0   |   nan    |    nan    |    nan     |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |    nan     |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |    nan     |          nan           |
+-----------------------------------+------+----------+-----------+------------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|      GPT2ForSequenceClassification      |  4  | 0.978  |  0.9526   |  2.2921  |         2.2843         |
|          MobileBertForMaskedLM          | 64  | 0.9532 |  0.8129   |  2.2703  |         1.0803         |
|       MT5ForConditionalGeneration       | 16  | 0.9904 |  0.8416   |  2.1691  |         1.8457         |
|       ElectraForQuestionAnswering       | 64  | 0.9866 |  0.9769   |  2.1009  |         2.0898         |
|     MobileBertForQuestionAnswering      | 128 | 0.9544 |  0.8146   |  2.0835  |         1.0711         |
|    LayoutLMForSequenceClassification    | 16  | 0.9846 |  0.9707   |  1.8153  |         1.7725         |
|            XLNetLMHeadModel             |  8  | 0.9948 |  0.9656   |  1.8116  |         1.814          |
|       RobertaForQuestionAnswering       | 16  | 0.9846 |  0.9698   |  1.7684  |         1.766          |
|        BertForQuestionAnswering         | 16  | 0.9847 |   0.97    |  1.7605  |         1.7623         |
|                 T5Small                 |  4  | 0.9772 |  0.8589   |  1.7518  |         1.7354         |
|       T5ForConditionalGeneration        |  4  | 0.9827 |  0.8565   |  1.7481  |         1.7308         |
|               DistillGPT2               | 16  | 0.9881 |  0.9551   |  1.665   |          1.7           |
|           RobertaForCausalLM            | 16  | 0.9873 |  0.9629   |  1.6623  |         1.6661         |
|       AlbertForQuestionAnswering        |  4  | 0.9999 |  0.8852   |  1.6521  |         1.6548         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9805 |  0.9617   |  1.6485  |         1.6281         |
|            AlbertForMaskedLM            |  4  | 0.9996 |  0.8848   |  1.6421  |         1.6462         |
|           LayoutLMForMaskedLM           | 16  | 0.9861 |  0.9619   |  1.5872  |         1.5933         |
|             BertForMaskedLM             | 16  | 0.986  |  0.9608   |  1.5836  |         1.5844         |
|                CamemBert                | 16  | 0.9874 |  0.9633   |  1.535   |         1.5337         |
|      BartForConditionalGeneration       |  2  | 0.9992 |  0.9627   |  1.5263  |         1.5532         |
|         MegatronBertForCausalLM         |  4  | 0.9912 |  0.9234   |  1.5252  |         1.4874         |
|      MBartForConditionalGeneration      |  2  | 1.0039 |  0.9618   |  1.5125  |         1.4713         |
|     DistilBertForQuestionAnswering      | 256 | 0.994  |  0.9865   |  1.4564  |         1.4481         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0044 |  0.9121   |  1.4339  |         1.4636         |
|          DistilBertForMaskedLM          | 128 | 0.9925 |  0.9505   |  1.2164  |         1.2346         |
|            YituTechConvBert             | 16  | 0.9862 |  0.9551   |  0.0275  |         1.4722         |
|           ElectraForCausalLM            | 32  | 0.9822 |  0.9349   |  0.0262  |         1.8147         |
|            PLBartForCausalLM            |  8  | 0.9895 |   0.959   |  0.0238  |         1.6708         |
|     PegasusForConditionalGeneration     | 32  | 0.9977 |  0.9208   |  0.0215  |         1.2992         |
|             OPTForCausalLM              |  2  | 0.9869 |   0.932   |  0.017   |         2.4796         |
|            TrOCRForCausalLM             | 32  | 0.9909 |  0.9611   |  0.0133  |         1.2866         |
|            MBartForCausalLM             |  4  | 0.9884 |  0.9595   |  0.0108  |         1.535          |
|             BartForCausalLM             |  4  | 0.9895 |  0.9604   |  0.0107  |         1.5417         |
|         Speech2Text2ForCausalLM         | 256 | 0.9811 |  0.9311   |  0.0095  |         1.5251         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9571 |  0.8897   |  0.0094  |         1.2051         |
|     M2M100ForConditionalGeneration      | 16  | 1.0326 |  0.8052   |  0.0072  |         1.3753         |
|           PegasusForCausalLM            | 32  | 0.9482 |   0.869   |  0.0063  |         1.1519         |
|             XGLMForCausalLM             |  8  | 0.9376 |  0.7319   |  0.0044  |         1.2264         |
|          BlenderbotForCausalLM          |  4  | 0.9346 |  0.7346   |  0.0042  |         1.1045         |
|          AllenaiLongformerBase          |  4  | 0.8861 |  0.6284   |   0.0    |         1.5002         |
|     PLBartForConditionalGeneration      |  4  | 0.9851 |   0.93    |   0.0    |         1.6282         |
|       DebertaForQuestionAnswering       |  8  | 0.8058 |   0.697   |   0.0    |         0.9179         |
|           DebertaForMaskedLM            |  4  | 0.7503 |  0.5832   |   0.0    |         0.777          |
|      DebertaV2ForQuestionAnswering      |  2  | 0.6978 |  0.5244   |   0.0    |         0.6372         |
|          DebertaV2ForMaskedLM           |  1  | 0.6976 |  0.5253   |   0.0    |         0.6246         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
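In the speedup table above, several rows show an `inductor` speedup near zero while `inductor_no_cudagraphs` is above 1.0. A minimal sketch for pulling such rows out of a pasted table follows. The helper names `parse_ascii_table` and `cudagraph_regressions` and the 1.0 threshold are illustrative assumptions, not part of the benchmark tooling.

```python
def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts (hypothetical helper)."""
    header = None
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines such as +-----+
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def cudagraph_regressions(rows, threshold=1.0):
    """Names where 'inductor' is below threshold but 'inductor_no_cudagraphs' is not."""
    flagged = []
    for row in rows:
        try:
            with_cg = float(row["inductor"])
            without_cg = float(row["inductor_no_cudagraphs"])
        except ValueError:
            continue  # non-numeric cell, e.g. a status string
        # nan compares False on both sides, so nan rows are never flagged
        if with_cg < threshold <= without_cg:
            flagged.append(row["name"])
    return flagged


# Three rows copied from the table above.
sample = """
+-------------------------------+----+--------+-----------+----------+------------------------+
|             name              | bs | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-------------------------------+----+--------+-----------+----------+------------------------+
| GPT2ForSequenceClassification | 4  | 0.978  |  0.9526   |  2.2921  |         2.2843         |
|       YituTechConvBert        | 16 | 0.9862 |  0.9551   |  0.0275  |         1.4722         |
|     DebertaV2ForMaskedLM      | 1  | 0.6976 |  0.5253   |   0.0    |         0.6246         |
+-------------------------------+----+--------+-----------+----------+------------------------+
"""

rows = parse_ascii_table(sample)
print(cudagraph_regressions(rows))
```

Rows where both configurations are below the threshold (such as `DebertaV2ForMaskedLM`) are deliberately left out, since the filter only targets cases where the two inductor columns disagree.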

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
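To pick out regressions from an accuracy table like the one above, the rows can be scanned for any status that is not a pass. This is a minimal sketch, not part of the benchmark scripts: `failing_backends` and `BACKENDS` are hypothetical names, the row layout is assumed to match the pipe-delimited format shown here, and treating `pass_due_to_skip` as a pass is an assumption.

```python
# Backend columns in the order they appear in the accuracy table.
BACKENDS = ["eager", "aot_eager", "inductor", "inductor_no_cudagraphs"]

def failing_backends(line):
    """Return (model_name, [backends whose status does not start with 'pass'])."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    name, statuses = cells[0], cells[2:]  # cells[1] is the batch size
    # Assumption: "pass" and "pass_due_to_skip" both count as passing.
    failed = [b for b, s in zip(BACKENDS, statuses) if not s.startswith("pass")]
    return name, failed

albert = "|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |"
skipped = "|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |"

for row in (albert, skipped):
    print(failing_backends(row))
```

Applied to the table above, only AlbertForQuestionAnswering would be flagged, for both inductor columns.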

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|     M2M100ForConditionalGeneration      | 16  | 12.2515 |  25.6317  | 185.8243 |        106.222         |
|             XGLMForCausalLM             |  8  | 9.6537  |  20.9327  | 183.5628 |        71.2363         |
|          BlenderbotForCausalLM          |  4  | 11.6935 |  22.5954  | 179.5094 |        70.6044         |
|          MobileBertForMaskedLM          | 64  | 17.9354 |  40.8902  | 149.183  |        143.9504        |
|     MobileBertForQuestionAnswering      | 128 | 17.8236 |  42.4852  | 142.1606 |        137.1101        |
|       MT5ForConditionalGeneration       | 16  | 8.4468  |  19.6675  | 128.5388 |        132.3612        |
|     PegasusForConditionalGeneration     | 32  | 5.1751  |  19.4773  | 102.3528 |         73.558         |
|           PegasusForCausalLM            | 32  | 5.9201  |  11.5258  | 95.1468  |        43.3986         |
|            YituTechConvBert             | 16  | 11.3375 |  20.7449  |  94.154  |        75.9294         |
|            XLNetLMHeadModel             |  8  | 10.7189 |  27.8803  | 92.6831  |        91.6401         |
|             BartForCausalLM             |  4  | 6.3724  |  11.8915  | 91.7996  |        42.9386         |
|            MBartForCausalLM             |  4  | 6.6985  |  11.8776  | 90.3292  |        45.6405         |
|            TrOCRForCausalLM             | 32  | 6.5867  |  12.0769  | 88.1947  |        42.7272         |
|             OPTForCausalLM              |  2  | 5.5123  |  11.2889  | 84.4689  |        39.9596         |
|      MBartForConditionalGeneration      |  2  | 11.9072 |  26.1733  | 84.4591  |        78.5979         |
|      BartForConditionalGeneration       |  2  | 12.0974 |  26.2093  |  78.227  |        74.3847         |
|           ElectraForCausalLM            | 32  | 8.2561  |  13.4945  | 77.9792  |        63.2513         |
|    MegatronBertForQuestionAnswering     |  8  | 10.5534 |  21.6056  |  69.756  |        65.6559         |
|         MegatronBertForCausalLM         |  4  | 10.805  |  21.7081  | 68.9162  |        64.9945         |
|       BlenderbotSmallForCausalLM        | 64  | 4.8074  |  8.2771   | 64.8565  |        36.9069         |
|         Speech2Text2ForCausalLM         | 256 | 3.3978  |  6.4424   | 58.3187  |        32.6018         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.7551  |  17.0989  | 56.0703  |        54.4222         |
|            PLBartForCausalLM            |  8  | 3.7665  |  6.7774   | 54.4205  |        32.0002         |
|                 T5Small                 |  4  | 5.8687  |  13.4404  | 51.6838  |        49.7791         |
|       T5ForConditionalGeneration        |  4  | 5.9551  |  13.408   | 51.2529  |        49.7838         |
|    LayoutLMForSequenceClassification    | 16  | 5.8577  |  11.3333  | 47.0539  |        44.7519         |
|       ElectraForQuestionAnswering       | 64  | 5.5384  |  10.8813  | 44.4107  |        42.6548         |
|           LayoutLMForMaskedLM           | 16  | 5.8253  |  11.2108  | 41.4702  |        38.7903         |
|             BertForMaskedLM             | 16  | 5.3071  |  10.9586  | 39.6356  |        38.7565         |
|        BertForQuestionAnswering         | 16  |  5.24   |  10.6261  | 39.1996  |        37.8984         |
|           RobertaForCausalLM            | 16  | 5.5696  |  11.5293  | 37.5458  |        36.1168         |
|                CamemBert                | 16  | 5.3655  |  10.7673  | 37.4038  |        37.9152         |
|            AlbertForMaskedLM            |  4  | 2.3992  |  8.2009   | 37.1384  |        36.5534         |
|      GPT2ForSequenceClassification      |  4  | 4.8854  |   9.934   | 36.8664  |        35.6484         |
|       RobertaForQuestionAnswering       | 16  | 5.3079  |  11.419   | 36.5028  |        34.7173         |
|     DistilBertForQuestionAnswering      | 256 | 2.4873  |  5.4579   | 35.4263  |        35.0574         |
|          DistilBertForMaskedLM          | 128 | 2.5199  |  5.3808   |  34.467  |        34.3313         |
|       AlbertForQuestionAnswering        |  4  | 2.3685  |  8.1112   | 33.9943  |        33.2105         |
|               DistillGPT2               | 16  | 2.7595  |  5.2285   | 29.0485  |        28.0987         |
|          AllenaiLongformerBase          |  4  | 11.5644 |  31.9664  |   nan    |        116.4837        |
|          DebertaV2ForMaskedLM           |  1  | 15.4784 |  26.5156  |   nan    |        68.4806         |
|      DebertaV2ForQuestionAnswering      |  2  | 15.4043 |  26.4943  |   nan    |        65.3532         |
|     PLBartForConditionalGeneration      |  4  | 9.4517  |  16.9158  |   nan    |        58.8038         |
|       DebertaForQuestionAnswering       |  8  | 7.2219  |  13.316   |   nan    |        52.9487         |
|           DebertaForMaskedLM            |  4  | 7.3676  |  13.7932  |   nan    |        52.6949         |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  1.3156  |         1.3147         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  1.2697  |         1.268          |
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1962  |         1.195          |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.1782  |         1.1729         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.1778  |         1.1724         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1562  |         1.2307         |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|             OPTForCausalLM              |  2  | 0.9682 |  0.9252   |  1.1119  |         1.1099         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0902  |         1.1813         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0902  |         1.1813         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0897  |         1.1368         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0605  |         1.1479         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0562  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.056   |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0532  |         1.0491         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  1.044   |         1.1152         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0104  |         1.0518         |
|     PegasusForConditionalGeneration     | 32  | 0.945  |  0.8957   |  1.0086  |         1.0074         |
|          BlenderbotForCausalLM          |  4  | 0.9932 |  0.9937   |  0.9959  |         0.9941         |
|            YituTechConvBert             | 16  | 0.953  |  0.8732   |  0.9922  |         0.9905         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.971   |         1.0642         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.9653  |         1.0962         |
|     M2M100ForConditionalGeneration      | 16  | 0.955  |  0.8772   |  0.9548  |         0.9535         |
|            MBartForCausalLM             |  4  | 0.951  |  0.8924   |  0.9519  |         0.9507         |
|             BartForCausalLM             |  4  | 0.951  |  0.8923   |  0.9443  |         0.943          |
|             XGLMForCausalLM             |  8  | 0.9431 |  0.8612   |   0.93   |         0.9287         |
|            PLBartForCausalLM            |  8  | 0.9219 |  0.8182   |  0.9273  |         0.9249         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.9273  |         1.0307         |
|           PegasusForCausalLM            | 32  | 0.9238 |  0.8421   |  0.927   |         0.9252         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.9136  |         1.0139         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9128  |         1.0019         |
|            TrOCRForCausalLM             | 32  |  0.92  |  0.8307   |  0.9085  |         0.9075         |
|           ElectraForCausalLM            | 32  | 0.9161 |  0.7864   |  0.8953  |         0.8941         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8872  |         0.9624         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8749  |         0.9803         |
|       BlenderbotSmallForCausalLM        | 64  | 0.8906 |  0.7493   |  0.8333  |         0.8318         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.8112  |         1.016          |
|         Speech2Text2ForCausalLM         | 256 | 0.8865 |  0.7545   |  0.8097  |         0.808          |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6659  |         0.8392         |
|       DebertaForQuestionAnswering       |  8  | 0.9524 |  1.0537   |   nan    |         1.1526         |
|           DebertaForMaskedLM            |  4  | 0.9326 |  0.9143   |   nan    |         0.9978         |
|     PLBartForConditionalGeneration      |  4  |  0.93  |  0.8779   |   nan    |         0.9847         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9763 |  0.9763   |   nan    |         0.9801         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |   nan    |         0.9665         |
|          AllenaiLongformerBase          |  4  | 0.8568 |  0.7887   |   nan    |         0.8742         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+------------+------------------------+
|                  name                   | bs  |  eager   | aot_eager |  inductor  | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+------------+------------------------+
|             XGLMForCausalLM             |  8  | 125.5364 | 145.2171  | 27773.4628 |        80.1028         |
|          BlenderbotForCausalLM          |  4  | 126.2366 | 144.5581  | 27626.003  |        106.0798        |
|     M2M100ForConditionalGeneration      | 16  | 140.1651 | 136.0617  | 19866.6785 |        80.9067         |
|           PegasusForCausalLM            | 32  | 78.2674  |  86.3428  | 11549.5555 |        65.3277         |
|             BartForCausalLM             |  4  | 114.9758 | 119.0723  | 10797.0669 |        74.2752         |
|            MBartForCausalLM             |  4  | 115.2417 | 119.2325  | 10632.8702 |        73.9164         |
|            TrOCRForCausalLM             | 32  | 138.5399 | 142.7185  | 10433.383  |        107.2738        |
|             OPTForCausalLM              |  2  | 170.0129 | 183.8188  | 10075.5749 |         68.708         |
|     PegasusForConditionalGeneration     | 32  | 146.769  | 172.6309  | 6752.7751  |        116.1677        |
|       BlenderbotSmallForCausalLM        | 64  | 64.5187  |  65.453   | 6489.6523  |         47.958         |
|         Speech2Text2ForCausalLM         | 256 | 54.0609  |  56.4435  | 5724.6369  |         35.194         |
|            PLBartForCausalLM            |  8  | 114.3877 | 122.1441  | 4895.9591  |        69.7518         |
|            YituTechConvBert             | 16  | 128.2709 | 131.9683  | 4596.6023  |        85.4327         |
|           ElectraForCausalLM            | 32  |  90.22   |  94.1741  | 3395.2832  |        48.5369         |
|            AlbertForMaskedLM            |  4  | 266.9978 | 301.1931  |  162.4022  |        161.6306        |
|       AlbertForQuestionAnswering        |  4  | 264.8113 | 298.6212  |  160.2337  |        159.5716        |
|            XLNetLMHeadModel             |  8  | 282.1253 | 290.4914  |  154.4357  |        154.286         |
|      MBartForConditionalGeneration      |  2  | 150.5527 | 144.8202  |  94.5984   |        93.5005         |
|      BartForConditionalGeneration       |  2  | 150.768  | 143.5792  |  93.7135   |        96.4367         |
|    MegatronBertForQuestionAnswering     |  8  | 144.9318 | 147.8161  |  86.0708   |         87.404         |
|     MobileBertForQuestionAnswering      | 128 | 200.4259 | 236.7381  |  81.3228   |        164.6609        |
| BlenderbotSmallForConditionalGeneration | 64  | 114.3854 | 123.7333  |  80.4006   |        79.7921         |
|                CamemBert                | 16  | 120.1408 | 122.9082  |  77.2988   |        77.3913         |
|          MobileBertForMaskedLM          | 64  | 204.074  | 217.7357  |  75.7694   |        164.7453        |
|           LayoutLMForMaskedLM           | 16  | 114.201  | 117.2233  |  70.9437   |        70.7925         |
|     DistilBertForQuestionAnswering      | 256 | 103.8667 | 104.6353  |  70.8487   |        71.3143         |
|          DistilBertForMaskedLM          | 128 | 85.2196  |  89.0689  |  69.6087   |        68.5673         |
|             BertForMaskedLM             | 16  | 111.9346 | 114.5863  |  69.5286   |        69.4114         |
|           RobertaForCausalLM            | 16  | 117.1568 | 121.0352  |  69.3914   |        69.1216         |
|               DistillGPT2               | 16  | 107.3995 | 110.8292  |  63.5469   |        62.2711         |
|       T5ForConditionalGeneration        |  4  | 109.0284 | 124.6261  |  60.0086   |        60.2259         |
|                 T5Small                 |  4  | 107.4396 | 124.4652  |  59.8614   |        60.2355         |
|         MegatronBertForCausalLM         |  4  | 88.8753  |  94.4711  |  56.8759   |        58.2258         |
|       ElectraForQuestionAnswering       | 64  | 116.7446 | 117.3478  |  54.6008   |        54.8669         |
|        BertForQuestionAnswering         | 16  | 96.9564  |  98.1278  |  54.2549   |        54.0824         |
|       RobertaForQuestionAnswering       | 16  | 97.4086  |  98.9501  |  54.2371   |        54.2584         |
|    LayoutLMForSequenceClassification    | 16  | 99.7419  | 100.7243  |  53.8086   |        55.1539         |
|       MT5ForConditionalGeneration       | 16  | 103.8599 | 121.5885  |  42.2481   |        50.1159         |
|      GPT2ForSequenceClassification      |  4  | 93.8106  |  96.1963  |  39.8512   |        40.1397         |
|      DebertaV2ForQuestionAnswering      |  2  | 152.6057 |  200.686  |    nan     |        167.6259        |
|          DebertaV2ForMaskedLM           |  1  | 148.5212 | 198.4331  |    nan     |        165.7416        |
|          AllenaiLongformerBase          |  4  | 204.5185 | 289.8264  |    nan     |        120.794         |
|           DebertaForMaskedLM            |  4  | 93.8101  | 120.8452  |    nan     |        83.8211         |
|       DebertaForQuestionAnswering       |  8  | 93.9956  | 108.8577  |    nan     |        82.5459         |
|     PLBartForConditionalGeneration      |  4  | 122.6795 | 128.2125  |    nan     |        73.1527         |
+-----------------------------------------+-----+----------+-----------+------------+------------------------+
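The absolute latencies above can be cross-checked against the speedup table. This is a minimal sketch under two assumptions: the rows use the pipe-delimited layout shown here, and speedup is roughly a reference latency divided by the backend latency (the issue does not state the formula, and the `eager` column here may not be the exact baseline used for the speedup numbers). `parse_row` and `ratio_vs_eager` are hypothetical helpers, not part of the benchmark scripts.

```python
import math

COLUMNS = ["name", "bs", "eager", "aot_eager", "inductor", "inductor_no_cudagraphs"]

def parse_row(line):
    """Split one pipe-delimited table row into a dict; numeric cells become floats."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    row = dict(zip(COLUMNS, cells))
    row["bs"] = int(row["bs"])
    for key in COLUMNS[2:]:
        row[key] = float(row[key])  # float() also accepts the table's "nan" cells
    return row

def ratio_vs_eager(row, backend):
    """Eager latency divided by backend latency; None if the backend has no measurement."""
    value = row[backend]
    if math.isnan(value):
        return None
    return row["eager"] / value

xglm = parse_row("|             XGLMForCausalLM             |  8  | 125.5364 | 145.2171  | 27773.4628 |        80.1028         |")
deberta = parse_row("|           DebertaForMaskedLM            |  4  | 93.8101  | 120.8452  |    nan     |        83.8211         |")

print(ratio_vs_eager(xglm, "inductor"))                # far below 1: much slower than eager
print(ratio_vs_eager(xglm, "inductor_no_cudagraphs"))  # above 1: faster than eager
print(ratio_vs_eager(deberta, "inductor"))             # None: no inductor measurement
```

A ratio far below 1 in the `inductor` column while `inductor_no_cudagraphs` stays above 1 matches the pattern seen in the speedup table for the same models.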

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          mixer_b16_224          | 128 | 0.9974 |  1.0186   |  0.2712  |         1.3626         |
|        convmixer_768_32         | 32  | 0.9988 |  0.9636   |  0.2497  |         1.0023         |
|            pit_b_224            | 64  | 0.995  |  0.9926   |  0.2306  |         1.4277         |
|        tnt_s_patch16_224        | 128 | 0.9982 |  0.9969   |  0.2152  |         2.9814         |
|          gmlp_s16_224           | 128 | 0.9954 |  1.0823   |  0.212   |         1.828          |
|          gmixer_24_224          | 128 | 0.9954 |  0.8898   |  0.1889  |         1.7498         |
|           convit_base           | 64  | 0.9983 |  0.9976   |  0.169   |         1.6085         |
|          resmlp_12_224          | 128 | 0.993  |  0.8888   |  0.1366  |         1.2589         |
|           tf_mixnet_l           | 128 | 0.9759 |  0.8254   |  0.1241  |         1.1915         |
|       eca_botnext26ts_256       | 128 | 0.973  |  0.7189   |  0.1225  |         1.4234         |
|            mixnet_l             | 128 | 0.9768 |  0.8202   |  0.1212  |         1.1823         |
|          botnet26t_256          | 128 | 0.974  |  0.8488   |  0.1194  |         1.4204         |
|             dla102              | 128 | 0.9958 |  0.8139   |  0.1139  |          1.52          |
|         coat_lite_mini          | 128 | 0.9969 |  0.9956   |  0.113   |         1.9251         |
|      beit_base_patch16_224      | 64  | 0.9966 |  0.9477   |  0.1103  |         1.3511         |
|         visformer_small         | 128 | 0.9959 |  0.9439   |  0.1084  |         1.1662         |
|           dm_nfnet_f0           | 128 | 0.9869 |  0.9856   |  0.108   |         1.4276         |
|          inception_v3           | 128 | 0.9963 |  0.8617   |  0.1057  |         1.5186         |
|            nfnet_l0             | 128 | 0.9892 |  0.8149   |  0.1029  |         1.4387         |
|      vit_base_patch16_224       | 64  | 0.9967 |  0.9937   |  0.1014  |         1.2357         |
| deit_base_distilled_patch16_224 | 64  | 0.9959 |   0.993   |  0.0994  |         1.2556         |
|           res2next50            | 128 | 0.9989 |  0.8241   |  0.0949  |         1.3631         |
|           volo_d1_224           | 64  | 0.9949 |  0.9723   |  0.0948  |         1.6673         |
|       gluon_inception_v3        | 128 | 0.9963 |  0.8634   |  0.0919  |         1.5168         |
|      xcit_large_24_p8_224       |  5  | 0.9913 |  0.8732   |  0.091   |         1.5812         |
|          convnext_base          | 64  | 0.9834 |  0.9849   |  0.0902  |         1.4711         |
|  swin_base_patch4_window7_224   | 64  | 0.9908 |  0.9419   |  0.087   |         1.606          |
|     swsl_resnext101_32x16d      | 32  | 0.9977 |   0.842   |  0.0845  |         1.0251         |
|        adv_inception_v3         | 128 | 0.9964 |  0.8578   |  0.0807  |         1.5189         |
|          cspdarknet53           | 64  | 0.9332 |  0.7844   |  0.0773  |         1.2624         |
|            repvgg_a2            | 128 | 0.9355 |  0.7538   |  0.0762  |         1.1189         |
|            gernet_l             | 128 | 0.9353 |  0.7902   |  0.076   |         1.067          |
|         poolformer_m36          | 64  | 0.987  |  0.9828   |  0.0759  |         1.3201         |
|           resnest101e           | 64  | 0.9941 |  0.8646   |  0.073   |         1.3512         |
|           selecsls42b           | 128 | 0.9984 |  0.8101   |  0.0729  |         1.4108         |
|       tf_efficientnet_b0        | 128 | 0.9599 |  0.6811   |  0.0722  |         1.3849         |
|            hrnet_w18            | 128 | 0.9924 |  0.6433   |  0.0709  |         1.3482         |
|          pnasnet5large          | 16  | 0.9855 |  0.9123   |  0.0699  |         1.1282         |
|          jx_nest_base           | 32  | 0.9873 |  0.9839   |  0.0673  |         1.3587         |
|         mobilenetv2_100         | 128 | 0.9489 |  0.7352   |  0.0661  |         1.4437         |
|        res2net50_14w_8s         | 128 | 0.9991 |  0.7879   |  0.0655  |         1.354          |
|         crossvit_9_240          | 128 |  0.99  |  0.7829   |  0.0654  |         1.6151         |
|          cait_m36_384           |  4  | 0.9952 |  0.9929   |  0.0648  |         1.3471         |
|          ghostnet_100           | 128 | 0.992  |  0.7624   |  0.0635  |         1.5898         |
|            fbnetv3_b            | 128 | 0.9488 |  0.7675   |  0.063   |         1.3128         |
|             dpn107              | 32  | 0.9335 |  0.8059   |  0.0622  |         1.1318         |
|          spnasnet_100           | 128 | 0.9414 |  0.7376   |  0.0614  |         1.4168         |
|           rexnet_100            | 128 | 0.9514 |  0.7019   |  0.0611  |         1.3319         |
|        gluon_xception65         | 32  | 0.9921 |  0.8414   |  0.0592  |         1.0782         |
|        twins_pcpvt_base         | 64  | 0.9934 |  0.9127   |  0.0574  |         1.6608         |
|      mobilenetv3_large_100      | 128 | 0.9498 |  0.7585   |  0.057   |         1.4225         |
|            tinynet_a            | 128 | 0.9466 |  0.6785   |  0.0546  |         1.2622         |
|        res2net101_26w_4s        | 64  | 1.0001 |  0.7949   |  0.0416  |         1.0892         |
|        ese_vovnet19b_dw         | 128 | 0.9563 |  0.8304   |  0.0407  |         1.3736         |
|           regnety_002           | 128 | 0.9552 |  0.7207   |  0.0397  |         1.2309         |
|            lcnet_050            | 128 | 0.9385 |  0.7334   |  0.0397  |         1.4654         |
|        sebotnet33ts_256         | 64  | 0.9561 |  0.7635   |  0.0392  |         1.5347         |
|           mobilevit_s           | 64  | 0.9623 |  0.7269   |  0.0343  |         1.4438         |
|           mnasnet_100           | 128 | 0.948  |  0.7385   |  0.0328  |         1.4958         |
|           fbnetc_100            | 128 | 0.9494 |   0.736   |  0.0282  |         1.4049         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|            hrnet_w18            | 128 |  9.778  |  36.8794  | 241.9436 |        240.276         |
|           rexnet_100            | 128 | 5.6774  |  11.3645  | 230.6732 |        295.3849        |
|          ghostnet_100           | 128 | 7.7426  |  15.2821  | 195.6391 |        240.9628        |
|          pnasnet5large          | 16  | 8.2728  |  26.3033  | 164.4357 |        160.6157        |
|           resnest101e           | 64  | 11.4054 |  24.7469  | 157.2108 |        167.0321        |
|            fbnetv3_b            | 128 | 8.5613  |  17.1987  | 151.9685 |        170.9544        |
|           mobilevit_s           | 64  | 5.2743  |  11.578   | 148.4481 |        160.2287        |
|        res2net101_26w_4s        | 64  | 10.986  |  25.1816  | 146.2659 |        149.7906        |
|        twins_pcpvt_base         | 64  | 10.7621 |  24.0906  | 144.9804 |        146.6821        |
|            mixnet_l             | 128 | 8.4631  |  16.6826  | 140.2277 |        163.528         |
|        adv_inception_v3         | 128 | 5.6881  |  12.6234  | 139.7505 |        155.502         |
|       gluon_inception_v3        | 128 | 5.7548  |  12.6504  | 139.1661 |        162.2305        |
|            tinynet_a            | 128 | 5.9394  |  12.6154  | 139.1365 |        161.3443        |
|          inception_v3           | 128 | 5.8326  |  12.8715  | 137.8222 |        155.6003        |
|           tf_mixnet_l           | 128 | 9.1651  |  17.1882  | 137.3263 |        162.6732        |
|      xcit_large_24_p8_224       |  5  | 12.8732 |  28.8029  | 136.0792 |        131.2448        |
|      mobilenetv3_large_100      | 128 | 4.2769  |  8.5468   | 135.2729 |        157.0403        |
|       tf_efficientnet_b0        | 128 | 5.1658  |  11.0922  | 131.8955 |        157.4531        |
|           fbnetc_100            | 128 | 5.1112  |  9.7133   | 124.2615 |        136.6052        |
|        res2net50_14w_8s         | 128 | 9.0837  |  22.8083  | 122.3688 |        124.6806        |
|          cait_m36_384           |  4  | 13.7426 |  31.4075  | 120.8796 |        112.9109        |
|  swin_base_patch4_window7_224   | 64  | 8.6566  |  19.6968  | 113.446  |        106.8387        |
|          spnasnet_100           | 128 | 5.0906  |  9.9572   | 112.4383 |        136.3509        |
|         mobilenetv2_100         | 128 | 4.0528  |  8.1324   | 107.5509 |        131.5354        |
|           mnasnet_100           | 128 | 4.0184  |  7.8738   | 105.3481 |        118.4082        |
|         poolformer_m36          | 64  | 7.7647  |  14.0466  | 102.5007 |        100.8431        |
|        sebotnet33ts_256         | 64  | 4.2905  |  9.1196   | 101.0567 |        109.6924        |
|             dpn107              | 32  | 9.8963  |   19.58   | 98.7509  |        99.0462         |
|        gluon_xception65         | 32  |  7.942  |  17.1298  |  95.217  |        94.3454         |
|             dla102              | 128 | 6.3734  |  14.3448  | 93.3804  |        96.9317         |
|           regnety_002           | 128 | 5.0155  |   9.133   | 91.5464  |        106.3091        |
|          cspdarknet53           | 64  | 5.7843  |  11.1364  | 89.7894  |        101.1072        |
|         coat_lite_mini          | 128 | 3.3652  |   7.998   | 88.3025  |        87.7946         |
|          jx_nest_base           | 32  | 6.7599  |  15.1302  |  88.023  |        83.7638         |
|       eca_botnext26ts_256       | 128 | 3.1822  |  7.0037   | 86.8381  |        98.5804         |
|         crossvit_9_240          | 128 | 5.8627  |  13.6993  | 85.3688  |        87.3635         |
|           res2next50            | 128 | 5.1791  |  12.3611  | 83.7251  |        86.3442         |
|          botnet26t_256          | 128 | 2.9482  |  6.1068   | 82.4957  |        90.5901         |
|            lcnet_050            | 128 | 2.5723  |  5.0863   | 80.9994  |        100.8512        |
|           selecsls42b           | 128 |  2.496  |  5.4498   | 77.4054  |         93.637         |
|           volo_d1_224           | 64  | 5.2665  |  12.0709  | 76.3067  |         73.771         |
|        tnt_s_patch16_224        | 128 | 6.5499  |  16.2679  | 73.6897  |         67.649         |
|            nfnet_l0             | 128 | 5.3479  |  11.1078  | 73.3934  |        76.6043         |
|        ese_vovnet19b_dw         | 128 | 2.6011  |  4.7861   | 72.0493  |        77.8406         |
|            gernet_l             | 128 | 5.0104  |  9.2358   | 71.9867  |        80.3044         |
|           dm_nfnet_f0           | 128 | 6.1428  |  11.5863  | 70.9679  |        72.5507         |
|     swsl_resnext101_32x16d      | 32  |  6.363  |  13.9736  | 66.0137  |         61.193         |
|         visformer_small         | 128 | 2.6382  |  6.2543   | 64.6626  |        68.3188         |
|          convnext_base          | 64  | 6.7463  |  12.7565  | 61.1451  |        58.4712         |
|          gmlp_s16_224           | 128 | 5.7205  |  12.303   | 60.2742  |        60.2171         |
|            repvgg_a2            | 128 | 4.9439  |  8.9596   | 58.7474  |        61.7609         |
|          gmixer_24_224          | 128 | 5.7734  |  13.1177  | 52.9991  |        50.0479         |
|           convit_base           | 64  | 3.5053  |  8.6834   | 51.3737  |         47.532         |
|            pit_b_224            | 64  | 3.5124  |  8.0904   | 47.6459  |        45.0403         |
| deit_base_distilled_patch16_224 | 64  | 3.2211  |  7.3516   |  44.879  |         41.591         |
|      vit_base_patch16_224       | 64  | 3.1089  |  7.1094   | 42.6139  |        38.6143         |
|        convmixer_768_32         | 32  | 1.7258  |  6.9779   | 42.0434  |        36.4788         |
|          resmlp_12_224          | 128 | 2.8471  |  5.5105   |  41.687  |        39.3996         |
|      beit_base_patch16_224      | 64  | 3.8393  |  8.8854   | 38.5899  |        35.2237         |
|          mixer_b16_224          | 128 | 2.7353  |  5.9956   |  34.334  |        32.3976         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.2872  |         1.2836         |
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.2057  |         1.2049         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  1.1899  |         1.1871         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1607  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.1583  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.1215  |         1.1179         |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  1.1129  |         1.1115         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  1.089   |         1.0876         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.0875  |         1.0845         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  1.0758  |         1.0721         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  1.0757  |         1.0728         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  1.0696  |         1.0675         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  1.0556  |         1.0539         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  1.0512  |         1.0506         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  1.0494  |         1.0457         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0377  |         1.0351         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  1.0361  |         1.0328         |
|          convnext_base          | 64  | 1.001  |   0.924   |  1.0345  |         1.0338         |
|             dla102              | 128 | 0.9634 |  0.9155   |  1.0323  |         1.0325         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  1.0251  |         1.0242         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  1.021   |         1.0202         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  1.0203  |         1.0194         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  1.0082  |         1.0072         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  1.0071  |         1.0057         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9976  |         0.9952         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.9957  |         0.9948         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.9925  |          0.99          |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.9923  |         0.9902         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9917  |         0.9903         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.9912  |         0.9898         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9905  |         0.989          |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.9885  |         0.989          |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9864  |         0.9854         |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9821  |         0.9793         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.9793  |         0.9786         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.9793  |         0.977          |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.979   |         0.9779         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.9776  |         0.9732         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.9738  |         0.9706         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9732  |         0.9727         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.9714  |         0.9705         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.9702  |         0.9664         |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.966   |         0.9611         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.9646  |         0.9642         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.9637  |         0.9607         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.9611  |         0.9604         |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.9582  |         0.9535         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.9568  |         0.9547         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9562  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9537  |         0.9528         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.9509  |         0.9483         |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.9497  |         0.9451         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.9448  |         0.9403         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.9376  |         0.9361         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.9046  |         0.9045         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.901   |         0.8966         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.8898  |         0.884          |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+-----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor  | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+-----------+------------------------+
|            hrnet_w18            | 128 | 281.6199 | 434.5523  | 3991.9665 |        207.6994        |
|          pnasnet5large          | 16  | 200.0396 | 214.4907  | 2829.9292 |        174.1102        |
|           fbnetc_100            | 128 | 83.0941  | 107.3892  | 2814.9486 |        56.0651         |
|          cait_m36_384           |  4  | 168.4731 | 168.5989  | 2594.2672 |        124.0264        |
|        res2net101_26w_4s        | 64  | 102.8179 | 125.2166  | 2448.1599 |        90.1924         |
|           mobilevit_s           | 64  | 84.7508  | 112.3908  | 2390.3777 |        56.5629         |
|           resnest101e           | 64  | 165.4361 | 189.7863  | 2264.8382 |        121.8647        |
|        res2net50_14w_8s         | 128 | 141.1947 |  178.993  | 2159.8443 |        104.301         |
|        twins_pcpvt_base         | 64  | 123.5675 | 130.7979  | 2093.2332 |        70.6266         |
|        adv_inception_v3         | 128 | 160.7066 |  186.938  | 1995.0247 |        105.6879        |
|        sebotnet33ts_256         | 64  | 80.8781  | 101.1623  | 1973.784  |        50.2464         |
|         poolformer_m36          | 64  | 147.3701 | 147.8645  | 1919.1283 |        109.9384        |
|           mnasnet_100           | 128 | 64.4287  |  82.8585  | 1872.4471 |        40.8617         |
|       gluon_inception_v3        | 128 | 161.4635 |  185.698  | 1749.4988 |        106.057         |
|            fbnetv3_b            | 128 | 115.9266 | 142.7094  | 1745.4915 |        83.5255         |
|             dpn107              | 32  | 114.1957 | 131.6036  | 1717.069  |        93.9622         |
|  swin_base_patch4_window7_224   | 64  | 148.1235 |  156.014  | 1691.532  |        91.2785         |
|        gluon_xception65         | 32  | 100.1655 | 117.9339  | 1686.3976 |        91.8619         |
|           tf_mixnet_l           | 128 | 194.4644 | 230.1755  | 1533.0187 |        159.2038        |
|        ese_vovnet19b_dw         | 128 | 64.8895  |  74.9183  | 1531.1184 |        45.1569         |
|          inception_v3           | 128 | 160.993  | 186.2488  | 1521.0157 |        105.9165        |
|             dla102              | 128 | 173.0473 | 211.3805  | 1517.4622 |        113.3462        |
|        tnt_s_patch16_224        | 128 | 324.6905 | 325.1492  | 1507.7901 |        108.7784        |
|            mixnet_l             | 128 | 185.5692 |  221.306  | 1500.0963 |        153.2433        |
|          jx_nest_base           | 32  | 101.9375 | 102.4069  | 1495.5461 |        74.0256         |
|          ghostnet_100           | 128 | 90.9649  | 118.3655  | 1423.8817 |        56.7472         |
|      xcit_large_24_p8_224       |  5  | 129.4915 |  148.111  | 1411.3003 |        77.4535         |
|     swsl_resnext101_32x16d      | 32  | 119.1915 | 140.9323  | 1406.3667 |        115.7161        |
|          convnext_base          | 64  | 124.5532 | 124.0437  | 1359.9525 |        83.5299         |
|           res2next50            | 128 | 126.2004 |  153.201  | 1332.2532 |        92.4661         |
|            tinynet_a            | 128 | 73.7914  | 103.2091  | 1287.7094 |        55.3178         |
|           volo_d1_224           | 64  | 121.3163 | 124.1599  | 1278.5455 |        72.4227         |
|         crossvit_9_240          | 128 | 82.8674  | 104.7031  | 1258.1031 |        50.7345         |
|           rexnet_100            | 128 | 80.1568  | 108.9091  | 1256.5314 |        57.4981         |
|        convmixer_768_32         | 32  | 301.2019 | 312.0741  | 1209.316  |        300.2526        |
|           dm_nfnet_f0           | 128 | 128.9556 | 128.7078  | 1181.6971 |        88.8811         |
|          cspdarknet53           | 64  | 95.2158  |  113.166  | 1155.2039 |        70.3726         |
|       tf_efficientnet_b0        | 128 | 85.0295  | 120.0092  | 1135.3292 |        59.0075         |
|            nfnet_l0             | 128 | 112.9506 | 137.4618  | 1089.118  |        78.1865         |
|          spnasnet_100           | 128 | 70.6001  |  89.9914  | 1086.9377 |        46.9857         |
|      mobilenetv3_large_100      | 128 | 61.5413  |  77.0144  | 1028.4043 |        41.0301         |
|           regnety_002           | 128 | 40.7759  |  57.4313  | 1012.4401 |        30.6039         |
|         coat_lite_mini          | 128 | 113.1545 | 113.2858  | 1002.9557 |        58.8253         |
|           convit_base           | 64  | 163.3935 | 163.4786  | 967.2417  |        101.3648        |
|            gernet_l             | 128 | 77.8066  |  92.3275  | 961.7029  |         68.273         |
|            repvgg_a2            | 128 | 77.7306  |  96.6897  | 957.7126  |        64.9953         |
|         mobilenetv2_100         | 128 | 65.7873  |  84.9297  | 947.1734  |        43.1954         |
|      beit_base_patch16_224      | 64  | 101.547  | 106.9385  | 920.9147  |        75.0658         |
|       eca_botnext26ts_256       | 128 | 109.032  | 147.8562  | 867.2073  |        74.6056         |
|      vit_base_patch16_224       | 64  | 87.0016  |  87.3927  | 858.1064  |        70.1318         |
| deit_base_distilled_patch16_224 | 64  | 85.0241  |  85.4888  | 854.8096  |        67.4632         |
|         visformer_small         | 128 | 91.4264  |  96.6112  | 843.2981  |         78.075         |
|          botnet26t_256          | 128 | 101.961  |  117.042  | 833.3117  |         70.073         |
|           selecsls42b           | 128 | 60.0905  |  74.1637  |  828.072  |        42.6088         |
|            lcnet_050            | 128 | 31.8822  |  40.8688  | 759.1991  |        20.4309         |
|          gmlp_s16_224           | 128 | 138.0935 | 127.0163  | 649.4224  |        75.1999         |
|          gmixer_24_224          | 128 | 118.487  | 132.3165  | 624.3358  |        67.2767         |
|            pit_b_224            | 64  | 118.8154 | 119.3408  |  513.715  |        82.8237         |
|          mixer_b16_224          | 128 | 116.6629 | 114.4926  | 430.7481  |        85.5997         |
|          resmlp_12_224          | 128 | 53.6419  |  59.8597  | 390.2384  |        42.2773         |
+---------------------------------+-----+----------+-----------+-----------+------------------------+

Performance graphs

bench_logs/huggingface_amp.png :

bench_logs/timm_models_amp.png :

bench_logs/torchbench_amp.png :

Build Summary

Run name

day_089_30_03_23_performance_amp_667

Commit hashes

pytorch commit: dc2b7aa
pytorch commit date: 2023-03-31 02:01:52+00:00
torchbench commit: c0fc2e48ba2eecda24b853238d37f82bb15aee8b
torchbench commit date: 2023-03-30 17:19:07-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+gitdc2b7aa

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 82%, 49/60 | 84%, 38/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 88%, 53/60 | 98%, 44/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
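
Each cell in the pass rate table pairs a rounded percentage with the raw counts. A minimal sketch of that formatting, assuming the percentage is rounded to the nearest integer (the helper name `format_passrate` is hypothetical and not taken from the benchmark scripts):

```python
def format_passrate(passed: int, total: int) -> str:
    """Format a pass rate as 'PCT%, passed/total', e.g. '88%, 53/60'."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Assumption: the dashboard rounds the percentage to the nearest integer.
    pct = 100.0 * passed / total
    return f"{pct:.0f}%, {passed}/{total}"


if __name__ == "__main__":
    # Counts taken from the table above.
    print(format_passrate(53, 60))
    print(format_passrate(38, 45))
```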

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.00x    |    1.53x    |    1.00x    |
| inductor_no_cudagraphs |   1.28x    |    1.50x    |    1.39x    |
+------------------------+------------+-------------+-------------+
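
For reference, a geometric mean over per-model speedups can be sketched as below. This is not the dashboard's own script: the function name is hypothetical, and dropping non-positive entries (a 0.0 speedup typically marks a failed run) is an assumption about how failures are excluded.

```python
import math
from typing import Iterable


def geomean_speedup(speedups: Iterable[float]) -> float:
    """Geometric mean of per-model speedups over native PyTorch."""
    # Assumption: entries <= 0.0 mark failed runs and are left out.
    valid = [s for s in speedups if s > 0.0]
    if not valid:
        return float("nan")
    # Average in log space, then map back; equivalent to the n-th root
    # of the product but less prone to overflow.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


if __name__ == "__main__":
    # Illustrative values only, not taken from a real run.
    print(geomean_speedup([2.0, 8.0, 0.0]))
```

The geometric mean is the usual choice for ratios because a 2x speedup and a 0.5x slowdown cancel out to 1x, which an arithmetic mean would not reflect.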

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.75    |    7.66     |    5.92     |
|       aot_eager        |    9.34    |    16.10    |    13.28    |
|        inductor        |   60.67    |    64.46    |   101.74    |
| inductor_no_cudagraphs |   64.06    |    61.56    |   109.68    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    0.98x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.89x    |    0.88x    |
|        inductor        |   0.94x    |    0.99x    |    1.02x    |
| inductor_no_cudagraphs |   0.94x    |    1.04x    |    1.01x    |
+------------------------+------------+-------------+-------------+
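
A minimal sketch of how such a ratio can be computed, assuming it is defined as the peak memory of native PyTorch divided by the peak memory of the compiled run (consistent with "higher is better", but the exact definition is not stated in this report). The function name is hypothetical.

```python
def compression_ratio(eager_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Peak memory compression ratio; values above 1.0 mean the compiled run used less memory."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled_peak_bytes must be positive")
    # Assumption: ratio = native PyTorch peak / compiled peak.
    return eager_peak_bytes / compiled_peak_bytes


if __name__ == "__main__":
    # Illustrative numbers: 10 GB natively, 8 GB when compiled.
    print(compression_ratio(10e9, 8e9))
```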

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics of the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

Previous report name: /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 82%, 49/60  | 82%, 49/60  |
|        inductor        | huggingface | 84%, 38/45  | 84%, 38/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 88%, 53/60  | 88%, 53/60  |
| inductor_no_cudagraphs | huggingface | 98%, 44/45  | 98%, 44/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.00x    |   1.00x   |
|        inductor        | huggingface |   1.40x    |   1.53x   |
|        inductor        | timm_models |   1.00x    |   1.00x   |
| inductor_no_cudagraphs | torchbench  |   1.27x    |   1.28x   |
| inductor_no_cudagraphs | huggingface |   1.48x    |   1.50x   |
| inductor_no_cudagraphs | timm_models |   1.39x    |   1.39x   |
+------------------------+-------------+------------+-----------+
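
One way to pick out the entries that changed between two reports is sketched below. The data layout (a dict keyed by compiler and suite) and the name `diff_summary` are assumptions for illustration; the values are copied from the table above.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (compiler, suite)


def diff_summary(prev: Dict[Key, float], cur: Dict[Key, float]) -> Dict[Key, float]:
    """Return cur - prev for every key present in both reports whose value changed."""
    changes = {}
    for key in sorted(prev.keys() & cur.keys()):
        # Round to the table's two-decimal precision to hide float noise.
        delta = round(cur[key] - prev[key], 2)
        if delta != 0:
            changes[key] = delta
    return changes


prev = {
    ("inductor", "torchbench"): 1.00,
    ("inductor", "huggingface"): 1.40,
    ("inductor", "timm_models"): 1.00,
    ("inductor_no_cudagraphs", "torchbench"): 1.27,
    ("inductor_no_cudagraphs", "huggingface"): 1.48,
    ("inductor_no_cudagraphs", "timm_models"): 1.39,
}
cur = {
    ("inductor", "torchbench"): 1.00,
    ("inductor", "huggingface"): 1.53,
    ("inductor", "timm_models"): 1.00,
    ("inductor_no_cudagraphs", "torchbench"): 1.28,
    ("inductor_no_cudagraphs", "huggingface"): 1.50,
    ("inductor_no_cudagraphs", "timm_models"): 1.39,
}

if __name__ == "__main__":
    for (compiler, suite), delta in diff_summary(prev, cur).items():
        print(f"{compiler:>24} {suite:<12} {delta:+.2f}x")
```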

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
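
The four criteria above can be expressed as a small check per model. This is a sketch rather than the dashboard's implementation: the `ModelResult` record and `warnings_for` helper are hypothetical, and the example values are illustrative.

```python
from dataclasses import dataclass
from typing import List

# Thresholds as listed above.
SPEEDUP_MIN = 0.95
COMPILE_LATENCY_MAX_SEC = 120.0
COMPRESSION_RATIO_MIN = 0.9


@dataclass
class ModelResult:
    name: str
    accuracy: str               # e.g. "pass", "fail_accuracy", "fail_to_run"
    speedup: float              # relative to native PyTorch
    compile_latency_sec: float
    compression_ratio: float


def warnings_for(result: ModelResult) -> List[str]:
    """Return the list of warning categories a model falls into."""
    flags = []
    if result.accuracy != "pass":
        flags.append("accuracy")
    if result.speedup < SPEEDUP_MIN:
        flags.append("speedup")
    if result.compile_latency_sec > COMPILE_LATENCY_MAX_SEC:
        flags.append("compilation_latency")
    if result.compression_ratio < COMPRESSION_RATIO_MIN:
        flags.append("compression_ratio")
    return flags


if __name__ == "__main__":
    # Illustrative records, not actual dashboard rows.
    slow_compile = ModelResult("model_a", "pass", 1.30, 241.9, 0.99)
    regressed = ModelResult("model_b", "fail_accuracy", 0.18, 60.0, 0.88)
    print(warnings_for(slow_compile))
    print(warnings_for(regressed))
```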

Accuracy warnings

+-------------+----------------------------+------------------------+-----------------+
|    suite    |            name            | inductor_no_cudagraphs |    inductor     |
+-------------+----------------------------+------------------------+-----------------+
| torchbench  |            moco            |      fail_to_run       |   fail_to_run   |
| torchbench  |     Background_Matting     |    eager_variation     | eager_variation |
| torchbench  |      vision_maskrcnn       |    eager_variation     | eager_variation |
| torchbench  |         tacotron2          |         0.0000         |     0.0000      |
| torchbench  |            gat             |         0.0000         |     0.0000      |
| torchbench  |            gcn             |         0.0000         |     0.0000      |
| torchbench  |           llama            |         0.0000         |     0.0000      |
| torchbench  |            sage            |         0.0000         |     0.0000      |
| torchbench  |       torchrec_dlrm        |         0.0000         |     0.0000      |
| huggingface | AlbertForQuestionAnswering |     fail_accuracy      |  fail_accuracy  |
+-------------+----------------------------+------------------------+-----------------+

Performance speedup warnings

+-------------+-----------------------------------+------------------------+----------+
|    suite    |               name                | inductor_no_cudagraphs | inductor |
+-------------+-----------------------------------+------------------------+----------+
| torchbench  |               hf_T5               |         1.9842         |  0.1794  |
| torchbench  |             hf_Albert             |         2.2909         |  0.1658  |
| torchbench  |               vgg16               |         1.2546         |  0.1498  |
| torchbench  |        Background_Matting         |         1.2074         |  0.1232  |
| torchbench  |           hf_Bert_large           |         1.5608         |  0.1076  |
| torchbench  |            timm_nfnet             |         1.4741         |  0.1022  |
| torchbench  |           hf_GPT2_large           |         1.7364         |  0.0963  |
| torchbench  |              hf_Bert              |         1.6054         |  0.0944  |
| torchbench  |            hf_T5_large            |         1.8652         |  0.0688  |
| torchbench  |           BERT_pytorch            |         2.0587         |  0.0644  |
| torchbench  |              yolov3               |         1.1994         |  0.057   |
| torchbench  |              hf_GPT2              |         1.8105         |  0.0546  |
| torchbench  | attention_is_all_you_need_pytorch |         1.5601         |  0.0542  |
| torchbench  |           mobilenet_v2            |         1.5244         |  0.0541  |
| torchbench  |           pytorch_unet            |         1.3514         |  0.0524  |
| torchbench  |            timm_regnet            |         0.9686         |  0.047   |
| torchbench  |           hf_DistilBert           |         1.4754         |  0.0462  |
| torchbench  |              hf_Bart              |         1.5738         |  0.0424  |
| torchbench  |              demucs               |         1.0399         |  0.0406  |
| torchbench  |        shufflenet_v2_x1_0         |         1.1956         |  0.0364  |
| torchbench  |      timm_vision_transformer      |         1.3756         |  0.0363  |
| torchbench  |            densenet121            |         1.0656         |  0.035   |
| torchbench  |           timm_resnest            |         1.519          |  0.0349  |
| torchbench  |             resnet152             |         1.0218         |  0.034   |
| torchbench  |             resnet50              |         1.0469         |  0.0331  |
| torchbench  |        mobilenet_v3_large         |         1.167          |  0.0319  |
| torchbench  |          pytorch_stargan          |         1.3091         |  0.0318  |
| torchbench  |            timm_vovnet            |         0.9243         |  0.0312  |
| torchbench  |         timm_efficientnet         |         1.0849         |  0.0311  |
| torchbench  |         phlippe_densenet          |         1.0014         |  0.0296  |
| torchbench  |          resnext50_32x4d          |         0.991          |  0.029   |
| torchbench  |            mnasnet1_0             |         1.0379         |  0.0287  |
| torchbench  |   pytorch_CycleGAN_and_pix2pix    |         1.7534         |  0.0277  |
| torchbench  |      nvidia_deeprecommender       |         1.0182         |  0.0268  |
| torchbench  |              alexnet              |         1.1314         |  0.0243  |
| torchbench  |           squeezenet1_1           |         1.317          |  0.024   |
| torchbench  |        speech_transformer         |         1.6245         |  0.0231  |
| torchbench  |          pytorch_struct           |         1.1495         |  0.0231  |
| torchbench  |            tts_angular            |         0.9604         |  0.0213  |
| torchbench  |       functorch_dp_cifar10        |         1.352          |  0.0211  |
| torchbench  |          phlippe_resnet           |         1.011          |  0.0205  |
| torchbench  |             resnet18              |         0.9533         |  0.0193  |
| torchbench  |           fastNLP_Bert            |         1.5174         |  0.0173  |
| torchbench  |          LearningToPaint          |         1.0523         |  0.0173  |
| torchbench  |            hf_Reformer            |         1.0648         |  0.0077  |
| torchbench  |               dcgan               |         0.8199         |  0.0075  |
| torchbench  |           lennard_jones           |         0.9151         |  0.0071  |
| torchbench  |                drq                |         0.9853         |  0.0038  |
| torchbench  |         soft_actor_critic         |         0.8306         |  0.0029  |
| torchbench  |               sage                |          0.0           |   0.0    |
| torchbench  |             tacotron2             |          0.0           |   0.0    |
| torchbench  |                gcn                |          0.0           |   0.0    |
| torchbench  |                gat                |          0.0           |   0.0    |
| torchbench  |               dlrm                |         1.1651         |   0.0    |
| torchbench  |               moco                |          0.0           |   0.0    |
| torchbench  |   timm_vision_transformer_large   |         1.0815         |   0.0    |
| torchbench  |           hf_Longformer           |         1.3148         |   0.0    |
| torchbench  |            hf_BigBird             |         1.6523         |   0.0    |
| torchbench  |           torchrec_dlrm           |          0.0           |   0.0    |
| huggingface |         YituTechConvBert          |         1.4893         |  0.0282  |
| huggingface |        ElectraForCausalLM         |         1.816          |  0.0261  |
| huggingface |      Speech2Text2ForCausalLM      |         1.5431         |  0.0231  |
| huggingface |  PegasusForConditionalGeneration  |         1.2914         |  0.0219  |
| huggingface |        PegasusForCausalLM         |         1.2512         |  0.0207  |
| huggingface |          XGLMForCausalLM          |         1.4501         |  0.0203  |
| huggingface |  M2M100ForConditionalGeneration   |         1.5167         |  0.0167  |
| huggingface |       BlenderbotForCausalLM       |         1.2843         |   0.0    |
| huggingface |       AllenaiLongformerBase       |         1.558          |   0.0    |
| huggingface |    DebertaForQuestionAnswering    |         0.9464         |   0.0    |
| huggingface |        DebertaForMaskedLM         |         0.8029         |   0.0    |
| huggingface |   DebertaV2ForQuestionAnswering   |         0.6628         |   0.0    |
| huggingface |       DebertaV2ForMaskedLM        |         0.6568         |   0.0    |
| timm_models |         convmixer_768_32          |         1.0028         |  0.3307  |
| timm_models |           mixer_b16_224           |         1.3593         |  0.2751  |
| timm_models |             pit_b_224             |         1.4279         |  0.2327  |
| timm_models |         tnt_s_patch16_224         |         2.9733         |  0.2104  |
| timm_models |           gmlp_s16_224            |         1.8317         |  0.2068  |
| timm_models |           gmixer_24_224           |         1.7476         |  0.1891  |
| timm_models |            convit_base            |         1.6106         |  0.1675  |
| timm_models |           resmlp_12_224           |         1.2564         |  0.1376  |
| timm_models |            tf_mixnet_l            |         1.1891         |  0.1211  |
| timm_models |        eca_botnext26ts_256        |         1.4228         |  0.1209  |
| timm_models |             mixnet_l              |         1.1813         |  0.1207  |
| timm_models |           botnet26t_256           |         1.4226         |  0.1175  |
| timm_models |          coat_lite_mini           |         1.917          |  0.1165  |
| timm_models |              dla102               |         1.5201         |  0.1131  |
| timm_models |       beit_base_patch16_224       |         1.351          |  0.1101  |
| timm_models |          visformer_small          |         1.1657         |  0.1066  |
| timm_models |           inception_v3            |         1.5181         |  0.1064  |
| timm_models |        gluon_inception_v3         |         1.5196         |  0.1064  |
| timm_models |         adv_inception_v3          |         1.5186         |  0.1059  |
| timm_models |            dm_nfnet_f0            |         1.4291         |  0.1053  |
| timm_models |       vit_base_patch16_224        |         1.2344         |  0.1032  |
| timm_models |             nfnet_l0              |         1.436          |  0.1011  |
| timm_models |  deit_base_distilled_patch16_224  |         1.2547         |  0.0991  |
| timm_models |            res2next50             |         1.3622         |  0.0944  |
| timm_models |            volo_d1_224            |         1.6654         |  0.0943  |
| timm_models |       xcit_large_24_p8_224        |         1.5374         |  0.0911  |
| timm_models |           convnext_base           |         1.4707         |  0.0876  |
| timm_models |   swin_base_patch4_window7_224    |         1.605          |  0.0852  |
| timm_models |      swsl_resnext101_32x16d       |         1.0239         |  0.0839  |
| timm_models |           cspdarknet53            |         1.2589         |  0.0774  |
| timm_models |          poolformer_m36           |         1.3186         |  0.0773  |
| timm_models |             repvgg_a2             |         1.1171         |  0.0768  |
| timm_models |             gernet_l              |         1.065          |  0.0755  |
| timm_models |            selecsls42b            |         1.4119         |  0.0727  |
| timm_models |            resnest101e            |         1.3533         |  0.0726  |
| timm_models |             hrnet_w18             |          1.35          |  0.0718  |
| timm_models |        tf_efficientnet_b0         |         1.3847         |  0.0714  |
| timm_models |           pnasnet5large           |         1.1265         |  0.0698  |
| timm_models |           jx_nest_base            |         1.3587         |  0.0685  |
| timm_models |          crossvit_9_240           |         1.6117         |  0.0676  |
| timm_models |          mobilenetv2_100          |         1.4445         |  0.0653  |
| timm_models |             fbnetv3_b             |         1.3239         |  0.0649  |
| timm_models |         res2net50_14w_8s          |         1.3572         |  0.0647  |
| timm_models |           cait_m36_384            |         1.3457         |  0.0647  |
| timm_models |           ghostnet_100            |         1.6198         |  0.0624  |
| timm_models |              dpn107               |         1.1347         |  0.0617  |
| timm_models |            rexnet_100             |         1.3297         |  0.0612  |
| timm_models |           spnasnet_100            |         1.4164         |  0.061   |
| timm_models |         gluon_xception65          |         1.0788         |  0.0593  |
| timm_models |         twins_pcpvt_base          |         1.6714         |  0.0586  |
| timm_models |       mobilenetv3_large_100       |         1.432          |  0.0564  |
| timm_models |             tinynet_a             |         1.2456         |  0.054   |
| timm_models |         res2net101_26w_4s         |         1.0693         |  0.0421  |
| timm_models |            regnety_002            |         1.2167         |  0.0397  |
| timm_models |         sebotnet33ts_256          |         1.5321         |  0.0395  |
| timm_models |             lcnet_050             |         1.4066         |  0.0391  |
| timm_models |            fbnetc_100             |         1.402          |  0.0368  |
| timm_models |            mobilevit_s            |         1.4408         |  0.0349  |
| timm_models |         ese_vovnet19b_dw          |         1.3707         |  0.0327  |
| timm_models |            mnasnet_100            |         1.4948         |  0.0326  |
+-------------+-----------------------------------+------------------------+----------+

Compilation latency (sec) warnings

+-------------+--------------------------------+------------------------+----------+
|    suite    |              name              | inductor_no_cudagraphs | inductor |
+-------------+--------------------------------+------------------------+----------+
| torchbench  |          hf_T5_large           |        172.9013        | 184.5866 |
| torchbench  |        phlippe_densenet        |        163.685         | 136.8719 |
| torchbench  |          densenet121           |        138.4565        | 129.4779 |
| torchbench  |       timm_efficientnet        |        144.8415        | 121.8595 |
| torchbench  |       mobilenet_v3_large       |        127.6172        | 117.5636 |
| torchbench  |          mobilenet_v2          |        131.8938        | 102.9741 |
| torchbench  |           hf_BigBird           |        128.1495        |   nan    |
| torchbench  | timm_vision_transformer_large  |        123.929         |   nan    |
| torchbench  |         hf_Longformer          |        120.7309        |   nan    |
| huggingface | M2M100ForConditionalGeneration |        137.2563        | 171.4798 |
| huggingface |        XGLMForCausalLM         |        135.8424        | 155.6345 |
| huggingface |     MobileBertForMaskedLM      |        144.5766        | 151.2448 |
| huggingface | MobileBertForQuestionAnswering |        140.7126        |  143.19  |
| huggingface |  MT5ForConditionalGeneration   |        132.4586        | 127.3442 |
| timm_models |           hrnet_w18            |        243.3731        | 239.6495 |
| timm_models |           rexnet_100           |        292.0584        | 230.5481 |
| timm_models |          ghostnet_100          |        235.9697        | 199.8083 |
| timm_models |         pnasnet5large          |        162.9962        | 161.2678 |
| timm_models |          resnest101e           |        163.1424        | 157.5606 |
| timm_models |           fbnetv3_b            |        174.4576        | 152.766  |
| timm_models |          mobilevit_s           |        161.0668        | 149.3897 |
| timm_models |       res2net101_26w_4s        |        151.1697        | 144.0901 |
| timm_models |        twins_pcpvt_base        |        147.9188        | 143.9423 |
| timm_models |            mixnet_l            |        158.0146        | 139.288  |
| timm_models |          tf_mixnet_l           |        161.359         | 138.3913 |
| timm_models |        adv_inception_v3        |        159.314         | 137.0842 |
| timm_models |           tinynet_a            |        161.1748        | 137.0509 |
| timm_models |          inception_v3          |        156.5059        | 136.8852 |
| timm_models |      xcit_large_24_p8_224      |        132.8375        | 136.7425 |
| timm_models |       gluon_inception_v3       |        158.8665        | 136.6642 |
| timm_models |       tf_efficientnet_b0       |        155.4947        | 130.7149 |
| timm_models |     mobilenetv3_large_100      |        162.5033        | 130.4923 |
| timm_models |          cait_m36_384          |        116.2197        | 121.6808 |
| timm_models |        res2net50_14w_8s        |        124.6533        | 120.6844 |
| timm_models |           fbnetc_100           |        137.5629        | 120.222  |
| timm_models |          spnasnet_100          |        137.9566        | 119.2263 |
| timm_models |        mobilenetv2_100         |        128.9254        | 108.503  |
+-------------+--------------------------------+------------------------+----------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+------------------------+----------+
|    suite    |                  name                   | inductor_no_cudagraphs | inductor |
+-------------+-----------------------------------------+------------------------+----------+
| torchbench  |         nvidia_deeprecommender          |         0.8931         |  0.8951  |
| torchbench  |             pytorch_stargan             |         0.8893         |  0.8934  |
| torchbench  |                resnet50                 |         0.8838         |  0.889   |
| torchbench  |               timm_vovnet               |         0.8869         |  0.889   |
| torchbench  |         timm_vision_transformer         |         0.8835         |  0.8873  |
| torchbench  |            phlippe_densenet             |         0.8659         |  0.8834  |
| torchbench  |           speech_transformer            |         0.869          |  0.8694  |
| torchbench  |               densenet121               |         0.8034         |  0.8228  |
| torchbench  |           mobilenet_v3_large            |         0.8726         |  0.8159  |
| torchbench  |               mnasnet1_0                |         0.8074         |  0.8149  |
| torchbench  |               hf_Reformer               |         0.8022         |  0.8064  |
| torchbench  |             resnext50_32x4d             |         0.772          |  0.7797  |
| torchbench  |             LearningToPaint             |         0.7463         |  0.7552  |
| torchbench  |             pytorch_struct              |         0.7362         |  0.7428  |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |         0.6004         |   0.62   |
| torchbench  |                resnet18                 |         0.6097         |  0.619   |
| torchbench  |          functorch_dp_cifar10           |         0.4424         |  0.451   |
| torchbench  |             phlippe_resnet              |         0.3395         |  0.3554  |
| huggingface |           ElectraForCausalLM            |         0.8941         |  0.8953  |
| huggingface |          DistilBertForMaskedLM          |         0.9624         |  0.8872  |
| huggingface |            TrOCRForCausalLM             |         0.9583         |  0.8855  |
| huggingface | BlenderbotSmallForConditionalGeneration |         0.9803         |  0.8749  |
| huggingface |       BlenderbotSmallForCausalLM        |         0.9119         |  0.8215  |
| huggingface |          MobileBertForMaskedLM          |         1.016          |  0.8112  |
| huggingface |         Speech2Text2ForCausalLM         |         0.8095         |  0.8111  |
| huggingface |     MobileBertForQuestionAnswering      |         0.8392         |  0.6659  |
| huggingface |          AllenaiLongformerBase          |         0.8742         |   nan    |
| timm_models |               regnety_002               |         0.8966         |  0.901   |
| timm_models |                lcnet_050                |         0.884          |  0.8898  |
+-------------+-----------------------------------------+------------------------+----------+

Metrics over time

bench_logs/comp_time_over_time.png :

bench_logs/memory_over_time.png :

bench_logs/geomean_over_time.png :

bench_logs/passrate_over_time.png :

Recent Regressions

For each relevant compiler, we compare the two most recent reports (that actually run the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).
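
The comparison described above can be sketched roughly as follows. This is only an illustration, not the dashboard's actual script: the helper `find_regressions` and the two threshold constants are hypothetical names, and the threshold values are assumptions chosen to be consistent with the tables in this report. The sample numbers are copied from the regression tables below.

```python
from typing import Callable, Dict, Tuple


def find_regressions(
    prev: Dict[str, float],
    cur: Dict[str, float],
    is_flagged: Callable[[float], bool],
) -> Dict[str, Tuple[float, float]]:
    """Return models flagged in `cur` that were not flagged in `prev`."""
    regressions = {}
    for name, cur_value in cur.items():
        if name not in prev:
            continue  # no earlier measurement to compare against
        prev_value = prev[name]
        if is_flagged(cur_value) and not is_flagged(prev_value):
            regressions[name] = (prev_value, cur_value)
    return regressions


# ASSUMPTION: illustrative thresholds, not confirmed by the report itself.
LATENCY_LIMIT_SEC = 120.0   # flag if compilation takes longer than this
MEMORY_RATIO_FLOOR = 0.9    # flag if the compression ratio drops below this

# inductor_no_cudagraphs compilation latency (sec), from the tables below.
prev_latency = {
    "hf_Longformer": 118.5972,
    "M2M100ForConditionalGeneration": 106.222,
    "XGLMForCausalLM": 71.2363,
}
cur_latency = {
    "hf_Longformer": 120.7309,
    "M2M100ForConditionalGeneration": 137.2563,
    "XGLMForCausalLM": 135.8424,
}
latency_regressions = find_regressions(
    prev_latency, cur_latency, lambda sec: sec > LATENCY_LIMIT_SEC
)

# inductor peak memory compression ratio, from the tables below.
prev_memory = {"TrOCRForCausalLM": 0.9085}
cur_memory = {"TrOCRForCausalLM": 0.8855}
memory_regressions = find_regressions(
    prev_memory, cur_memory, lambda ratio: ratio < MEMORY_RATIO_FLOOR
)

for name, (before, after) in {**latency_regressions, **memory_regressions}.items():
    print(f"{name}: {before} -> {after}")
```

A model that was already flagged in the previous report is not reported again, which is why the regression tables are much shorter than the warnings tables.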

Regressions for torchbench

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Compilation latency (sec) regressions

+------------------------+---------------+-------------+------------+
|        compiler        |     name      | prev_status | cur_status |
+------------------------+---------------+-------------+------------+
| inductor_no_cudagraphs | hf_Longformer |  118.5972   |  120.7309  |
+------------------------+---------------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Compilation latency (sec) regressions

+------------------------+--------------------------------+-------------+------------+
|        compiler        |              name              | prev_status | cur_status |
+------------------------+--------------------------------+-------------+------------+
| inductor_no_cudagraphs | M2M100ForConditionalGeneration |   106.222   |  137.2563  |
| inductor_no_cudagraphs |        XGLMForCausalLM         |   71.2363   |  135.8424  |
+------------------------+--------------------------------+-------------+------------+

Peak Memory Compression Ratio regressions

+----------+------------------+-------------+------------+
| compiler |       name       | prev_status | cur_status |
+----------+------------------+-------------+------------+
| inductor | TrOCRForCausalLM |   0.9085    |   0.8855   |
+----------+------------------+-------------+------------+

Regressions for timm_models

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

No regressions found.

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 0.9974 |  0.1792   |  1.2329  |         1.2334         |
|               hf_T5               |  8   | 0.9849 |  0.8498   |  0.1794  |         1.9842         |
|             hf_Albert             |  8   | 0.993  |  0.9546   |  0.1658  |         2.2909         |
|               vgg16               |  64  | 0.9995 |  0.9985   |  0.1498  |         1.2546         |
|        Background_Matting         |  4   | 0.999  |  0.1368   |  0.1232  |         1.2074         |
|           hf_Bert_large           |  4   | 0.9972 |  0.8731   |  0.1076  |         1.5608         |
|            timm_nfnet             | 128  | 0.9859 |  0.9841   |  0.1022  |         1.4741         |
|           hf_GPT2_large           |  4   | 0.9828 |  0.9709   |  0.0963  |         1.7364         |
|              hf_Bert              |  4   | 0.9918 |  0.8481   |  0.0944  |         1.6054         |
|            hf_T5_large            |  2   | 0.976  |  0.8091   |  0.0688  |         1.8652         |
|           BERT_pytorch            |  16  | 0.986  |  0.8079   |  0.0644  |         2.0587         |
|              yolov3               |  16  | 0.9962 |   0.807   |  0.057   |         1.1994         |
|              hf_GPT2              |  4   | 0.9907 |  0.9625   |  0.0546  |         1.8105         |
| attention_is_all_you_need_pytorch | 256  | 0.9895 |   0.911   |  0.0542  |         1.5601         |
|           mobilenet_v2            |  96  | 0.9972 |  0.7782   |  0.0541  |         1.5244         |
|           pytorch_unet            |  1   | 0.9966 |   0.205   |  0.0524  |         1.3514         |
|            timm_regnet            |  32  | 0.9186 |  0.7791   |  0.047   |         0.9686         |
|           hf_DistilBert           |  8   | 0.9838 |  0.9531   |  0.0462  |         1.4754         |
|              hf_Bart              |  4   | 0.9848 |  0.8403   |  0.0424  |         1.5738         |
|              demucs               |  4   | 0.9983 |   1.003   |  0.0406  |         1.0399         |
|        shufflenet_v2_x1_0         | 128  | 0.9949 |  0.7608   |  0.0364  |         1.1956         |
|      timm_vision_transformer      |  32  | 0.9787 |  0.8618   |  0.0363  |         1.3756         |
|            densenet121            |  4   | 0.9842 |  0.7201   |  0.035   |         1.0656         |
|           timm_resnest            |  32  | 0.9921 |   0.853   |  0.0349  |         1.519          |
|             resnet152             |  32  | 0.9941 |  0.7622   |  0.034   |         1.0218         |
|             resnet50              |  32  | 0.9936 |  0.7756   |  0.0331  |         1.0469         |
|        mobilenet_v3_large         |  32  | 0.9938 |   0.787   |  0.0319  |         1.167          |
|          pytorch_stargan          |  16  | 0.9937 |  0.8084   |  0.0318  |         1.3091         |
|            timm_vovnet            |  32  | 0.8521 |  0.7026   |  0.0312  |         0.9243         |
|         timm_efficientnet         |  32  | 0.9368 |  0.6278   |  0.0311  |         1.0849         |
|         phlippe_densenet          | 128  | 0.9866 |  0.7779   |  0.0296  |         1.0014         |
|          resnext50_32x4d          |  8   | 0.9821 |  0.7236   |  0.029   |         0.991          |
|            mnasnet1_0             |  32  | 0.9863 |  0.7358   |  0.0287  |         1.0379         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9666 |   0.897   |  0.0277  |         1.7534         |
|      nvidia_deeprecommender       | 256  | 0.9987 |  0.9982   |  0.0268  |         1.0182         |
|              alexnet              | 128  | 0.9986 |   0.997   |  0.0243  |         1.1314         |
|           squeezenet1_1           |  32  | 0.9811 |  0.9437   |  0.024   |         1.317          |
|        speech_transformer         |  32  | 0.9824 |  0.7975   |  0.0231  |         1.6245         |
|          pytorch_struct           | 200  | 0.9132 |  0.7632   |  0.0231  |         1.1495         |
|            tts_angular            |  64  | 0.9211 |  0.8755   |  0.0213  |         0.9604         |
|       functorch_dp_cifar10        |  64  | 0.9586 |  0.9148   |  0.0211  |         1.352          |
|          phlippe_resnet           | 128  | 0.9854 |   0.762   |  0.0205  |         1.011          |
|             resnet18              |  16  | 0.9845 |  0.7653   |  0.0193  |         0.9533         |
|           fastNLP_Bert            |  6   | 0.9932 |  0.8583   |  0.0173  |         1.5174         |
|          LearningToPaint          |  96  | 0.9893 |  0.7763   |  0.0173  |         1.0523         |
|            hf_Reformer            |  4   | 0.9854 |  0.9646   |  0.0077  |         1.0648         |
|               dcgan               |  32  | 0.8617 |  0.6946   |  0.0075  |         0.8199         |
|           lennard_jones           | 1000 | 0.8156 |  0.7376   |  0.0071  |         0.9151         |
|                drq                |  1   | 0.9518 |  0.7465   |  0.0038  |         0.9853         |
|         soft_actor_critic         | 256  | 0.8564 |   0.616   |  0.0029  |         0.8306         |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               dlrm                | 1024 | 0.9431 |  0.8395   |   0.0    |         1.1651         |
|               moco                |  32  | 0.9753 |    0.0    |   0.0    |          0.0           |
|   timm_vision_transformer_large   |  32  | 0.998  |    0.0    |   0.0    |         1.0815         |
|           hf_Longformer           |  2   | 0.8254 |  0.5705   |   0.0    |         1.3148         |
|            hf_BigBird             |  2   | 0.9479 |  0.7737   |   0.0    |         1.6523         |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 26.8845 |  56.1931  | 184.5866 |        172.9013        |
|         phlippe_densenet          | 128  |  3.26   |  7.0948   | 136.8719 |        163.685         |
|            densenet121            |  4   |  7.702  |  18.3769  | 129.4779 |        138.4565        |
|         timm_efficientnet         |  32  | 4.9945  |  10.186   | 121.8595 |        144.8415        |
|        mobilenet_v3_large         |  32  | 3.4298  |  7.6504   | 117.5636 |        127.6172        |
|           hf_GPT2_large           |  4   | 15.3025 |  30.0292  | 113.948  |        107.9377        |
|             resnet152             |  32  | 9.1285  |  20.2944  | 105.5312 |        106.7713        |
|              yolov3               |  16  | 4.9949  |  10.8024  | 105.4902 |        119.8703        |
|           mobilenet_v2            |  96  | 3.1396  |  6.9604   | 102.9741 |        131.8938        |
|        speech_transformer         |  32  | 6.0415  |  13.9548  | 93.4574  |         78.544         |
|            mnasnet1_0             |  32  | 3.1198  |  6.6841   | 93.1225  |        105.2327        |
|            hf_Reformer            |  4   | 4.2314  |  6.0198   | 89.7849  |        41.6074         |
|           timm_resnest            |  32  | 1.8353  |  4.0143   | 83.0955  |        101.3259        |
| attention_is_all_you_need_pytorch | 256  | 4.4653  |  11.0933  | 78.5163  |        75.1667         |
|        shufflenet_v2_x1_0         | 128  | 3.4187  |  7.7108   | 72.2917  |        79.9618         |
|            timm_regnet            |  32  | 6.7285  |  12.6698  | 71.6498  |        69.8476         |
|           BERT_pytorch            |  16  | 4.9464  |  11.6694  | 71.3905  |        69.2684         |
|            timm_nfnet             | 128  | 5.8657  |  11.1956  | 70.1569  |        72.4187         |
|           hf_Bert_large           |  4   | 10.3237 |  21.1812  | 68.3372  |        64.9419         |
|        Background_Matting         |  4   | 3.1767  |  11.4665  | 64.4282  |        68.0687         |
|           fastNLP_Bert            |  6   | 5.1917  |  11.3105  | 63.1567  |        50.7496         |
|             resnet50              |  32  | 3.2132  |  7.0563   |  59.69   |        64.9693         |
|            timm_vovnet            |  32  |  3.639  |  6.5399   | 57.6247  |        63.6236         |
|              hf_Bart              |  4   | 6.1617  |  13.6247  | 55.7028  |         50.102         |
|           pytorch_unet            |  1   |  1.539  |  4.4294   | 53.9831  |        59.1821         |
|               hf_T5               |  8   | 5.7047  |  12.8847  | 52.5958  |        50.8917         |
|          resnext50_32x4d          |  8   |  3.211  |  7.0031   | 51.6533  |        53.2884         |
|      timm_vision_transformer      |  32  | 3.3796  |  7.3336   | 50.9971  |        50.6617         |
|       functorch_dp_cifar10        |  64  | 1.2472  |  2.3999   | 45.7323  |        57.1101         |
|              hf_GPT2              |  4   | 4.6495  |  9.6682   | 44.4224  |        41.6876         |
|            Super_SloMo            |  6   | 2.8017  |  9.8097   | 43.1014  |        42.1289         |
|          pytorch_stargan          |  16  | 1.2368  |  3.2147   | 42.8149  |        42.6709         |
|              hf_Bert              |  4   | 5.0621  |  10.7803  | 41.5207  |        39.4468         |
|             hf_Albert             |  8   | 2.5519  |  8.0691   | 40.9184  |        39.0721         |
|          LearningToPaint          |  96  | 1.4323  |   2.906   | 40.5855  |        44.7201         |
|             resnet18              |  16  | 1.3527  |  2.8477   | 39.4374  |        43.4302         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2264  |  2.9776   |  35.761  |        37.4428         |
|              demucs               |  4   | 1.4322  |  2.2042   | 34.0157  |        29.7795         |
|           hf_DistilBert           |  8   | 2.4558  |  5.2465   | 32.6267  |        30.1363         |
|          phlippe_resnet           | 128  | 1.3408  |  2.8504   | 30.7331  |        32.1313         |
|           squeezenet1_1           |  32  | 1.0661  |  1.7589   | 24.2017  |        23.8826         |
|          pytorch_struct           | 200  | 0.7479  |  1.3231   | 22.0593  |        19.6552         |
|               vgg16               |  64  | 0.6429  |  1.1459   | 16.9538  |        14.9791         |
|                drq                |  1   | 0.6878  |  1.0109   | 16.5778  |        10.9201         |
|              alexnet              | 128  | 0.4912  |  0.7782   | 16.4275  |         14.416         |
|         soft_actor_critic         | 256  |  0.438  |   0.594   | 12.5252  |         6.7479         |
|      nvidia_deeprecommender       | 256  |  0.478  |  0.7673   | 12.0408  |         9.5109         |
|               dcgan               |  32  | 0.4338  |  0.7126   |  9.6429  |         7.5447         |
|           lennard_jones           | 1000 | 0.3962  |  0.6067   |  7.8242  |         6.2072         |
|            tts_angular            |  64  | 0.4489  |  0.5146   |  7.2158  |         5.7375         |
|            hf_BigBird             |  2   | 12.9407 |  37.486   |   nan    |        128.1495        |
|   timm_vision_transformer_large   |  32  | 9.5203  |    nan    |   nan    |        123.929         |
|           hf_Longformer           |  2   | 11.4924 |  31.3248  |   nan    |        120.7309        |
|               dlrm                | 1024 | 0.3721  |  0.7764   |   nan    |         7.3036         |
|               moco                |  32  | 27.7453 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak memory compression ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.2585  |         1.2557         |
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.2078  |         1.208          |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  1.193   |         1.1717         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.1751  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.1728  |         1.1719         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  1.1687  |         1.168          |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  1.1296  |         1.1266         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  1.1278  |         1.128          |
|           mobilenet_v2            |  96  | 0.9857 |  0.7649   |  1.1085  |         1.1018         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  1.1053  |         0.9973         |
|            timm_nfnet             | 128  | 0.9071 |  0.8748   |  1.0766  |         1.0727         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  1.0737  |         1.0725         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  1.0736  |         1.0713         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  1.0687  |         0.9997         |
|                drq                |  1   | 0.9877 |  0.8852   |  1.0607  |         0.9573         |
|        Background_Matting         |  4   | 1.0125 |  0.6487   |  1.0421  |         1.0406         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  1.0344  |         1.0258         |
|            tts_angular            |  64  | 0.9983 |  0.9983   |  1.0198  |         0.9983         |
|              yolov3               |  16  | 0.9925 |  0.8288   |  1.0161  |         1.0117         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  1.0011  |         0.9945         |
|               vgg16               |  64  | 0.9923 |  0.7245   |  0.9823  |         0.9808         |
|        shufflenet_v2_x1_0         | 128  | 0.9551 |  0.8384   |  0.9736  |         0.9649         |
|           timm_resnest            |  32  | 0.9887 |  0.8948   |  0.9689  |         0.9624         |
|              demucs               |  4   | 0.9661 |  0.9657   |  0.9674  |         0.9656         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.9645  |         0.9645         |
|            timm_regnet            |  32  | 0.9952 |  0.8494   |  0.9526  |         0.9536         |
|         timm_efficientnet         |  32  | 0.9847 |  0.8179   |  0.948   |         1.0068         |
|             resnet152             |  32  | 0.9957 |  0.8952   |  0.9449  |         0.942          |
|              alexnet              | 128  | 0.9454 |  0.7939   |  0.9434  |         0.939          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.9306  |         0.9308         |
|              hf_Bart              |  4   | 0.9087 |  0.7521   |  0.9236  |         0.9173         |
|           squeezenet1_1           |  32  | 0.9695 |  0.9291   |  0.909   |         0.9087         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.8951  |         0.8931         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.8934  |         0.8893         |
|             resnet50              |  32  | 0.991  |  0.8586   |  0.889   |         0.8838         |
|            timm_vovnet            |  32  | 0.9892 |  0.8165   |  0.889   |         0.8869         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8873  |         0.8835         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8834  |         0.8659         |
|        speech_transformer         |  32  | 0.9915 |   0.901   |  0.8694  |         0.869          |
|            densenet121            |  4   | 0.9944 |  0.9783   |  0.8228  |         0.8034         |
|        mobilenet_v3_large         |  32  | 0.9801 |  0.9451   |  0.8159  |         0.8726         |
|            mnasnet1_0             |  32  | 0.9801 |  0.8971   |  0.8149  |         0.8074         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.8064  |         0.8022         |
|          resnext50_32x4d          |  8   | 0.9942 |  0.8439   |  0.7797  |         0.772          |
|          LearningToPaint          |  96  | 0.9192 |  0.7116   |  0.7552  |         0.7463         |
|          pytorch_struct           | 200  | 0.9992 |  0.5106   |  0.7428  |         0.7362         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9965 |  0.8594   |   0.62   |         0.6004         |
|             resnet18              |  16  | 0.9753 |  0.7978   |  0.619   |         0.6097         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.451   |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3554  |         0.3395         |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |   nan    |         1.1191         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |   nan    |         1.0009         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.8565 |  0.8295   |   nan    |         0.9046         |
|               moco                |  32  | 0.9946 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+------------+------------------------+
|               name                |  bs  |  eager   | aot_eager |  inductor  | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+------------+------------------------+
|            hf_Reformer            |  4   | 82.1895  |  83.8006  | 10558.7174 |        76.0577         |
|        speech_transformer         |  32  | 74.0508  |  73.2549  | 3354.6196  |        38.3667         |
|            hf_T5_large            |  2   | 226.7132 |  277.799  | 3343.0507  |        120.6676        |
|           fastNLP_Bert            |  6   | 52.8565  |  60.8013  | 3168.0225  |        35.4509         |
|           hf_GPT2_large           |  4   | 212.5573 | 215.2151  | 2176.2074  |        120.2723        |
|             resnet152             |  32  | 63.7947  |  82.7448  | 1938.6607  |        63.7865         |
|            densenet121            |  4   | 52.8203  |  74.4326  | 1710.7589  |        50.7307         |
|              hf_Bart              |  4   |  58.894  |  73.6387  | 1548.2983  |        45.1874         |
|              demucs               |  4   | 53.7459  |  53.3884  | 1319.3129  |        51.7129         |
|                drq                |  1   |  3.5025  |  4.4064   | 1285.1983  |         3.9372         |
|            timm_regnet            |  32  | 60.8321  |  71.2107  |  1215.95   |        57.3923         |
|              yolov3               |  16  | 68.6795  |  84.9169  | 1211.3023  |         57.03          |
|            timm_nfnet             | 128  | 120.107  | 119.6429  | 1166.7849  |         80.236         |
|         timm_efficientnet         |  32  | 33.9856  |  50.6075  | 1136.7948  |        29.3523         |
| attention_is_all_you_need_pytorch | 256  | 55.3424  |  59.9176  | 1053.1644  |        38.2475         |
|        Background_Matting         |  4   | 125.8744 | 918.4645  | 1021.7988  |        104.0611        |
|               hf_T5               |  8   | 181.7161 | 210.8335  | 1005.1947  |        91.3938         |
|         soft_actor_critic         | 256  |  1.8439  |  2.4173   |  986.9886  |         1.9625         |
|              hf_GPT2              |  4   | 48.8048  |  50.1999  |  914.7899  |        26.9485         |
|        mobilenet_v3_large         |  32  | 26.5604  |  33.659   |  911.1527  |        24.9726         |
|           BERT_pytorch            |  16  | 54.7667  |  67.0691  |  906.9695  |        28.9393         |
|        shufflenet_v2_x1_0         | 128  | 30.0863  |  41.1511  |  896.1937  |         26.979         |
|           hf_Bert_large           |  4   | 82.5795  |  93.2867  |  887.1089  |        52.9057         |
|           mobilenet_v2            |  96  | 47.0417  |  60.2268  |  876.1692  |        30.7994         |
|            mnasnet1_0             |  32  | 22.3597  |  29.6333  |  842.2421  |         22.637         |
|          resnext50_32x4d          |  8   | 20.1774  |  27.5427  |  839.9795  |        20.2488         |
|             resnet50              |  32  | 26.3631  |  33.5946  |  838.5196  |        27.0652         |
|            timm_vovnet            |  32  |  28.957  |  35.5515  |  836.6784  |        26.6688         |
|         phlippe_densenet          | 128  | 23.4096  |  29.7828  |  830.3198  |        23.9075         |
|      timm_vision_transformer      |  32  | 29.7897  |  33.6312  |  829.2838  |        20.2459         |
|           pytorch_unet            |  1   | 39.9251  | 194.1525  |  764.0663  |         29.459         |
|          LearningToPaint          |  96  | 11.4032  |  14.4324  |  711.4037  |         10.688         |
|           timm_resnest            |  32  | 24.2114  |  28.1106  |  695.9135  |        15.8906         |
|           hf_DistilBert           |  8   | 32.5863  |  32.7899  |  694.836   |         21.366         |
|       functorch_dp_cifar10        |  64  | 12.1772  |  10.8527  |  555.442   |         7.5024         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 14.2581  |  15.237   |  551.6198  |         7.9076         |
|             resnet18              |  16  |  9.2294  |  11.8467  |  535.2395  |        10.2454         |
|              hf_Bert              |  4   | 40.4881  |  48.7025  |  518.053   |         29.503         |
|          phlippe_resnet           | 128  |  8.9766  |  11.4711  |  487.8161  |         8.9282         |
|          pytorch_stargan          |  16  | 14.8893  |  18.1737  |  487.7831  |        12.2029         |
|           squeezenet1_1           |  32  |  10.748  |  10.7985  |  482.6729  |         7.6176         |
|               vgg16               |  64  | 66.2593  |  66.2378  |  443.8457  |        52.8028         |
|             hf_Albert             |  8   | 69.7782  |  71.3705  |  416.4762  |        29.7832         |
|              alexnet              | 128  |  9.8181  |  9.8564   |  407.6865  |         8.667          |
|      nvidia_deeprecommender       | 256  | 10.2183  |  10.2337  |  382.749   |        10.0384         |
|               dcgan               |  32  |  2.3616  |  2.9311   |  371.8205  |         2.4983         |
|           lennard_jones           | 1000 |  1.8128  |  2.1082   |  327.6554  |         1.8448         |
|            tts_angular            |  64  |  6.7245  |  7.0425   |  323.1815  |         6.4667         |
|          pytorch_struct           | 200  |  5.0065  |  6.0003   |  228.8885  |         4.7408         |
|            Super_SloMo            |  6   | 79.5728  | 443.6295  |  64.4356   |        64.2594         |
|   timm_vision_transformer_large   |  32  | 465.4299 |    nan    |    nan     |        429.5734        |
|            hf_BigBird             |  2   | 205.9733 |  252.644  |    nan     |        117.9592        |
|           hf_Longformer           |  2   | 137.9415 | 198.9145  |    nan     |        86.0392         |
|               dlrm                | 1024 |  4.3565  |  4.8361   |    nan     |         3.6419         |
|               moco                |  32  | 50.7087  |    nan    |    nan     |          nan           |
|                gat                |  0   |   nan    |    nan    |    nan     |          nan           |
|                gcn                |  0   |   nan    |    nan    |    nan     |          nan           |
|               sage                |  0   |   nan    |    nan    |    nan     |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |    nan     |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |    nan     |          nan           |
+-----------------------------------+------+----------+-----------+------------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|             OPTForCausalLM              |  2  | 0.9913 |  0.9261   |  2.4786  |         2.4208         |
|      GPT2ForSequenceClassification      |  4  | 0.9776 |  0.9515   |  2.2963  |         2.2835         |
|          MobileBertForMaskedLM          | 64  | 0.9525 |  0.8203   |  2.2871  |         1.075          |
|       MT5ForConditionalGeneration       | 16  | 0.9897 |   0.837   |  2.2156  |         1.836          |
|       ElectraForQuestionAnswering       | 64  | 0.9881 |  0.9797   |  2.0993  |         2.1127         |
|     MobileBertForQuestionAnswering      | 128 | 0.9557 |  0.8021   |  2.0426  |         1.1308         |
|    LayoutLMForSequenceClassification    | 16  | 0.9841 |  0.9697   |  1.8158  |         1.7891         |
|            XLNetLMHeadModel             |  8  | 0.9964 |  0.9655   |  1.8113  |         1.8102         |
|       RobertaForQuestionAnswering       | 16  | 0.9843 |  0.9693   |  1.7694  |         1.764          |
|        BertForQuestionAnswering         | 16  | 0.9854 |  0.9696   |  1.761   |         1.7611         |
|                 T5Small                 |  4  | 0.9804 |  0.8503   |  1.7415  |         1.7299         |
|       T5ForConditionalGeneration        |  4  | 0.9819 |  0.8559   |  1.7406  |         1.7347         |
|               DistillGPT2               | 16  | 0.9875 |  0.9548   |  1.6648  |         1.7001         |
|            PLBartForCausalLM            |  8  | 0.9843 |  0.9527   |  1.6601  |         1.6384         |
|           RobertaForCausalLM            | 16  | 0.9866 |  0.9624   |  1.6596  |         1.6671         |
|    MegatronBertForQuestionAnswering     |  8  |  0.98  |  0.9608   |  1.6523  |         1.6259         |
|       AlbertForQuestionAnswering        |  4  | 0.9999 |  0.8854   |  1.648   |         1.6435         |
|            AlbertForMaskedLM            |  4  | 0.9997 |  0.8848   |  1.6352  |         1.6326         |
|     PLBartForConditionalGeneration      |  4  | 0.9817 |  0.9474   |  1.6192  |         1.6761         |
|           LayoutLMForMaskedLM           | 16  | 0.9852 |  0.9616   |  1.5885  |         1.6057         |
|             BertForMaskedLM             | 16  | 0.9854 |  0.9602   |  1.5838  |         1.5875         |
|                CamemBert                | 16  | 0.9876 |  0.9623   |  1.5355  |         1.5318         |
|             BartForCausalLM             |  4  |  0.98  |  0.9571   |  1.5297  |         1.5421         |
|         MegatronBertForCausalLM         |  4  | 0.9957 |  0.9118   |  1.5276  |         1.5127         |
|      MBartForConditionalGeneration      |  2  | 1.0004 |  0.9618   |  1.5256  |         1.4783         |
|            MBartForCausalLM             |  4  | 0.9845 |  0.9499   |  1.5223  |         1.542          |
|      BartForConditionalGeneration       |  2  | 0.998  |  0.9608   |  1.4945  |         1.4565         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0014 |  0.9032   |  1.4932  |         1.4197         |
|     DistilBertForQuestionAnswering      | 256 | 0.9932 |   0.985   |  1.4562  |         1.4451         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9814 |  0.9112   |  1.2964  |         1.2596         |
|            TrOCRForCausalLM             | 32  | 0.9888 |  0.9514   |   1.27   |         1.2882         |
|          DistilBertForMaskedLM          | 128 | 0.992  |  0.9476   |  1.215   |         1.235          |
|            YituTechConvBert             | 16  | 0.9854 |  0.9551   |  0.0282  |         1.4893         |
|           ElectraForCausalLM            | 32  | 0.9813 |  0.9346   |  0.0261  |         1.816          |
|         Speech2Text2ForCausalLM         | 256 | 0.9806 |  0.9262   |  0.0231  |         1.5431         |
|     PegasusForConditionalGeneration     | 32  | 0.9953 |  0.9213   |  0.0219  |         1.2914         |
|           PegasusForCausalLM            | 32  |  0.98  |  0.9093   |  0.0207  |         1.2512         |
|             XGLMForCausalLM             |  8  | 0.9787 |  0.8116   |  0.0203  |         1.4501         |
|     M2M100ForConditionalGeneration      | 16  | 1.0356 |  0.8425   |  0.0167  |         1.5167         |
|          BlenderbotForCausalLM          |  4  | 0.9854 |  0.8478   |   0.0    |         1.2843         |
|          AllenaiLongformerBase          |  4  | 0.8825 |  0.6297   |   0.0    |         1.558          |
|       DebertaForQuestionAnswering       |  8  | 0.7924 |   0.688   |   0.0    |         0.9464         |
|           DebertaForMaskedLM            |  4  | 0.7413 |  0.5653   |   0.0    |         0.8029         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.7122 |  0.5242   |   0.0    |         0.6628         |
|          DebertaV2ForMaskedLM           |  1  | 0.6867 |  0.5271   |   0.0    |         0.6568         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
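
In the table above, several rows show an `inductor` speedup near zero (for example 0.0282 or 0.0) while `inductor_no_cudagraphs` is above 1 for the same model. The following is a minimal sketch of how such rows could be picked out of a pasted table. The helper names (`parse_ascii_table`, `to_float`, `cudagraph_regressions`) and the 0.1 threshold are assumptions made for illustration and are not part of the benchmark scripts.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip '+----+' separator lines
    ]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def to_float(cell):
    """Convert a cell to float; non-numeric cells (e.g. 'pass') become NaN."""
    try:
        return float(cell)
    except ValueError:
        return math.nan


def cudagraph_regressions(records, threshold=0.1):
    """Names of models whose 'inductor' speedup is below `threshold`
    while 'inductor_no_cudagraphs' is not. The threshold is an assumed
    cut-off. NaN cells are never flagged, since comparisons with NaN are False."""
    flagged = []
    for rec in records:
        with_cg = to_float(rec["inductor"])
        without_cg = to_float(rec["inductor_no_cudagraphs"])
        if with_cg < threshold and without_cg >= threshold:
            flagged.append(rec["name"])
    return flagged


# Three rows copied from the table above.
sample = """
+-----------------------+----+--------+-----------+----------+------------------------+
|         name          | bs | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------+----+--------+-----------+----------+------------------------+
|    OPTForCausalLM     | 2  | 0.9913 |  0.9261   |  2.4786  |         2.4208         |
|   YituTechConvBert    | 16 | 0.9854 |  0.9551   |  0.0282  |         1.4893         |
| BlenderbotForCausalLM | 4  | 0.9854 |  0.8478   |   0.0    |         1.2843         |
+-----------------------+----+--------+-----------+----------+------------------------+
"""

flagged = cudagraph_regressions(parse_ascii_table(sample))
print(flagged)
```

The same parser should work on the other tables in this comment, since they share the same border and column layout. Only the column names passed to the comparison would change.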

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|     M2M100ForConditionalGeneration      | 16  | 12.1657 |  25.7682  | 171.4798 |        137.2563        |
|             XGLMForCausalLM             |  8  | 9.6661  |  21.4483  | 155.6345 |        135.8424        |
|          MobileBertForMaskedLM          | 64  | 18.0247 |  40.7111  | 151.2448 |        144.5766        |
|     MobileBertForQuestionAnswering      | 128 | 18.2557 |  40.617   |  143.19  |        140.7126        |
|       MT5ForConditionalGeneration       | 16  | 8.5796  |  19.0561  | 127.3442 |        132.4586        |
|     PegasusForConditionalGeneration     | 32  | 5.2217  |  19.5507  | 104.4757 |         75.781         |
|            YituTechConvBert             | 16  | 11.2108 |  19.9191  | 95.1683  |        75.5657         |
|            XLNetLMHeadModel             |  8  | 10.8162 |  27.6191  | 93.6017  |        93.7147         |
|      MBartForConditionalGeneration      |  2  | 12.0453 |  26.234   | 83.0588  |        78.7063         |
|      BartForConditionalGeneration       |  2  | 11.8594 |  26.4483  |  79.323  |        75.2776         |
|           ElectraForCausalLM            | 32  | 8.1323  |  13.5966  | 78.0831  |         66.326         |
|         MegatronBertForCausalLM         |  4  | 10.8022 |  21.8065  | 70.0875  |        67.3458         |
|    MegatronBertForQuestionAnswering     |  8  | 10.7616 |  21.5693  | 69.9476  |        68.2855         |
|           PegasusForCausalLM            | 32  | 5.9827  |  11.4104  | 57.5755  |        43.1385         |
| BlenderbotSmallForConditionalGeneration | 64  |  7.782  |  17.2737  | 57.2802  |        54.4152         |
|       T5ForConditionalGeneration        |  4  | 5.9586  |  13.0142  | 50.8178  |        49.3458         |
|     PLBartForConditionalGeneration      |  4  | 6.3097  |  13.7558  | 50.8093  |        47.7798         |
|                 T5Small                 |  4  | 5.9045  |  13.0649  | 50.7751  |         49.002         |
|    LayoutLMForSequenceClassification    | 16  | 5.9307  |  11.4089  | 46.7158  |        46.7447         |
|       ElectraForQuestionAnswering       | 64  | 5.6336  |  10.9672  | 44.4278  |         45.581         |
|            MBartForCausalLM             |  4  | 6.0268  |  11.4118  | 42.4296  |        40.9042         |
|           LayoutLMForMaskedLM           | 16  | 6.0026  |  11.4282  | 42.0546  |        41.4099         |
|         Speech2Text2ForCausalLM         | 256 | 3.0793  |  5.9187   | 41.9525  |        31.7857         |
|             BartForCausalLM             |  4  | 5.8653  |  11.2462  | 41.6215  |        38.3442         |
|             BertForMaskedLM             | 16  | 5.5313  |  10.9038  |  39.996  |         41.173         |
|        BertForQuestionAnswering         | 16  | 5.2548  |  10.7503  |  39.99   |        39.6766         |
|             OPTForCausalLM              |  2  | 4.8954  |  10.1988  | 38.8911  |        37.3954         |
|            TrOCRForCausalLM             | 32  |  5.825  |  11.3055  |  38.478  |        37.1094         |
|            AlbertForMaskedLM            |  4  | 2.2915  |  8.2635   | 37.6364  |        37.7443         |
|           RobertaForCausalLM            | 16  |  5.415  |  10.949   | 37.5933  |        37.4292         |
|                CamemBert                | 16  | 5.5665  |  10.8595  | 37.5479  |        37.5739         |
|      GPT2ForSequenceClassification      |  4  | 4.9022  |  9.9915   | 37.0108  |        36.1013         |
|       RobertaForQuestionAnswering       | 16  | 5.5103  |  10.8853  | 36.4308  |        36.4559         |
|     DistilBertForQuestionAnswering      | 256 | 2.6675  |  5.3371   | 35.8673  |        35.7513         |
|       AlbertForQuestionAnswering        |  4  | 2.2767  |  8.1703   | 35.2286  |        33.8603         |
|          DistilBertForMaskedLM          | 128 | 2.6853  |  5.5737   | 34.6743  |        34.7361         |
|       BlenderbotSmallForCausalLM        | 64  | 3.9608  |  7.5215   | 29.6884  |        29.4874         |
|               DistillGPT2               | 16  | 2.7029  |  5.1243   | 29.5604  |        27.8063         |
|            PLBartForCausalLM            |  8  | 3.1405  |  6.2184   | 26.9711  |        26.0019         |
|          AllenaiLongformerBase          |  4  | 11.6698 |  32.3584  |   nan    |        116.8636        |
|          DebertaV2ForMaskedLM           |  1  | 15.6456 |  27.4922  |   nan    |        70.6975         |
|      DebertaV2ForQuestionAnswering      |  2  | 16.1094 |  27.5581  |   nan    |        68.0565         |
|          BlenderbotForCausalLM          |  4  | 11.7936 |  21.8621  |   nan    |        67.6508         |
|       DebertaForQuestionAnswering       |  8  | 7.4993  |  13.8434  |   nan    |         57.492         |
|           DebertaForMaskedLM            |  4  | 7.4065  |  13.8945  |   nan    |        52.9575         |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  1.3156  |         1.3147         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  1.2697  |         1.268          |
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1962  |         1.195          |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.1782  |         1.1729         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.1778  |         1.1724         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1562  |         1.2307         |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|             OPTForCausalLM              |  2  | 0.9999 |  0.9165   |  1.0965  |         1.1346         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0902  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0902  |         1.1813         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0897  |         1.1368         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0605  |         1.1479         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0562  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.056   |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0532  |         1.0491         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  1.044   |         1.1152         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0104  |         1.0518         |
|     PegasusForConditionalGeneration     | 32  | 0.945  |  0.8957   |  1.0086  |         1.0074         |
|            YituTechConvBert             | 16  | 0.953  |  0.8732   |  0.9922  |         0.9905         |
|     PLBartForConditionalGeneration      |  4  | 1.0001 |  0.9301   |  0.9772  |         1.052          |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.971   |         1.0642         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.9653  |         1.0962         |
|     M2M100ForConditionalGeneration      | 16  | 0.9551 |  0.8773   |  0.9621  |         0.9607         |
|            MBartForCausalLM             |  4  |  1.0   |  0.8937   |  0.9444  |         0.9912         |
|           PegasusForCausalLM            | 32  | 0.9259 |  0.8407   |  0.9387  |         0.9368         |
|             XGLMForCausalLM             |  8  | 0.9432 |  0.8613   |  0.9344  |         0.933          |
|             BartForCausalLM             |  4  |  1.0   |  0.8936   |  0.9294  |         0.9749         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.9273  |         1.0307         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8677   |  0.9162  |         0.9886         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.9136  |         1.0139         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9127  |         1.0018         |
|           ElectraForCausalLM            | 32  | 0.9161 |  0.7864   |  0.8953  |         0.8941         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8872  |         0.9624         |
|            TrOCRForCausalLM             | 32  |  1.0   |   0.873   |  0.8855  |         0.9583         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8749  |         0.9803         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8137   |  0.8215  |         0.9119         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.8112  |         1.016          |
|         Speech2Text2ForCausalLM         | 256 | 0.8885 |  0.7587   |  0.8111  |         0.8095         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6659  |         0.8392         |
|       DebertaForQuestionAnswering       |  8  | 0.9525 |  1.0537   |   nan    |         1.1526         |
|          BlenderbotForCausalLM          |  4  | 0.9978 |  0.9099   |   nan    |         0.999          |
|           DebertaForMaskedLM            |  4  | 0.9316 |  0.9156   |   nan    |         0.9978         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9763 |  0.9763   |   nan    |         0.9802         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |   nan    |         0.9665         |
|          AllenaiLongformerBase          |  4  | 0.8568 |  0.7887   |   nan    |         0.8742         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+-----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor  | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+-----------+------------------------+
|     M2M100ForConditionalGeneration      | 16  | 135.0259 | 128.2484  | 8991.5991 |        78.3641         |
|     PegasusForConditionalGeneration     | 32  | 159.3814 | 155.2831  | 6822.6724 |        128.0385        |
|             XGLMForCausalLM             |  8  | 109.506  | 117.8348  | 6006.2759 |        81.2455         |
|            YituTechConvBert             | 16  | 127.7126 | 130.9154  | 4479.7971 |        84.0331         |
|           PegasusForCausalLM            | 32  | 77.3728  |  81.7889  | 3498.947  |        59.3645         |
|           ElectraForCausalLM            | 32  | 89.9646  |  94.0773  | 3401.1595 |        48.4517         |
|         Speech2Text2ForCausalLM         | 256 | 53.9897  |  56.1961  | 2384.4252 |        34.7897         |
|            AlbertForMaskedLM            |  4  | 266.004  | 300.4326  | 162.9857  |        163.2858        |
|       AlbertForQuestionAnswering        |  4  | 263.9248 | 297.7757  | 160.4977  |        160.7337        |
|            XLNetLMHeadModel             |  8  | 280.496  | 289.7692  | 154.8032  |        153.941         |
|            TrOCRForCausalLM             | 32  | 138.6371 | 144.5412  | 108.1401  |        106.7628        |
|      MBartForConditionalGeneration      |  2  | 143.9534 | 143.6359  |  94.2523  |        93.5351         |
|      BartForConditionalGeneration       |  2  | 151.5699 | 144.0868  |  91.5822  |        109.2736        |
|    MegatronBertForQuestionAnswering     |  8  | 144.8897 |  147.481  |  85.9416  |        87.2989         |
|     MobileBertForQuestionAnswering      | 128 | 203.2267 | 219.5483  |  81.8692  |        174.3894        |
| BlenderbotSmallForConditionalGeneration | 64  | 118.1416 | 126.8121  |  81.5957  |        79.3995         |
|                CamemBert                | 16  | 120.2119 | 122.7899  |  77.1818  |        77.2559         |
|          MobileBertForMaskedLM          | 64  | 207.9298 | 214.9496  |  76.7288  |        160.7736        |
|             BartForCausalLM             |  4  | 117.5858 | 118.2085  |  74.9239  |        73.9824         |
|            MBartForCausalLM             |  4  | 115.1141 | 120.2542  |  74.8724  |        74.2253         |
|     PLBartForConditionalGeneration      |  4  | 121.2183 | 122.8827  |  72.4188  |        73.3674         |
|     DistilBertForQuestionAnswering      | 256 | 103.8673 | 105.2547  |  71.1225  |        71.7721         |
|           LayoutLMForMaskedLM           | 16  | 114.3232 | 116.7828  |  70.8088  |        70.1096         |
|            PLBartForCausalLM            |  8  | 115.5109 | 118.4212  |  70.3089  |        69.5191         |
|          DistilBertForMaskedLM          | 128 | 85.2341  |  89.5304  |  69.5457  |        68.6431         |
|             BertForMaskedLM             | 16  | 111.7533 |  114.416  |  69.5436  |         69.81          |
|           RobertaForCausalLM            | 16  | 116.8671 | 119.2693  |  69.3105  |        68.9352         |
|             OPTForCausalLM              |  2  | 171.4146 |  182.647  |  68.1839  |         69.314         |
|               DistillGPT2               | 16  | 107.208  | 110.5305  |  63.4119  |        62.1364         |
|                 T5Small                 |  4  | 106.9096 | 122.9761  |  60.0708  |        60.3378         |
|       T5ForConditionalGeneration        |  4  | 108.475  | 123.8629  |  59.8958  |        60.5246         |
|         MegatronBertForCausalLM         |  4  | 88.4815  |  95.4907  |  56.7753  |        63.5257         |
|       ElectraForQuestionAnswering       | 64  | 118.1773 | 118.0077  |  54.578   |        55.1022         |
|        BertForQuestionAnswering         | 16  | 96.6508  |  98.061   |  54.1016  |        54.0452         |
|       RobertaForQuestionAnswering       | 16  | 97.3434  |   98.34   |   54.02   |         54.157         |
|    LayoutLMForSequenceClassification    | 16  | 99.3594  | 101.4141  |  53.6653  |        54.6015         |
|       BlenderbotSmallForCausalLM        | 64  | 62.6369  |  63.6153  |  47.2946  |        45.9553         |
|       MT5ForConditionalGeneration       | 16  | 104.8074 | 111.7312  |  42.3767  |        50.4429         |
|      GPT2ForSequenceClassification      |  4  | 93.7968  |  95.8879  |  39.8562  |        40.0725         |
|      DebertaV2ForQuestionAnswering      |  2  | 166.1882 | 201.9773  |    nan    |        159.453         |
|          DebertaV2ForMaskedLM           |  1  | 153.2839 | 197.7794  |    nan    |        155.2911        |
|          AllenaiLongformerBase          |  4  | 206.2598 | 287.5805  |    nan    |        116.2388        |
|          BlenderbotForCausalLM          |  4  | 114.2992 | 117.3098  |    nan    |        91.7395         |
|       DebertaForQuestionAnswering       |  8  | 95.3445  | 110.1718  |    nan    |        80.1181         |
|           DebertaForMaskedLM            |  4  | 95.0486  | 112.6381  |    nan    |        77.0848         |
+-----------------------------------------+-----+----------+-----------+-----------+------------------------+
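
The absolute-latency table above has a handful of rows where `inductor` is orders of magnitude slower than `eager`, while `inductor_no_cudagraphs` is not. A minimal sketch for flagging such rows from the pasted table text follows. The helper names `parse_ascii_table` and `latency_ratio` are hypothetical and not part of the benchmark scripts, and the sample contains just three rows copied from the table.

```python
import math

# Three rows copied from the "Absolute latency (ms)" table above.
SAMPLE = """
+--------------------------------+----+----------+-----------+-----------+------------------------+
|              name              | bs |  eager   | aot_eager | inductor  | inductor_no_cudagraphs |
+--------------------------------+----+----------+-----------+-----------+------------------------+
| M2M100ForConditionalGeneration | 16 | 135.0259 | 128.2484  | 8991.5991 |        78.3641         |
|       AlbertForMaskedLM        | 4  | 266.004  | 300.4326  | 162.9857  |        163.2858        |
|       DebertaForMaskedLM       | 4  | 95.0486  | 112.6381  |    nan    |        77.0848         |
+--------------------------------+----+----------+-----------+-----------+------------------------+
"""


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dict rows."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")  # skip the '+-----+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def latency_ratio(row, backend, baseline="eager"):
    """Return backend latency / baseline latency, or None if either is nan."""
    b, e = float(row[backend]), float(row[baseline])
    if math.isnan(b) or math.isnan(e):
        return None
    return b / e


rows = parse_ascii_table(SAMPLE)
# Keep only rows where inductor is measurably slower than eager.
slower = [
    r["name"]
    for r in rows
    if (ratio := latency_ratio(r, "inductor")) is not None and ratio > 1.0
]
print(slower)
```

Rows with `nan` are skipped rather than counted as regressions, since `nan` here means the run produced no number at all.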

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 0.9985 |  0.9648   |  0.3307  |         1.0028         |
|          mixer_b16_224          | 128 | 0.9974 |  1.0152   |  0.2751  |         1.3593         |
|            pit_b_224            | 64  | 0.9941 |  0.9926   |  0.2327  |         1.4279         |
|        tnt_s_patch16_224        | 128 | 0.998  |  0.9963   |  0.2104  |         2.9733         |
|          gmlp_s16_224           | 128 | 0.9943 |  1.0823   |  0.2068  |         1.8317         |
|          gmixer_24_224          | 128 | 0.9952 |  0.8888   |  0.1891  |         1.7476         |
|           convit_base           | 64  | 0.9982 |  0.9977   |  0.1675  |         1.6106         |
|          resmlp_12_224          | 128 | 0.9931 |  0.8894   |  0.1376  |         1.2564         |
|           tf_mixnet_l           | 128 | 0.9769 |  0.8267   |  0.1211  |         1.1891         |
|       eca_botnext26ts_256       | 128 | 0.9729 |  0.7192   |  0.1209  |         1.4228         |
|            mixnet_l             | 128 | 0.9764 |  0.8208   |  0.1207  |         1.1813         |
|          botnet26t_256          | 128 | 0.9733 |  0.8513   |  0.1175  |         1.4226         |
|         coat_lite_mini          | 128 | 0.997  |  0.9958   |  0.1165  |         1.917          |
|             dla102              | 128 | 0.9962 |   0.815   |  0.1131  |         1.5201         |
|      beit_base_patch16_224      | 64  | 0.9965 |  0.9497   |  0.1101  |         1.351          |
|         visformer_small         | 128 | 0.996  |  0.9449   |  0.1066  |         1.1657         |
|          inception_v3           | 128 | 0.9964 |   0.865   |  0.1064  |         1.5181         |
|       gluon_inception_v3        | 128 | 0.9962 |  0.8654   |  0.1064  |         1.5196         |
|        adv_inception_v3         | 128 | 0.9963 |  0.8603   |  0.1059  |         1.5186         |
|           dm_nfnet_f0           | 128 | 0.9867 |   0.985   |  0.1053  |         1.4291         |
|      vit_base_patch16_224       | 64  | 0.9961 |   0.993   |  0.1032  |         1.2344         |
|            nfnet_l0             | 128 | 0.9901 |  0.8122   |  0.1011  |         1.436          |
| deit_base_distilled_patch16_224 | 64  | 0.9962 |  0.9938   |  0.0991  |         1.2547         |
|           res2next50            | 128 | 0.9992 |  0.8257   |  0.0944  |         1.3622         |
|           volo_d1_224           | 64  | 0.9943 |  0.9729   |  0.0943  |         1.6654         |
|      xcit_large_24_p8_224       |  5  | 0.9884 |  0.8688   |  0.0911  |         1.5374         |
|          convnext_base          | 64  | 0.9836 |  0.9844   |  0.0876  |         1.4707         |
|  swin_base_patch4_window7_224   | 64  | 0.9908 |  0.9545   |  0.0852  |         1.605          |
|     swsl_resnext101_32x16d      | 32  | 0.9977 |  0.8403   |  0.0839  |         1.0239         |
|          cspdarknet53           | 64  | 0.932  |  0.7852   |  0.0774  |         1.2589         |
|         poolformer_m36          | 64  | 0.9865 |  0.9832   |  0.0773  |         1.3186         |
|            repvgg_a2            | 128 | 0.9377 |  0.7548   |  0.0768  |         1.1171         |
|            gernet_l             | 128 | 0.9338 |  0.7935   |  0.0755  |         1.065          |
|           selecsls42b           | 128 | 0.9985 |  0.8115   |  0.0727  |         1.4119         |
|           resnest101e           | 64  | 0.9939 |   0.866   |  0.0726  |         1.3533         |
|            hrnet_w18            | 128 | 0.9921 |  0.6439   |  0.0718  |          1.35          |
|       tf_efficientnet_b0        | 128 | 0.9608 |  0.6806   |  0.0714  |         1.3847         |
|          pnasnet5large          | 16  | 0.9853 |  0.9175   |  0.0698  |         1.1265         |
|          jx_nest_base           | 32  | 0.9865 |  0.9852   |  0.0685  |         1.3587         |
|         crossvit_9_240          | 128 | 0.9901 |  0.7825   |  0.0676  |         1.6117         |
|         mobilenetv2_100         | 128 | 0.9488 |  0.7371   |  0.0653  |         1.4445         |
|            fbnetv3_b            | 128 | 0.9482 |  0.7689   |  0.0649  |         1.3239         |
|        res2net50_14w_8s         | 128 | 0.999  |  0.7903   |  0.0647  |         1.3572         |
|          cait_m36_384           |  4  | 0.9949 |  0.9416   |  0.0647  |         1.3457         |
|          ghostnet_100           | 128 | 0.9923 |  0.7641   |  0.0624  |         1.6198         |
|             dpn107              | 32  | 0.9318 |  0.8069   |  0.0617  |         1.1347         |
|           rexnet_100            | 128 | 0.9521 |   0.703   |  0.0612  |         1.3297         |
|          spnasnet_100           | 128 | 0.9413 |  0.7381   |  0.061   |         1.4164         |
|        gluon_xception65         | 32  | 0.9925 |  0.8415   |  0.0593  |         1.0788         |
|        twins_pcpvt_base         | 64  | 0.9958 |  0.9148   |  0.0586  |         1.6714         |
|      mobilenetv3_large_100      | 128 | 0.9489 |  0.7602   |  0.0564  |         1.432          |
|            tinynet_a            | 128 | 0.946  |  0.6782   |  0.054   |         1.2456         |
|        res2net101_26w_4s        | 64  | 0.9997 |  0.8003   |  0.0421  |         1.0693         |
|           regnety_002           | 128 | 0.9437 |  0.7085   |  0.0397  |         1.2167         |
|        sebotnet33ts_256         | 64  | 0.9576 |  0.7643   |  0.0395  |         1.5321         |
|            lcnet_050            | 128 | 0.9403 |  0.7357   |  0.0391  |         1.4066         |
|           fbnetc_100            | 128 | 0.9493 |  0.7389   |  0.0368  |         1.402          |
|           mobilevit_s           | 64  | 0.9615 |  0.7313   |  0.0349  |         1.4408         |
|        ese_vovnet19b_dw         | 128 | 0.9575 |  0.8335   |  0.0327  |         1.3707         |
|           mnasnet_100           | 128 | 0.9484 |  0.7407   |  0.0326  |         1.4948         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
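
To compare backends at a glance, per-model speedups like those above can be folded into a single figure per column. The sketch below uses a geometric mean, which is an assumption on my part about how one might aggregate ratios and not something this table reports. The values are the first three rows of the table, and `SPEEDUPS` is a hypothetical name.

```python
from statistics import geometric_mean

# First three rows of the speedup table above (convmixer_768_32,
# mixer_b16_224, pit_b_224), keyed by backend column.
SPEEDUPS = {
    "inductor": [0.3307, 0.2751, 0.2327],
    "inductor_no_cudagraphs": [1.0028, 1.3593, 1.4279],
}

# Geometric mean is the usual choice for averaging ratios: a 2x speedup
# and a 0.5x slowdown cancel out to 1.0 instead of averaging to 1.25.
summary = {backend: geometric_mean(vals) for backend, vals in SPEEDUPS.items()}

for backend, gm in summary.items():
    print(f"{backend}: {gm:.4f}")
```

On these three rows the `inductor` column lands well below 1.0 and `inductor_no_cudagraphs` above it, matching what the full table shows row by row.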

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|            hrnet_w18            | 128 | 9.6645  |  36.306   | 239.6495 |        243.3731        |
|           rexnet_100            | 128 | 5.7305  |  11.2835  | 230.5481 |        292.0584        |
|          ghostnet_100           | 128 | 7.5044  |  15.0457  | 199.8083 |        235.9697        |
|          pnasnet5large          | 16  | 8.2399  |  26.4357  | 161.2678 |        162.9962        |
|           resnest101e           | 64  | 11.0532 |  24.5079  | 157.5606 |        163.1424        |
|            fbnetv3_b            | 128 | 8.9949  |  17.0325  | 152.766  |        174.4576        |
|           mobilevit_s           | 64  | 5.3895  |  11.4044  | 149.3897 |        161.0668        |
|        res2net101_26w_4s        | 64  | 10.6701 |  25.0178  | 144.0901 |        151.1697        |
|        twins_pcpvt_base         | 64  | 10.7952 |  23.4659  | 143.9423 |        147.9188        |
|            mixnet_l             | 128 | 8.3354  |  16.369   | 139.288  |        158.0146        |
|           tf_mixnet_l           | 128 | 9.2109  |  17.8595  | 138.3913 |        161.359         |
|        adv_inception_v3         | 128 | 5.6758  |  12.4653  | 137.0842 |        159.314         |
|            tinynet_a            | 128 | 6.0199  |  12.2794  | 137.0509 |        161.1748        |
|          inception_v3           | 128 | 5.7272  |  12.4704  | 136.8852 |        156.5059        |
|      xcit_large_24_p8_224       |  5  | 12.5002 |  28.5873  | 136.7425 |        132.8375        |
|       gluon_inception_v3        | 128 |  5.812  |  12.6364  | 136.6642 |        158.8665        |
|       tf_efficientnet_b0        | 128 | 5.0365  |  11.1486  | 130.7149 |        155.4947        |
|      mobilenetv3_large_100      | 128 | 4.2404  |  8.5099   | 130.4923 |        162.5033        |
|          cait_m36_384           |  4  | 13.6609 |  32.7876  | 121.6808 |        116.2197        |
|        res2net50_14w_8s         | 128 | 8.9621  |  22.7101  | 120.6844 |        124.6533        |
|           fbnetc_100            | 128 | 4.9846  |  9.4102   | 120.222  |        137.5629        |
|          spnasnet_100           | 128 | 4.9598  |  9.9593   | 119.2263 |        137.9566        |
|  swin_base_patch4_window7_224   | 64  | 8.4778  |  19.4203  | 113.9644 |        108.2668        |
|         mobilenetv2_100         | 128 | 4.0503  |  7.8268   | 108.503  |        128.9254        |
|           mnasnet_100           | 128 | 4.0055  |  7.6685   | 108.1111 |        119.9525        |
|        sebotnet33ts_256         | 64  |  4.126  |  9.4382   | 102.6961 |        106.6853        |
|         poolformer_m36          | 64  | 7.6467  |  14.4773  | 101.1615 |        102.147         |
|             dpn107              | 32  | 9.7263  |  19.6335  | 100.0443 |        102.3495        |
|        gluon_xception65         | 32  | 7.9432  |  17.0213  | 92.8991  |        96.0958         |
|           regnety_002           | 128 | 4.8938  |  9.2909   | 92.4788  |        106.8307        |
|             dla102              | 128 | 6.2954  |  13.9701  |  92.453  |        97.1047         |
|         coat_lite_mini          | 128 | 3.3488  |  7.8922   | 88.8953  |        89.2585         |
|          cspdarknet53           | 64  | 5.8464  |  10.9726  | 88.5125  |        99.3637         |
|       eca_botnext26ts_256       | 128 | 3.1282  |  6.8001   | 88.1906  |        97.0254         |
|          jx_nest_base           | 32  | 6.8659  |  15.0856  |  87.021  |         83.776         |
|         crossvit_9_240          | 128 | 5.8789  |  13.3152  |  85.746  |         88.278         |
|           res2next50            | 128 | 5.0978  |  12.1719  | 82.3692  |        89.0378         |
|            lcnet_050            | 128 | 2.5604  |  4.9898   | 81.5926  |         96.766         |
|          botnet26t_256          | 128 | 2.9381  |  6.0391   | 79.5938  |         89.416         |
|           selecsls42b           | 128 | 2.4851  |  5.4678   | 77.4892  |        90.2344         |
|           volo_d1_224           | 64  | 5.0428  |  11.8991  | 76.3472  |        75.1734         |
|        tnt_s_patch16_224        | 128 | 6.4226  |  16.2851  | 74.5018  |        67.9377         |
|            gernet_l             | 128 | 5.0292  |  8.9684   | 72.8873  |         81.277         |
|        ese_vovnet19b_dw         | 128 | 2.5596  |  4.5474   | 72.8558  |        73.8548         |
|            nfnet_l0             | 128 | 5.2805  |  11.581   | 72.7427  |         77.452         |
|           dm_nfnet_f0           | 128 | 6.0185  |  11.5104  | 70.8053  |        75.3662         |
|     swsl_resnext101_32x16d      | 32  | 6.2312  |  13.685   | 65.6405  |        62.1553         |
|         visformer_small         | 128 | 2.6504  |  6.1492   | 65.0912  |        67.0074         |
|          convnext_base          | 64  | 6.6652  |  12.4782  | 62.3224  |        57.8615         |
|          gmlp_s16_224           | 128 | 5.6541  |  12.1079  | 62.1603  |        60.2958         |
|            repvgg_a2            | 128 | 4.8856  |  8.8539   | 58.5421  |        62.3373         |
|          gmixer_24_224          | 128 |  5.782  |  12.9786  | 53.9204  |        49.9325         |
|           convit_base           | 64  | 3.7269  |  8.7292   | 50.9089  |        48.6495         |
|            pit_b_224            | 64  |  3.448  |   8.039   | 47.3876  |        46.2401         |
| deit_base_distilled_patch16_224 | 64  |  3.177  |  7.1274   | 44.3575  |        43.6614         |
|      vit_base_patch16_224       | 64  | 3.0481  |  7.0326   |  42.964  |        38.7274         |
|          resmlp_12_224          | 128 | 2.7919  |  5.2843   | 41.8837  |        42.3516         |
|      beit_base_patch16_224      | 64  | 3.9203  |  8.7052   | 40.0214  |        36.2329         |
|        convmixer_768_32         | 32  |  1.671  |  6.7754   | 38.3178  |        35.8351         |
|          mixer_b16_224          | 128 | 2.7398  |  5.8778   | 34.7342  |        32.1267         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.2872  |         1.2836         |
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.2057  |         1.2049         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  1.1899  |         1.1871         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1607  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.1583  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.1215  |         1.1179         |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  1.1129  |         1.1115         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  1.089   |         1.0876         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.0875  |         1.0845         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  1.0758  |         1.0721         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  1.0757  |         1.0728         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  1.0696  |         1.0675         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  1.0556  |         1.0539         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  1.0512  |         1.0506         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  1.0494  |         1.0457         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0377  |         1.0351         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  1.0361  |         1.0328         |
|          convnext_base          | 64  | 1.001  |   0.924   |  1.0345  |         1.0338         |
|             dla102              | 128 | 0.9634 |  0.9151   |  1.0323  |         1.0326         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  1.0251  |         1.0242         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  1.021   |         1.0202         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  1.0203  |         1.0194         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  1.0082  |         1.0072         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  1.0071  |         1.0057         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9976  |         0.9952         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.9957  |         0.9948         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.9925  |          0.99          |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.9923  |         0.9902         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9917  |         0.9903         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.9912  |         0.9898         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9905  |         0.989          |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.9885  |         0.989          |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9864  |         0.9854         |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9821  |         0.9793         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.9793  |         0.9786         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.9793  |         0.977          |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.979   |         0.9779         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.9776  |         0.9732         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.9738  |         0.9706         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9732  |         0.9727         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.9714  |         0.9705         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.9702  |         0.9664         |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.966   |         0.9611         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.9646  |         0.9642         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.9637  |         0.9607         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.9611  |         0.9604         |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.9582  |         0.9535         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.9568  |         0.9547         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9562  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9537  |         0.9528         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.9509  |         0.9483         |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.9497  |         0.9451         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.9448  |         0.9403         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.9376  |         0.9361         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.9046  |         0.9045         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.901   |         0.8966         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.8898  |         0.884          |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+-----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor  | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+-----------+------------------------+
|            hrnet_w18            | 128 | 282.2172 | 435.3193  | 3929.9474 |        207.0935        |
|          pnasnet5large          | 16  | 198.6062 | 213.3644  | 2836.7453 |        174.3265        |
|          cait_m36_384           |  4  | 167.6719 | 179.8621  | 2592.7929 |        124.3466        |
|        res2net101_26w_4s        | 64  |  99.45   |  123.333  | 2421.967  |        93.1549         |
|           mobilevit_s           | 64  |  84.674  | 111.2679  | 2344.6101 |        56.4226         |
|           resnest101e           | 64  | 165.2835 | 189.5667  | 2259.2373 |        121.4335        |
|        res2net50_14w_8s         | 128 | 140.5392 | 177.9981  | 2182.6304 |        103.8276        |
|           fbnetc_100            | 128 | 82.8697  | 106.3678  | 2144.9167 |        56.0773         |
|        twins_pcpvt_base         | 64  | 119.5086 | 129.0534  | 2125.401  |        70.6628         |
|        sebotnet33ts_256         | 64  | 80.4769  | 100.7461  | 1954.3205 |        50.2301         |
|        ese_vovnet19b_dw         | 128 | 64.5537  |  74.2773  | 1899.315  |        45.1583         |
|         poolformer_m36          | 64  | 146.6183 | 147.0582  | 1881.144  |        109.7108        |
|           mnasnet_100           | 128 | 64.1631  |  82.4453  | 1878.2017 |        40.7412         |
|             dpn107              | 32  | 113.8687 |  131.221  | 1725.4618 |        93.6008         |
|  swin_base_patch4_window7_224   | 64  | 147.1824 | 152.6723  | 1718.5087 |        91.0105         |
|            fbnetv3_b            | 128 | 115.7669 | 142.1773  | 1692.7135 |        82.7254         |
|        gluon_xception65         | 32  | 99.7325  | 117.5085  | 1681.6699 |        91.8841         |
|           tf_mixnet_l           | 128 | 194.1332 | 228.8948  | 1566.0183 |        159.4962        |
|        tnt_s_patch16_224        | 128 | 323.6449 | 323.6842  | 1536.4282 |        108.6532        |
|             dla102              | 128 | 172.2864 | 210.7307  | 1522.0561 |        113.0614        |
|        adv_inception_v3         | 128 | 160.5149 | 186.0702  | 1517.0954 |        105.4568        |
|       gluon_inception_v3        | 128 | 160.8192 |  185.207  | 1511.1704 |        105.5048        |
|          inception_v3           | 128 | 160.7894 | 184.9343  | 1507.8123 |        105.5518        |
|            mixnet_l             | 128 | 185.4266 | 220.3503  | 1502.7718 |        153.0848        |
|          jx_nest_base           | 32  | 101.9462 | 101.3201  | 1471.858  |        73.7274         |
|          ghostnet_100           | 128 | 90.6957  | 117.7099  | 1446.4298 |        55.5748         |
|     swsl_resnext101_32x16d      | 32  | 118.5712 | 140.9626  | 1415.4527 |        115.6034        |
|      xcit_large_24_p8_224       |  5  | 123.0223 | 145.0281  | 1401.1398 |        81.7532         |
|          convnext_base          | 64  | 124.1233 | 123.7934  | 1400.6163 |        82.9729         |
|           res2next50            | 128 | 125.8017 | 152.2267  | 1336.5811 |        92.4081         |
|            tinynet_a            | 128 |  73.684  | 102.4588  | 1298.8863 |        55.9749         |
|           volo_d1_224           | 64  | 121.0443 | 123.5959  | 1279.1936 |        72.2173         |
|         crossvit_9_240          | 128 | 82.5344  | 104.3107  | 1262.758  |        50.8075         |
|           rexnet_100            | 128 | 80.0365  | 108.2056  | 1249.7081 |        57.2737         |
|           dm_nfnet_f0           | 128 | 128.7652 | 128.4457  | 1207.1803 |        88.8357         |
|          cspdarknet53           | 64  | 94.9158  | 112.6445  | 1148.749  |        70.3837         |
|       tf_efficientnet_b0        | 128 | 84.5849  | 120.0191  | 1142.7374 |        58.7283         |
|            nfnet_l0             | 128 | 112.8469 | 137.1115  | 1108.9142 |        77.7381         |
|          spnasnet_100           | 128 | 70.3714  |  89.6667  | 1089.5708 |        46.8096         |
|      mobilenetv3_large_100      | 128 | 61.3133  |  76.4314  | 1037.1944 |         40.654         |
|           regnety_002           | 128 |  39.115  |  55.5547  | 999.2733  |         31.232         |
|           convit_base           | 64  | 163.1499 |  163.186  |  975.043  |        101.0913        |
|         coat_lite_mini          | 128 | 112.8613 | 113.0342  |  969.711  |        58.7408         |
|            gernet_l             | 128 | 77.7225  |  91.6391  | 967.7739  |        68.2404         |
|         mobilenetv2_100         | 128 | 65.4393  |  84.3156  | 955.8288  |        43.0293         |
|            repvgg_a2            | 128 | 77.4753  |  96.1314  | 948.9199  |        65.1539         |
|      beit_base_patch16_224      | 64  | 101.4184 | 106.5972  |  922.273  |        74.7343         |
|        convmixer_768_32         | 32  | 300.6358 | 310.7862  | 909.3266  |        299.4093        |
|       eca_botnext26ts_256       | 128 | 108.7003 | 147.1145  | 877.3971  |        74.3214         |
| deit_base_distilled_patch16_224 | 64  | 84.9031  |  84.9552  | 856.7664  |         67.455         |
|         visformer_small         | 128 |  91.184  |  96.0553  | 856.5408  |        77.9895         |
|          botnet26t_256          | 128 | 101.7925 | 116.3697  | 845.9948  |        69.6162         |
|      vit_base_patch16_224       | 64  | 86.7686  |  87.2608  | 842.8352  |        70.1167         |
|           selecsls42b           | 128 | 59.9943  |  73.7844  | 827.2056  |        42.4189         |
|            lcnet_050            | 128 | 31.8112  |  40.5714  | 767.6491  |        21.2126         |
|          gmlp_s16_224           | 128 | 137.6372 | 126.2206  | 663.0689  |        74.5258         |
|          gmixer_24_224          | 128 | 117.9209 | 132.1126  | 623.3988  |        67.1969         |
|            pit_b_224            | 64  | 118.7377 | 118.8683  | 507.6988  |         82.675         |
|          mixer_b16_224          | 128 | 116.4848 |  114.716  | 423.3028  |        85.5573         |
|          resmlp_12_224          | 128 | 53.4479  |  59.5818  | 387.1187  |        42.2709         |
+---------------------------------+-----+----------+-----------+-----------+------------------------+

Performance graphs

bench_logs/huggingface_amp.png :

bench_logs/torchbench_amp.png :

bench_logs/timm_models_amp.png :

Build Summary

Run name

day_090_31_03_23_performance_amp_893

Commit hashes

pytorch commit: 5df59f9
pytorch commit date: 2023-04-01 01:43:33+00:00
torchbench commit: ea7b71ead75529529d67ffd17541b1f203c49b83
torchbench commit date: 2023-03-31 18:05:58-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git5df59f9

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 82%, 49/60 | 84%, 38/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 88%, 53/60 | 98%, 44/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
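
Each pass-rate cell pairs a rounded percentage with the raw counts. A minimal sketch of that formatting, assuming round-to-nearest (which agrees with every cell in the table above, for example 53/60 giving 88% and 52/60 giving 87%). `format_passrate` is a hypothetical helper name, not a function taken from the benchmark scripts:

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a pass rate in the dashboard's "NN%, passed/total" style.

    Assumption: the percentage is rounded to the nearest integer.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    # Integer round-half-up, so no float formatting is involved.
    percent = (200 * passed + total) // (2 * total)
    return f"{percent}%, {passed}/{total}"


print(format_passrate(53, 60))  # 88%, 53/60
print(format_passrate(38, 45))  # 84%, 38/45
```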

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.00x    |    1.53x    |    1.00x    |
| inductor_no_cudagraphs |   1.27x    |    1.50x    |    1.39x    |
+------------------------+------------+-------------+-------------+
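
For reference, a geometric mean of per-model speedups can be sketched as below. This is an illustration under stated assumptions, not the dashboard's actual aggregation code: the helper name `geomean_speedup` is hypothetical, entries with a speedup of 0.0 are assumed to be failed runs and are dropped, and the optional `floor` parameter is a guess. The guess is prompted by the 1.00x cells for `inductor`, whose per-model speedups are well below 1 in the warnings tables, which would be consistent with values being floored at 1.0 before averaging.

```python
import math
from typing import Iterable, Optional


def geomean_speedup(speedups: Iterable[float],
                    floor: Optional[float] = None) -> float:
    """Geometric mean of per-model speedups.

    Assumptions (not confirmed by the report):
      * speedups <= 0.0 mark failed runs and are excluded;
      * if `floor` is given, each remaining value is raised to at least `floor`.
    """
    values = [s for s in speedups if s > 0.0]
    if floor is not None:
        values = [max(s, floor) for s in values]
    if not values:
        raise ValueError("no valid speedups to aggregate")
    # exp(mean(log(x))) is the geometric mean and avoids overflow of a product.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Illustrative inputs only; these are not rows from the report.
print(geomean_speedup([2.0, 8.0]))
print(geomean_speedup([0.5, 2.0, 0.0], floor=1.0))
```

The geometric mean is the usual choice for ratios because a 2x speedup and a 0.5x slowdown cancel out to 1.0x, which an arithmetic mean would not reflect.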

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.73    |    7.42     |    5.90     |
|       aot_eager        |    9.32    |    16.02    |    13.09    |
|        inductor        |   58.75    |    63.43    |    99.51    |
| inductor_no_cudagraphs |   62.96    |    60.06    |   107.41    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    0.98x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.89x    |    0.88x    |
|        inductor        |   0.95x    |    0.99x    |    1.02x    |
| inductor_no_cudagraphs |   0.94x    |    1.04x    |    1.01x    |
+------------------------+------------+-------------+-------------+

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Previous report name: /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 82%, 49/60  | 82%, 49/60  |
|        inductor        | huggingface | 84%, 38/45  | 84%, 38/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 88%, 53/60  | 88%, 53/60  |
| inductor_no_cudagraphs | huggingface | 98%, 44/45  | 98%, 44/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.00x    |   1.00x   |
|        inductor        | huggingface |   1.53x    |   1.53x   |
|        inductor        | timm_models |   1.00x    |   1.00x   |
| inductor_no_cudagraphs | torchbench  |   1.28x    |   1.27x   |
| inductor_no_cudagraphs | huggingface |   1.50x    |   1.50x   |
| inductor_no_cudagraphs | timm_models |   1.39x    |   1.39x   |
+------------------------+-------------+------------+-----------+
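
To pick out the rows that moved between the two reports, the `prev_value` and `cur_value` cells can be compared directly. A small sketch, assuming each cell is a string such as `1.28x`; the function name `changed_rows` and the tuple layout are hypothetical choices for illustration:

```python
from typing import List, Tuple

Row = Tuple[str, str, str, str]  # (compiler, suite, prev_value, cur_value)


def changed_rows(rows: List[Row]) -> List[Tuple[str, str, float]]:
    """Return (compiler, suite, delta) for rows whose speedup changed."""
    changes = []
    for compiler, suite, prev, cur in rows:
        # Strip the trailing "x" and compare at the table's two-decimal precision.
        delta = round(float(cur.rstrip("x")) - float(prev.rstrip("x")), 2)
        if delta != 0.0:
            changes.append((compiler, suite, delta))
    return changes


# Two rows copied from the table above.
rows = [
    ("inductor", "huggingface", "1.53x", "1.53x"),
    ("inductor_no_cudagraphs", "torchbench", "1.28x", "1.27x"),
]
print(changed_rows(rows))  # [('inductor_no_cudagraphs', 'torchbench', -0.01)]
```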

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
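
The four criteria above can be written as a simple per-model check. The thresholds come from the list; everything else in this sketch is an assumption made for illustration: the `ModelResult` record, its field names, the flag labels, and the rule that any accuracy status other than `"pass"` counts as a failure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelResult:
    """Hypothetical record for one model under one compiler."""
    name: str
    accuracy: str              # e.g. "pass", "fail_accuracy", "fail_to_run"
    speedup: float             # relative to native PyTorch; 0.0 usually means a failed run
    compile_latency_s: float   # compilation latency in seconds
    compression_ratio: float   # peak memory compression ratio


def warnings_for(result: ModelResult) -> List[str]:
    """Return the warning categories that apply to one result."""
    flags = []
    if result.accuracy != "pass":          # assumption: anything but "pass" fails
        flags.append("accuracy")
    if result.speedup < 0.95:
        flags.append("speedup")
    if result.compile_latency_s > 120.0:
        flags.append("compilation_latency")
    if result.compression_ratio < 0.9:
        flags.append("compression_ratio")
    return flags


# Illustrative values, not taken from a specific row of the report.
example = ModelResult("example_model", "pass", 0.27, 34.7, 0.99)
print(warnings_for(example))  # ['speedup']
```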

Accuracy warnings

+-------------+----------------------------+------------------------+-----------------+
|    suite    |            name            | inductor_no_cudagraphs |    inductor     |
+-------------+----------------------------+------------------------+-----------------+
| torchbench  |            moco            |      fail_to_run       |   fail_to_run   |
| torchbench  |     Background_Matting     |    eager_variation     | eager_variation |
| torchbench  |      vision_maskrcnn       |    eager_variation     | eager_variation |
| torchbench  |         tacotron2          |         0.0000         |     0.0000      |
| torchbench  |            gat             |         0.0000         |     0.0000      |
| torchbench  |            gcn             |         0.0000         |     0.0000      |
| torchbench  |           llama            |         0.0000         |     0.0000      |
| torchbench  |            sage            |         0.0000         |     0.0000      |
| torchbench  |       torchrec_dlrm        |         0.0000         |     0.0000      |
| huggingface | AlbertForQuestionAnswering |     fail_accuracy      |  fail_accuracy  |
+-------------+----------------------------+------------------------+-----------------+

Performance speedup warnings

+-------------+-----------------------------------+------------------------+----------+
|    suite    |               name                | inductor_no_cudagraphs | inductor |
+-------------+-----------------------------------+------------------------+----------+
| torchbench  |               hf_T5               |         1.9812         |  0.1779  |
| torchbench  |             hf_Albert             |         2.3168         |  0.1587  |
| torchbench  |               vgg16               |         1.2546         |  0.1483  |
| torchbench  |        Background_Matting         |         1.208          |  0.1229  |
| torchbench  |            timm_nfnet             |         1.4761         |  0.1026  |
| torchbench  |           hf_Bert_large           |         1.5578         |  0.0986  |
| torchbench  |           hf_GPT2_large           |         1.7352         |  0.0967  |
| torchbench  |              hf_Bert              |         1.5658         |  0.0864  |
| torchbench  |              hf_Bart              |         1.5844         |  0.076   |
| torchbench  |           pytorch_unet            |         1.3516         |  0.0711  |
| torchbench  |            hf_T5_large            |         1.805          |  0.0692  |
| torchbench  |           BERT_pytorch            |         2.0591         |  0.062   |
| torchbench  |              yolov3               |         1.1997         |  0.0569  |
| torchbench  |           mobilenet_v2            |         1.496          |  0.0558  |
| torchbench  |              hf_GPT2              |         1.7879         |  0.0546  |
| torchbench  | attention_is_all_you_need_pytorch |         1.4759         |  0.0541  |
| torchbench  |           hf_DistilBert           |         1.5301         |  0.0476  |
| torchbench  |            timm_regnet            |         0.9655         |  0.0467  |
| torchbench  |              demucs               |         1.0391         |  0.041   |
| torchbench  |      timm_vision_transformer      |         1.3899         |  0.036   |
| torchbench  |        shufflenet_v2_x1_0         |         1.2097         |  0.0357  |
| torchbench  |             resnet50              |         1.0382         |  0.0352  |
| torchbench  |           timm_resnest            |         1.5137         |  0.0349  |
| torchbench  |             resnet152             |         1.014          |  0.034   |
| torchbench  |            densenet121            |         1.0496         |  0.0333  |
| torchbench  |          pytorch_stargan          |         1.2441         |  0.0319  |
| torchbench  |            timm_vovnet            |         0.9184         |  0.0316  |
| torchbench  |        mobilenet_v3_large         |         1.1816         |  0.0314  |
| torchbench  |         timm_efficientnet         |         1.0716         |  0.0311  |
| torchbench  |         phlippe_densenet          |         1.0122         |  0.0303  |
| torchbench  |   pytorch_CycleGAN_and_pix2pix    |         1.6978         |   0.03   |
| torchbench  |            mnasnet1_0             |         1.0609         |  0.0299  |
| torchbench  |          resnext50_32x4d          |         0.9652         |  0.0294  |
| torchbench  |      nvidia_deeprecommender       |         1.0188         |  0.027   |
| torchbench  |          pytorch_struct           |         1.1194         |  0.0256  |
| torchbench  |              alexnet              |         1.1352         |  0.0242  |
| torchbench  |           squeezenet1_1           |         1.3178         |  0.0239  |
| torchbench  |            tts_angular            |         0.9452         |  0.0212  |
| torchbench  |          phlippe_resnet           |         1.0007         |  0.0211  |
| torchbench  |       functorch_dp_cifar10        |         1.3585         |  0.0211  |
| torchbench  |             resnet18              |         0.9083         |  0.0203  |
| torchbench  |        speech_transformer         |         1.5709         |  0.0192  |
| torchbench  |           fastNLP_Bert            |         1.4923         |  0.0186  |
| torchbench  |          LearningToPaint          |         1.0624         |  0.0175  |
| torchbench  |               dcgan               |         0.8144         |  0.0085  |
| torchbench  |            hf_Reformer            |         1.066          |  0.0076  |
| torchbench  |           lennard_jones           |         0.8717         |  0.0068  |
| torchbench  |                drq                |         1.0116         |  0.0041  |
| torchbench  |         soft_actor_critic         |         0.8521         |  0.0032  |
| torchbench  |                gcn                |          0.0           |   0.0    |
| torchbench  |               sage                |          0.0           |   0.0    |
| torchbench  |                gat                |          0.0           |   0.0    |
| torchbench  |             tacotron2             |          0.0           |   0.0    |
| torchbench  |               dlrm                |         1.1859         |   0.0    |
| torchbench  |               moco                |          0.0           |   0.0    |
| torchbench  |            hf_BigBird             |         1.6351         |   0.0    |
| torchbench  |           hf_Longformer           |         1.3008         |   0.0    |
| torchbench  |   timm_vision_transformer_large   |         1.0816         |   0.0    |
| torchbench  |           torchrec_dlrm           |          0.0           |   0.0    |
| huggingface |         YituTechConvBert          |         1.4917         |  0.0281  |
| huggingface |        ElectraForCausalLM         |         1.8154         |  0.0264  |
| huggingface |  PegasusForConditionalGeneration  |         1.3327         |  0.0255  |
| huggingface |        PegasusForCausalLM         |         1.267          |  0.0236  |
| huggingface |      Speech2Text2ForCausalLM      |         1.5349         |  0.0234  |
| huggingface |          XGLMForCausalLM          |         1.5492         |  0.0207  |
| huggingface |  M2M100ForConditionalGeneration   |         1.4122         |  0.0171  |
| huggingface |        DebertaForMaskedLM         |         0.8107         |   0.0    |
| huggingface |   DebertaV2ForQuestionAnswering   |         0.6591         |   0.0    |
| huggingface |       BlenderbotForCausalLM       |         1.3262         |   0.0    |
| huggingface |    DebertaForQuestionAnswering    |         0.969          |   0.0    |
| huggingface |       AllenaiLongformerBase       |         1.559          |   0.0    |
| huggingface |       DebertaV2ForMaskedLM        |         0.6566         |   0.0    |
| timm_models |           mixer_b16_224           |         1.3586         |  0.2683  |
| timm_models |         convmixer_768_32          |         1.0018         |  0.2496  |
| timm_models |             pit_b_224             |         1.4284         |  0.2332  |
| timm_models |         tnt_s_patch16_224         |         2.9688         |  0.2173  |
| timm_models |           gmlp_s16_224            |         1.831          |  0.2093  |
| timm_models |           gmixer_24_224           |         1.749          |  0.1914  |
| timm_models |            convit_base            |         1.6124         |  0.1704  |
| timm_models |           resmlp_12_224           |         1.2573         |  0.1393  |
| timm_models |            tf_mixnet_l            |         1.1914         |  0.1252  |
| timm_models |        eca_botnext26ts_256        |         1.424          |  0.1214  |
| timm_models |           botnet26t_256           |         1.4231         |   0.12   |
| timm_models |             mixnet_l              |         1.1819         |  0.1199  |
| timm_models |              dla102               |         1.5232         |  0.1181  |
| timm_models |          coat_lite_mini           |         1.9185         |  0.1153  |
| timm_models |         adv_inception_v3          |         1.5213         |  0.1087  |
| timm_models |            dm_nfnet_f0            |         1.4283         |  0.108   |
| timm_models |        gluon_inception_v3         |         1.5215         |  0.1076  |
| timm_models |           inception_v3            |         1.5206         |  0.1058  |
| timm_models |          visformer_small          |         1.1654         |  0.1055  |
| timm_models |             nfnet_l0              |         1.4317         |  0.1033  |
| timm_models |       vit_base_patch16_224        |         1.2359         |  0.103   |
| timm_models |  deit_base_distilled_patch16_224  |         1.2554         |  0.1008  |
| timm_models |            volo_d1_224            |         1.6672         |  0.0966  |
| timm_models |            res2next50             |         1.3641         |  0.0954  |
| timm_models |       xcit_large_24_p8_224        |         1.583          |  0.0925  |
| timm_models |           convnext_base           |         1.4697         |  0.0905  |
| timm_models |   swin_base_patch4_window7_224    |         1.6063         |  0.0866  |
| timm_models |      swsl_resnext101_32x16d       |         1.0211         |  0.0845  |
| timm_models |       beit_base_patch16_224       |         1.3503         |  0.0815  |
| timm_models |           cspdarknet53            |         1.2612         |  0.0786  |
| timm_models |             repvgg_a2             |         1.1196         |  0.0781  |
| timm_models |          poolformer_m36           |         1.3186         |  0.0773  |
| timm_models |             gernet_l              |         1.0645         |  0.0754  |
| timm_models |            resnest101e            |         1.3532         |  0.0743  |
| timm_models |            selecsls42b            |         1.4108         |  0.0741  |
| timm_models |        tf_efficientnet_b0         |         1.3834         |  0.0725  |
| timm_models |             hrnet_w18             |         1.3522         |  0.0713  |
| timm_models |           pnasnet5large           |         1.1292         |  0.0707  |
| timm_models |           jx_nest_base            |         1.3587         |  0.0693  |
| timm_models |         res2net50_14w_8s          |         1.3612         |  0.0658  |
| timm_models |           cait_m36_384            |         1.3492         |  0.0654  |
| timm_models |          mobilenetv2_100          |         1.4458         |  0.065   |
| timm_models |             fbnetv3_b             |         1.3304         |  0.0631  |
| timm_models |              dpn107               |         1.1352         |  0.063   |
| timm_models |           ghostnet_100            |         1.6332         |  0.0627  |
| timm_models |           spnasnet_100            |         1.4201         |  0.0626  |
| timm_models |         twins_pcpvt_base          |         1.6596         |  0.062   |
| timm_models |            rexnet_100             |         1.3346         |  0.0617  |
| timm_models |         gluon_xception65          |         1.0791         |  0.0593  |
| timm_models |       mobilenetv3_large_100       |         1.4416         |  0.0575  |
| timm_models |             tinynet_a             |         1.2631         |  0.0551  |
| timm_models |          crossvit_9_240           |         1.6155         |   0.05   |
| timm_models |         ese_vovnet19b_dw          |         1.3737         |  0.0431  |
| timm_models |         res2net101_26w_4s         |         1.0871         |  0.0422  |
| timm_models |            regnety_002            |         1.2392         |  0.0416  |
| timm_models |         sebotnet33ts_256          |         1.5355         |  0.0402  |
| timm_models |             lcnet_050             |         1.4072         |   0.04   |
| timm_models |            fbnetc_100             |         1.3932         |  0.0371  |
| timm_models |            mobilevit_s            |         1.4424         |  0.0349  |
| timm_models |            mnasnet_100            |         1.4967         |  0.0332  |
+-------------+-----------------------------------+------------------------+----------+

Compilation latency (sec) warnings

+-------------+--------------------------------+------------------------+----------+
|    suite    |              name              | inductor_no_cudagraphs | inductor |
+-------------+--------------------------------+------------------------+----------+
| torchbench  |          hf_T5_large           |        171.3756        | 183.4673 |
| torchbench  |        phlippe_densenet        |         167.24         | 130.3954 |
| torchbench  |          densenet121           |        142.3364        | 127.1376 |
| torchbench  |       timm_efficientnet        |        143.0212        | 121.1316 |
| torchbench  |       mobilenet_v3_large       |        135.373         | 113.9277 |
| torchbench  |          mobilenet_v2          |        126.4344        | 105.3012 |
| torchbench  |           hf_BigBird           |        123.7387        |   nan    |
| torchbench  | timm_vision_transformer_large  |        122.697         |   nan    |
| huggingface | M2M100ForConditionalGeneration |        137.1759        | 169.0099 |
| huggingface |        XGLMForCausalLM         |        134.6487        | 153.1724 |
| huggingface |     MobileBertForMaskedLM      |        148.0825        | 150.3332 |
| huggingface | MobileBertForQuestionAnswering |        135.7865        | 140.5556 |
| huggingface |  MT5ForConditionalGeneration   |        130.5399        | 127.3381 |
| timm_models |           hrnet_w18            |        244.1946        | 238.2311 |
| timm_models |           rexnet_100           |        287.994         | 228.688  |
| timm_models |          ghostnet_100          |        240.8781        | 202.2409 |
| timm_models |         pnasnet5large          |        160.1958        | 162.1515 |
| timm_models |          resnest101e           |        163.9545        | 153.3509 |
| timm_models |           fbnetv3_b            |        172.145         | 148.9306 |
| timm_models |          mobilevit_s           |        155.8761        | 144.6891 |
| timm_models |       res2net101_26w_4s        |        150.3364        | 141.7545 |
| timm_models |          tf_mixnet_l           |        156.9757        | 138.6625 |
| timm_models |        twins_pcpvt_base        |        143.3077        | 138.3742 |
| timm_models |           tinynet_a            |        159.8587        | 137.7565 |
| timm_models |            mixnet_l            |        156.4049        | 136.6916 |
| timm_models |        adv_inception_v3        |        155.2397        | 134.6815 |
| timm_models |      xcit_large_24_p8_224      |        127.496         | 133.2655 |
| timm_models |     mobilenetv3_large_100      |        159.3328        | 129.7193 |
| timm_models |       gluon_inception_v3       |        158.4097        | 129.714  |
| timm_models |          inception_v3          |        157.1155        | 129.5733 |
| timm_models |       tf_efficientnet_b0       |        149.9526        | 128.3209 |
| timm_models |        res2net50_14w_8s        |        122.2097        | 119.4182 |
| timm_models |           fbnetc_100           |        128.9753        | 116.6486 |
| timm_models |          spnasnet_100          |        134.0569        | 111.3162 |
| timm_models |        mobilenetv2_100         |        130.4401        |  98.26   |
+-------------+--------------------------------+------------------------+----------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+------------------------+----------+
|    suite    |                  name                   | inductor_no_cudagraphs | inductor |
+-------------+-----------------------------------------+------------------------+----------+
| torchbench  |         nvidia_deeprecommender          |         0.8931         |  0.8951  |
| torchbench  |             pytorch_stargan             |         0.8893         |  0.8934  |
| torchbench  |                resnet50                 |         0.8842         |  0.8909  |
| torchbench  |               timm_vovnet               |         0.8869         |  0.889   |
| torchbench  |         timm_vision_transformer         |         0.8835         |  0.8873  |
| torchbench  |            phlippe_densenet             |         0.8659         |  0.8834  |
| torchbench  |           mobilenet_v3_large            |         0.872          |  0.8796  |
| torchbench  |           speech_transformer            |         0.869          |  0.8694  |
| torchbench  |               densenet121               |         0.8056         |  0.8268  |
| torchbench  |               hf_Reformer               |         0.8022         |  0.8064  |
| torchbench  |               mnasnet1_0                |         0.7755         |  0.7837  |
| torchbench  |             resnext50_32x4d             |         0.772          |  0.7798  |
| torchbench  |             LearningToPaint             |         0.7463         |  0.7552  |
| torchbench  |             pytorch_struct              |         0.7362         |  0.7428  |
| torchbench  |                resnet18                 |         0.6097         |  0.619   |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |         0.6004         |  0.6035  |
| torchbench  |          functorch_dp_cifar10           |         0.4424         |  0.451   |
| torchbench  |             phlippe_resnet              |         0.3395         |  0.3554  |
| huggingface |           ElectraForCausalLM            |         0.8941         |  0.8953  |
| huggingface |          DistilBertForMaskedLM          |         0.9624         |  0.8872  |
| huggingface |            TrOCRForCausalLM             |         0.9583         |  0.8855  |
| huggingface | BlenderbotSmallForConditionalGeneration |         0.9803         |  0.8749  |
| huggingface |       BlenderbotSmallForCausalLM        |         0.9119         |  0.8215  |
| huggingface |          MobileBertForMaskedLM          |         1.016          |  0.8112  |
| huggingface |         Speech2Text2ForCausalLM         |         0.8095         |  0.8111  |
| huggingface |     MobileBertForQuestionAnswering      |         0.8392         |  0.6659  |
| huggingface |          AllenaiLongformerBase          |         0.8742         |   nan    |
| timm_models |               regnety_002               |         0.8966         |  0.901   |
| timm_models |                lcnet_050                |         0.884          |  0.8898  |
+-------------+-----------------------------------------+------------------------+----------+

Metrics over time

Plots (images not reproduced here):

bench_logs/passrate_over_time.png

bench_logs/comp_time_over_time.png

bench_logs/geomean_over_time.png

bench_logs/memory_over_time.png

Recent Regressions

For each relevant compiler, we compare the two most recent reports (that actually ran the compiler) to find models that were previously unflagged and are now flagged as problematic (according to the 'Warnings' section).
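
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual script: the function names are hypothetical, and the 0.95 speedup threshold is an assumption, chosen only because it is consistent with the torchbench speedup regressions reported in this comment. The `already_flagged_model` entry is made-up data that shows a model flagged in both reports is not counted as a new regression.

```python
from typing import Dict, Set, Tuple

# Assumed threshold (not confirmed by the report): a model is treated as
# "flagged" when its speedup over eager falls below this value.
SPEEDUP_WARN_THRESHOLD = 0.95


def flagged(report: Dict[str, float],
            threshold: float = SPEEDUP_WARN_THRESHOLD) -> Set[str]:
    """Return the names of models whose speedup is below the threshold."""
    return {name for name, speedup in report.items() if speedup < threshold}


def find_regressions(
    prev: Dict[str, float],
    cur: Dict[str, float],
    threshold: float = SPEEDUP_WARN_THRESHOLD,
) -> Dict[str, Tuple[float, float]]:
    """Models flagged in the current report but not in the previous one.

    Only models present in both reports are considered, so a model that
    is new in the current report is not reported as a regression.
    """
    newly_flagged = flagged(cur, threshold) - flagged(prev, threshold)
    return {
        name: (prev[name], cur[name])
        for name in sorted(newly_flagged)
        if name in prev
    }


# tts_angular and resnet18 use the values from the torchbench regression
# table; already_flagged_model is hypothetical illustration data.
prev_report = {"tts_angular": 0.9604, "resnet18": 0.9533,
               "already_flagged_model": 0.80}
cur_report = {"tts_angular": 0.9452, "resnet18": 0.9083,
              "already_flagged_model": 0.81}

for name, (prev_status, cur_status) in find_regressions(
        prev_report, cur_report).items():
    print(f"{name}: {prev_status} -> {cur_status}")
```

The same set-difference approach would apply to the other warning metrics (compilation latency, peak memory compression ratio) with a different flagging predicate.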

Regressions for torchbench

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

Performance speedup regressions

+------------------------+-------------+-------------+------------+
|        compiler        |    name     | prev_status | cur_status |
+------------------------+-------------+-------------+------------+
| inductor_no_cudagraphs | tts_angular |   0.9604    |   0.9452   |
| inductor_no_cudagraphs |  resnet18   |   0.9533    |   0.9083   |
+------------------------+-------------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

No regressions found.

Regressions for timm_models

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

No regressions found.

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 0.9973 |  0.1792   |  1.2326  |         1.233          |
|               hf_T5               |  8   | 0.9854 |  0.8501   |  0.1779  |         1.9812         |
|             hf_Albert             |  8   | 0.9956 |   0.961   |  0.1587  |         2.3168         |
|               vgg16               |  64  | 0.9993 |  0.9984   |  0.1483  |         1.2546         |
|        Background_Matting         |  4   | 0.9986 |  0.1368   |  0.1229  |         1.208          |
|            timm_nfnet             | 128  | 0.986  |  0.9844   |  0.1026  |         1.4761         |
|           hf_Bert_large           |  4   | 0.9941 |  0.8817   |  0.0986  |         1.5578         |
|           hf_GPT2_large           |  4   | 0.983  |  0.9718   |  0.0967  |         1.7352         |
|              hf_Bert              |  4   | 0.9943 |   0.841   |  0.0864  |         1.5658         |
|              hf_Bart              |  4   | 0.969  |  0.8452   |  0.076   |         1.5844         |
|           pytorch_unet            |  1   | 0.9974 |  0.2051   |  0.0711  |         1.3516         |
|            hf_T5_large            |  2   | 0.9765 |  0.8093   |  0.0692  |         1.805          |
|           BERT_pytorch            |  16  | 0.9908 |  0.8003   |  0.062   |         2.0591         |
|              yolov3               |  16  | 0.9968 |  0.8047   |  0.0569  |         1.1997         |
|           mobilenet_v2            |  96  | 0.997  |  0.7774   |  0.0558  |         1.496          |
|              hf_GPT2              |  4   | 0.9946 |  0.9059   |  0.0546  |         1.7879         |
| attention_is_all_you_need_pytorch | 256  | 0.9873 |  0.8292   |  0.0541  |         1.4759         |
|           hf_DistilBert           |  8   | 0.9954 |  0.9384   |  0.0476  |         1.5301         |
|            timm_regnet            |  32  | 0.9195 |  0.7708   |  0.0467  |         0.9655         |
|              demucs               |  4   | 0.9996 |  1.0002   |  0.041   |         1.0391         |
|      timm_vision_transformer      |  32  | 0.9834 |  0.8499   |  0.036   |         1.3899         |
|        shufflenet_v2_x1_0         | 128  | 0.9957 |  0.7491   |  0.0357  |         1.2097         |
|             resnet50              |  32  | 0.9952 |  0.7639   |  0.0352  |         1.0382         |
|           timm_resnest            |  32  | 0.9923 |  0.8502   |  0.0349  |         1.5137         |
|             resnet152             |  32  | 0.9961 |  0.7502   |  0.034   |         1.014          |
|            densenet121            |  4   | 0.987  |  0.7047   |  0.0333  |         1.0496         |
|          pytorch_stargan          |  16  | 0.9941 |  0.8042   |  0.0319  |         1.2441         |
|            timm_vovnet            |  32  | 0.8777 |  0.7099   |  0.0316  |         0.9184         |
|        mobilenet_v3_large         |  32  | 0.9953 |  0.7856   |  0.0314  |         1.1816         |
|         timm_efficientnet         |  32  | 0.9385 |   0.625   |  0.0311  |         1.0716         |
|         phlippe_densenet          | 128  | 0.984  |  0.7681   |  0.0303  |         1.0122         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9726 |  0.9006   |   0.03   |         1.6978         |
|            mnasnet1_0             |  32  | 0.988  |  0.7347   |  0.0299  |         1.0609         |
|          resnext50_32x4d          |  8   | 0.9836 |  0.7244   |  0.0294  |         0.9652         |
|      nvidia_deeprecommender       | 256  | 0.9985 |  0.9988   |  0.027   |         1.0188         |
|          pytorch_struct           | 200  | 0.9283 |   0.764   |  0.0256  |         1.1194         |
|              alexnet              | 128  | 0.9987 |  0.9972   |  0.0242  |         1.1352         |
|           squeezenet1_1           |  32  | 0.9863 |  0.9294   |  0.0239  |         1.3178         |
|            tts_angular            |  64  | 0.9125 |  0.8812   |  0.0212  |         0.9452         |
|          phlippe_resnet           | 128  | 0.9849 |  0.7579   |  0.0211  |         1.0007         |
|       functorch_dp_cifar10        |  64  | 0.9653 |  0.9109   |  0.0211  |         1.3585         |
|             resnet18              |  16  | 0.982  |  0.7487   |  0.0203  |         0.9083         |
|        speech_transformer         |  32  | 0.9758 |  0.7862   |  0.0192  |         1.5709         |
|           fastNLP_Bert            |  6   | 0.9802 |  0.8641   |  0.0186  |         1.4923         |
|          LearningToPaint          |  96  | 0.9869 |  0.7682   |  0.0175  |         1.0624         |
|               dcgan               |  32  | 0.8687 |  0.6983   |  0.0085  |         0.8144         |
|            hf_Reformer            |  4   | 0.9861 |  0.9675   |  0.0076  |         1.066          |
|           lennard_jones           | 1000 | 0.8185 |  0.7319   |  0.0068  |         0.8717         |
|                drq                |  1   | 0.9545 |  0.7145   |  0.0041  |         1.0116         |
|         soft_actor_critic         | 256  | 0.863  |  0.6148   |  0.0032  |         0.8521         |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               dlrm                | 1024 | 0.9437 |  0.8502   |   0.0    |         1.1859         |
|               moco                |  32  | 0.9774 |    0.0    |   0.0    |          0.0           |
|            hf_BigBird             |  2   | 0.9515 |  0.7728   |   0.0    |         1.6351         |
|           hf_Longformer           |  2   | 0.8291 |  0.5687   |   0.0    |         1.3008         |
|   timm_vision_transformer_large   |  32  | 0.9979 |    0.0    |   0.0    |         1.0816         |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 27.6431 |  56.4887  | 183.4673 |        171.3756        |
|         phlippe_densenet          | 128  | 3.4263  |  6.9505   | 130.3954 |         167.24         |
|            densenet121            |  4   | 7.5612  |  19.0461  | 127.1376 |        142.3364        |
|         timm_efficientnet         |  32  | 4.9551  |  10.0881  | 121.1316 |        143.0212        |
|        mobilenet_v3_large         |  32  | 3.4397  |  7.5948   | 113.9277 |        135.373         |
|           hf_GPT2_large           |  4   | 14.9911 |  29.941   | 113.1851 |        103.8969        |
|           mobilenet_v2            |  96  | 3.0599  |  6.9548   | 105.3012 |        126.4344        |
|             resnet152             |  32  | 9.0329  |  20.2116  | 105.2063 |        104.9415        |
|              yolov3               |  16  | 4.8906  |  10.6748  | 102.2167 |        116.0795        |
|            hf_Reformer            |  4   | 4.1487  |  6.0027   |  88.371  |        40.5291         |
|        speech_transformer         |  32  | 5.9369  |  13.6877  | 87.8976  |        76.2481         |
|            mnasnet1_0             |  32  | 3.1044  |  6.7687   | 86.8717  |        107.0586        |
|           timm_resnest            |  32  | 1.8247  |  3.8907   | 81.1211  |        97.9881         |
| attention_is_all_you_need_pytorch | 256  |  4.397  |  10.9396  | 76.6283  |        75.8665         |
|        shufflenet_v2_x1_0         | 128  | 3.6363  |  7.6854   | 74.7009  |        81.2662         |
|           BERT_pytorch            |  16  |  4.84   |  11.4571  | 70.2811  |        67.4358         |
|            timm_nfnet             | 128  | 5.7702  |  11.1385  |  69.148  |         70.451         |
|            timm_regnet            |  32  | 6.6941  |  12.1786  | 67.4013  |        69.3515         |
|           hf_Bert_large           |  4   | 10.2611 |  20.9043  | 66.8397  |        62.4013         |
|        Background_Matting         |  4   | 3.1492  |  11.1875  | 61.8466  |        67.6475         |
|           fastNLP_Bert            |  6   | 5.1032  |  11.1574  | 59.5171  |        46.3027         |
|             resnet50              |  32  | 3.2116  |  7.4353   | 59.2194  |        61.4624         |
|            timm_vovnet            |  32  | 3.7321  |  6.3316   | 56.1578  |        59.5766         |
|               hf_T5               |  8   | 5.9601  |  13.4148  | 51.9754  |        48.4069         |
|              hf_Bart              |  4   | 6.1724  |  13.6501  | 51.7483  |        47.1627         |
|      timm_vision_transformer      |  32  | 3.2669  |  7.7161   | 50.3592  |        47.7044         |
|           pytorch_unet            |  1   | 1.5003  |   4.409   | 49.1817  |         60.786         |
|          resnext50_32x4d          |  8   | 3.2323  |  7.0058   |  48.919  |        50.0044         |
|       functorch_dp_cifar10        |  64  | 1.2131  |  2.3838   | 44.7714  |        53.7024         |
|              hf_GPT2              |  4   | 4.6344  |  9.6608   | 43.6806  |        40.1272         |
|             hf_Albert             |  8   | 2.4769  |  8.5056   | 41.3303  |        40.3752         |
|          LearningToPaint          |  96  | 1.4007  |  2.8773   | 40.6106  |        43.4587         |
|          pytorch_stargan          |  16  | 1.1986  |  3.2075   | 40.3839  |         45.499         |
|            Super_SloMo            |  6   |  2.738  |  9.6776   | 39.0187  |        42.8439         |
|              hf_Bert              |  4   | 5.0091  |  10.3753  | 38.6915  |        36.8886         |
|             resnet18              |  16  | 1.3436  |  2.8575   | 37.1149  |         42.336         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2119  |  2.9475   | 32.2943  |        35.5337         |
|           hf_DistilBert           |  8   | 2.4885  |  5.5521   | 31.6721  |        30.4211         |
|              demucs               |  4   | 1.4288  |  2.1678   | 30.8929  |        27.9929         |
|          phlippe_resnet           | 128  |  1.38   |  2.8349   | 28.5867  |        32.2553         |
|           squeezenet1_1           |  32  | 1.0388  |  1.7219   | 22.3647  |         21.554         |
|          pytorch_struct           | 200  | 0.7857  |  1.3324   | 20.7556  |        20.3748         |
|                drq                |  1   |  0.673  |  1.0135   |  16.079  |         8.7184         |
|               vgg16               |  64  | 0.6276  |  1.1106   | 15.3166  |        14.2171         |
|              alexnet              | 128  | 0.4854  |  0.7695   | 14.4422  |        13.5657         |
|      nvidia_deeprecommender       | 256  | 0.4842  |  0.7648   |  10.27   |         9.3884         |
|         soft_actor_critic         | 256  | 0.4204  |  0.6089   |  9.7252  |         6.4031         |
|               dcgan               |  32  | 0.4445  |  0.7125   |  8.9012  |         7.338          |
|           lennard_jones           | 1000 | 0.3931  |  0.5962   |  6.8661  |         6.159          |
|            tts_angular            |  64  | 0.4594  |  0.5133   |  6.6203  |         5.0601         |
|            hf_BigBird             |  2   | 12.7898 |  36.8641  |   nan    |        123.7387        |
|   timm_vision_transformer_large   |  32  | 9.3699  |    nan    |   nan    |        122.697         |
|           hf_Longformer           |  2   | 11.3083 |  31.1057  |   nan    |        118.9154        |
|               dlrm                | 1024 |  0.371  |  0.7697   |   nan    |         6.8774         |
|               moco                |  32  | 27.6999 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.2585  |         1.2557         |
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.2078  |         1.2082         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  1.193   |         1.1717         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.1751  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.1728  |         1.1719         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  1.1687  |         1.168          |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  1.1296  |         1.1266         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  1.1278  |         1.128          |
|           mobilenet_v2            |  96  | 0.9863 |  0.7656   |  1.1083  |         1.102          |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  1.1053  |         0.9973         |
|            timm_nfnet             | 128  | 0.9071 |   0.875   |  1.0767  |         1.0728         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  1.0737  |         1.0725         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  1.0736  |         1.0713         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  1.0687  |         0.9997         |
|                drq                |  1   | 0.9877 |  0.8852   |  1.0607  |         0.9573         |
|        Background_Matting         |  4   | 1.0127 |  0.6489   |  1.0421  |         1.0406         |
|              yolov3               |  16  | 0.984  |  0.8254   |  1.0373  |         1.0367         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  1.0344  |         1.0258         |
|            tts_angular            |  64  | 0.9983 |  0.9983   |  1.0198  |         0.9983         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  1.0011  |         0.9945         |
|               vgg16               |  64  | 0.9923 |  0.7245   |  0.9823  |         0.9808         |
|        shufflenet_v2_x1_0         | 128  | 0.955  |  0.8384   |  0.9737  |         0.9646         |
|           timm_resnest            |  32  | 0.9888 |  0.8825   |  0.9714  |         0.966          |
|              demucs               |  4   | 0.9661 |  0.9657   |  0.9674  |         0.9656         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.9645  |         0.9645         |
|            timm_regnet            |  32  | 0.9959 |  0.8499   |  0.955   |         0.9496         |
|         timm_efficientnet         |  32  | 0.9859 |  0.7664   |  0.9475  |         0.9424         |
|             resnet152             |  32  | 0.9939 |  0.8936   |  0.9458  |         0.9417         |
|              alexnet              | 128  | 0.9454 |  0.7939   |  0.9434  |         0.939          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.9306  |         0.9308         |
|              hf_Bart              |  4   | 0.9087 |  0.7521   |  0.9236  |         0.9173         |
|           squeezenet1_1           |  32  | 0.9695 |  0.9291   |   0.91   |         0.908          |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.8951  |         0.8931         |
|          pytorch_stargan          |  16  | 0.9914 |  0.9749   |  0.8934  |         0.8893         |
|             resnet50              |  32  | 0.9914 |  0.8623   |  0.8909  |         0.8842         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.889   |         0.8869         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8873  |         0.8835         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8834  |         0.8659         |
|        mobilenet_v3_large         |  32  | 0.9784 |  0.9444   |  0.8796  |         0.872          |
|        speech_transformer         |  32  | 0.9915 |   0.901   |  0.8694  |         0.869          |
|            densenet121            |  4   | 0.9944 |  0.9783   |  0.8268  |         0.8056         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.8064  |         0.8022         |
|            mnasnet1_0             |  32  | 0.9792 |  0.8971   |  0.7837  |         0.7755         |
|          resnext50_32x4d          |  8   | 0.9962 |  0.8441   |  0.7798  |         0.772          |
|          LearningToPaint          |  96  | 0.9192 |  0.7116   |  0.7552  |         0.7463         |
|          pytorch_struct           | 200  | 0.9992 |  0.5106   |  0.7428  |         0.7362         |
|             resnet18              |  16  | 0.9753 |  0.7978   |  0.619   |         0.6097         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9966 |  0.8796   |  0.6035  |         0.6004         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.451   |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3554  |         0.3395         |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |   nan    |         1.1191         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |   nan    |         1.0009         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.8565 |  0.8295   |   nan    |         0.9046         |
|               moco                |  32  | 0.9905 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+------------+------------------------+
|               name                |  bs  |  eager   | aot_eager |  inductor  | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+------------+------------------------+
|            hf_Reformer            |  4   |  82.033  |  83.5654  | 10757.1184 |        75.8744         |
|            hf_T5_large            |  2   | 230.1768 | 275.9646  | 3314.3923  |        121.1177        |
|           fastNLP_Bert            |  6   | 56.5011  |  59.5525  | 3171.4975  |        34.5018         |
|        speech_transformer         |  32  | 66.7818  |  82.2457  | 3099.7429  |         35.549         |
|           hf_GPT2_large           |  4   | 212.5433 | 214.6632  | 2168.5523  |        120.7361        |
|             resnet152             |  32  | 65.5755  |  86.7843  | 1934.7344  |        63.2599         |
|            densenet121            |  4   | 60.2519  |  79.4991  | 1686.9063  |        49.6232         |
|              demucs               |  4   | 53.6871  |  53.4677  | 1306.2579  |         51.612         |
|                drq                |  1   |  3.5022  |  4.8322   | 1293.2706  |         3.4701         |
|            timm_regnet            |  32  | 61.3309  |  72.1958  | 1233.6923  |        58.2151         |
|              yolov3               |  16  | 68.7523  |  85.1759  | 1211.8759  |         57.091         |
|              hf_Bart              |  4   | 77.4462  |  80.3035  | 1202.0055  |         45.41          |
|            timm_nfnet             | 128  | 119.9563 | 119.8469  | 1154.8237  |        80.1938         |
|         timm_efficientnet         |  32  | 33.9831  |  50.3889  | 1109.5464  |        29.6869         |
| attention_is_all_you_need_pytorch | 256  | 56.9855  |  67.2135  | 1031.0079  |        36.3661         |
|        Background_Matting         |  4   | 126.0189 | 918.3325  | 1026.0737  |        103.9936        |
|               hf_T5               |  8   | 183.8846 | 211.4041  | 1013.9757  |        90.5127         |
|        mobilenet_v3_large         |  32  | 28.3233  |  36.0375  |  909.3663  |        22.6827         |
|              hf_GPT2              |  4   | 48.8149  |  53.2889  |  908.9149  |        26.9819         |
|        shufflenet_v2_x1_0         | 128  | 32.2576  |  41.9083  |  901.1439  |        24.7481         |
|           BERT_pytorch            |  16  | 52.8691  |  76.7493  |  898.061   |        27.4665         |
|           hf_Bert_large           |  4   | 82.5686  |  91.8787  |  850.7397  |        52.7252         |
|           mobilenet_v2            |  96  | 47.0304  |  60.3208  |  846.8602  |        31.3814         |
|            mnasnet1_0             |  32  | 21.9624  |  31.2601  |  844.119   |        20.9146         |
|             resnet50              |  32  | 26.1352  |  36.4535  |  842.2688  |        26.9691         |
|      timm_vision_transformer      |  32  | 32.2575  |  35.8435  |   836.53   |         20.325         |
|            timm_vovnet            |  32  | 29.6801  |  34.6155  |  835.3107  |        26.7589         |
|         phlippe_densenet          | 128  | 25.5408  |  29.9122  |  834.2194  |        22.9459         |
|          resnext50_32x4d          |  8   | 21.8975  |  26.7859  |  828.9801  |        22.4516         |
|          LearningToPaint          |  96  | 11.1636  |  15.0221  |  774.9529  |        10.4013         |
|         soft_actor_critic         | 256  |  1.894   |   2.382   |  723.5197  |         2.0938         |
|           timm_resnest            |  32  | 24.2934  |  28.2762  |  696.9381  |         15.909         |
|           hf_DistilBert           |  8   | 32.5746  |  35.2494  |  675.4461  |        21.4399         |
|           pytorch_unet            |  1   | 39.9153  | 194.1264  |  563.2477  |         29.459         |
|       functorch_dp_cifar10        |  64  | 10.4472  |  11.0396  |  556.812   |         7.5401         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 15.2972  |  14.7414  |  552.949   |         9.7956         |
|             resnet18              |  16  |  9.6391  |  12.8064  |  535.3029  |        11.8479         |
|              hf_Bert              |  4   | 40.3628  |  46.9116  |  498.2329  |        25.5843         |
|          pytorch_stargan          |  16  | 14.6277  |  18.1057  |  483.398   |        12.1384         |
|          phlippe_resnet           | 128  |  9.1459  |  11.8035  |  481.9061  |         8.9725         |
|           squeezenet1_1           |  32  | 10.1009  |  10.5484  |  468.5779  |         7.6079         |
|               vgg16               |  64  | 66.2619  |  66.2945  |  448.2474  |        52.7864         |
|             hf_Albert             |  8   | 68.5737  |  72.4234  |  442.1943  |         29.999         |
|              alexnet              | 128  |  9.8371  |  9.8572   |  409.8316  |         8.6518         |
|      nvidia_deeprecommender       | 256  | 10.2123  |  10.2187  |  380.1248  |         10.023         |
|               dcgan               |  32  |  2.3978  |  2.9884   |  372.5933  |         2.5968         |
|           lennard_jones           | 1000 |  1.8115  |  2.0916   |  324.041   |         1.7616         |
|            tts_angular            |  64  |  7.3983  |  7.0498   |  319.286   |         6.6067         |
|          pytorch_struct           | 200  |  5.803   |  6.0241   |  228.1943  |         4.1425         |
|            Super_SloMo            |  6   | 79.7084  | 443.3061  |  64.3649   |        64.4248         |
|   timm_vision_transformer_large   |  32  | 464.4802 |    nan    |    nan     |        428.3336        |
|            hf_BigBird             |  2   | 194.7891 | 241.1474  |    nan     |        115.1508        |
|           hf_Longformer           |  2   | 135.6928 | 195.7087  |    nan     |        87.4267         |
|               dlrm                | 1024 |  4.8983  |  4.7694   |    nan     |         3.9493         |
|               moco                |  32  | 51.7932  |    nan    |    nan     |          nan           |
|                gat                |  0   |   nan    |    nan    |    nan     |          nan           |
|                gcn                |  0   |   nan    |    nan    |    nan     |          nan           |
|               sage                |  0   |   nan    |    nan    |    nan     |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |    nan     |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |    nan     |          nan           |
+-----------------------------------+------+----------+-----------+------------+------------------------+
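Many rows in the table above show an `inductor` latency far larger than `eager`, while `inductor_no_cudagraphs` stays close to or below it. A minimal sketch for flagging such rows from a pasted table follows. The helper names `parse_ascii_table` and `flag_slowdowns` and the 5x threshold are hypothetical choices made for illustration; they are not part of the benchmark scripts.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' bordered, '|'-separated table into a list of dicts."""
    rows = []
    header = None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +-----+ border lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def flag_slowdowns(rows, baseline="eager", backend="inductor", threshold=5.0):
    """Return (name, ratio) for rows where backend latency > threshold * baseline.

    The threshold is an arbitrary illustrative cutoff, not a dashboard setting.
    Rows with nan or non-positive baseline values are skipped.
    """
    flagged = []
    for row in rows:
        base = float(row[baseline])
        value = float(row[backend])
        if math.isnan(base) or math.isnan(value) or base <= 0:
            continue
        ratio = value / base
        if ratio > threshold:
            flagged.append((row["name"], ratio))
    return flagged


# Three rows copied from the absolute latency table above.
sample = """
+-------------+------+---------+-----------+------------+------------------------+
|    name     |  bs  |  eager  | aot_eager |  inductor  | inductor_no_cudagraphs |
+-------------+------+---------+-----------+------------+------------------------+
| hf_Reformer |  4   | 82.033  |  83.5654  | 10757.1184 |        75.8744         |
| Super_SloMo |  6   | 79.7084 | 443.3061  |  64.3649   |        64.4248         |
|    dlrm     | 1024 | 4.8983  |  4.7694   |    nan     |         3.9493         |
+-------------+------+---------+-----------+------------+------------------------+
"""

for name, ratio in flag_slowdowns(parse_ascii_table(sample)):
    print(f"{name}: inductor is {ratio:.1f}x slower than eager")
```

Passing `backend="aot_eager"` applies the same check to another column; `nan` entries are skipped rather than treated as failures.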

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|             OPTForCausalLM              |  2  | 0.988  |  0.9142   |  2.4617  |         2.4732         |
|      GPT2ForSequenceClassification      |  4  | 0.9772 |  0.9514   |  2.2954  |         2.2826         |
|          MobileBertForMaskedLM          | 64  | 0.949  |  0.8072   |  2.2514  |         1.0776         |
|       MT5ForConditionalGeneration       | 16  | 0.9879 |  0.8428   |  2.1801  |         1.8888         |
|     MobileBertForQuestionAnswering      | 128 | 0.9481 |  0.8068   |  2.118   |         1.0849         |
|       ElectraForQuestionAnswering       | 64  | 0.988  |  0.9762   |  2.101   |         2.0901         |
|    LayoutLMForSequenceClassification    | 16  | 0.9842 |   0.971   |  1.8214  |         1.788          |
|            XLNetLMHeadModel             |  8  | 0.9959 |  0.9661   |  1.8125  |         1.8169         |
|       RobertaForQuestionAnswering       | 16  | 0.9845 |  0.9693   |  1.7711  |         1.7678         |
|        BertForQuestionAnswering         | 16  | 0.9839 |   0.969   |  1.7636  |         1.7619         |
|       T5ForConditionalGeneration        |  4  | 0.9805 |  0.8473   |  1.7461  |         1.7278         |
|                 T5Small                 |  4  | 0.9801 |  0.8477   |  1.7402  |         1.7282         |
|               DistillGPT2               | 16  | 0.9878 |  0.9555   |  1.6657  |         1.6987         |
|           RobertaForCausalLM            | 16  | 0.9868 |  0.9621   |  1.6625  |         1.6669         |
|         MegatronBertForCausalLM         |  4  | 0.9781 |   0.905   |  1.6598  |         1.4965         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9805 |  0.9603   |  1.6521  |         1.6261         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.8854   |  1.6394  |         1.6429         |
|            PLBartForCausalLM            |  8  | 0.9886 |  0.9574   |  1.6386  |         1.6899         |
|     PLBartForConditionalGeneration      |  4  | 0.9879 |   0.942   |  1.6338  |         1.6492         |
|            AlbertForMaskedLM            |  4  | 0.9999 |  0.8849   |  1.6238  |         1.6243         |
|           LayoutLMForMaskedLM           | 16  | 0.9863 |  0.9621   |  1.5909  |         1.5838         |
|             BertForMaskedLM             | 16  | 0.9858 |  0.9599   |  1.585   |         1.5933         |
|             BartForCausalLM             |  4  | 0.9851 |  0.9567   |  1.5395  |         1.5441         |
|                CamemBert                | 16  | 0.9879 |  0.9627   |  1.5376  |         1.5356         |
|      MBartForConditionalGeneration      |  2  | 0.9968 |  0.9605   |  1.5184  |         1.4086         |
|            MBartForCausalLM             |  4  | 0.9852 |  0.9548   |  1.5162  |         1.5411         |
|      BartForConditionalGeneration       |  2  | 0.9988 |  0.9627   |  1.4986  |         1.4804         |
|     DistilBertForQuestionAnswering      | 256 | 0.9937 |  0.9872   |  1.4586  |         1.4456         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9981 |  0.9196   |  1.4328  |         1.3848         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9819 |  0.9124   |  1.2921  |         1.2596         |
|            TrOCRForCausalLM             | 32  | 0.9879 |  0.9479   |  1.2661  |         1.2823         |
|          DistilBertForMaskedLM          | 128 | 0.9919 |  0.9509   |  1.215   |         1.2334         |
|            YituTechConvBert             | 16  | 0.9857 |  0.9551   |  0.0281  |         1.4917         |
|           ElectraForCausalLM            | 32  | 0.9822 |  0.9349   |  0.0264  |         1.8154         |
|     PegasusForConditionalGeneration     | 32  | 0.996  |  0.9201   |  0.0255  |         1.3327         |
|           PegasusForCausalLM            | 32  | 0.981  |  0.9352   |  0.0236  |         1.267          |
|         Speech2Text2ForCausalLM         | 256 | 0.9752 |  0.9284   |  0.0234  |         1.5349         |
|             XGLMForCausalLM             |  8  | 0.9881 |  0.7941   |  0.0207  |         1.5492         |
|     M2M100ForConditionalGeneration      | 16  | 1.0235 |  0.8385   |  0.0171  |         1.4122         |
|           DebertaForMaskedLM            |  4  | 0.7117 |  0.5631   |   0.0    |         0.8107         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.6868 |  0.5264   |   0.0    |         0.6591         |
|          BlenderbotForCausalLM          |  4  | 0.9851 |  0.8596   |   0.0    |         1.3262         |
|       DebertaForQuestionAnswering       |  8  | 0.8051 |  0.6823   |   0.0    |         0.969          |
|          AllenaiLongformerBase          |  4  | 0.8819 |  0.6269   |   0.0    |         1.559          |
|          DebertaV2ForMaskedLM           |  1  | 0.6792 |   0.519   |   0.0    |         0.6566         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
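Several `inductor` entries in the speedup table above are `0.0` or close to zero, which matters when a column is summarized with a geometric mean: a single zero drives the mean to zero. The sketch below shows one way to aggregate a speedup column while excluding `nan` and non-positive entries. The function `geomean` and its filtering rule are assumptions made for illustration; the dashboard's own summary may aggregate differently.

```python
import math


def geomean(values):
    """Geometric mean of the positive, finite entries; nan if none remain.

    Dropping nan and non-positive values is an illustrative choice, not
    necessarily what the dashboard does.
    """
    usable = [v for v in values if math.isfinite(v) and v > 0]
    if not usable:
        return float("nan")
    return math.exp(sum(math.log(v) for v in usable) / len(usable))


# A few inductor speedups copied from the table above, plus a nan to show
# how a missing result is handled.
inductor_speedups = [2.4617, 2.2954, 0.0281, 0.0, float("nan")]

print(f"geomean over usable entries: {geomean(inductor_speedups):.4f}")
```

Reporting the number of excluded entries alongside the mean keeps the summary honest, since a high geomean over few surviving rows can hide many failures.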

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|     M2M100ForConditionalGeneration      | 16  | 11.906  |  25.8722  | 169.0099 |        137.1759        |
|             XGLMForCausalLM             |  8  |  9.484  |  21.5805  | 153.1724 |        134.6487        |
|          MobileBertForMaskedLM          | 64  | 17.0707 |  40.4607  | 150.3332 |        148.0825        |
|     MobileBertForQuestionAnswering      | 128 | 17.126  |  40.4394  | 140.5556 |        135.7865        |
|       MT5ForConditionalGeneration       | 16  | 8.0576  |  18.8087  | 127.3381 |        130.5399        |
|     PegasusForConditionalGeneration     | 32  | 5.2149  |  19.4038  | 105.0273 |          71.4          |
|            YituTechConvBert             | 16  | 11.1164 |  19.5363  | 92.7042  |        73.7482         |
|            XLNetLMHeadModel             |  8  | 10.6549 |  27.7313  | 92.2195  |        90.9108         |
|      MBartForConditionalGeneration      |  2  | 11.5957 |  26.4585  | 81.2937  |        78.1247         |
|      BartForConditionalGeneration       |  2  | 11.7397 |  26.0486  | 78.7363  |        72.8714         |
|           ElectraForCausalLM            | 32  | 7.7095  |  13.628   | 76.4408  |        63.7372         |
|         MegatronBertForCausalLM         |  4  | 10.2556 |  21.5068  | 69.2777  |         65.934         |
|    MegatronBertForQuestionAnswering     |  8  | 10.2318 |  21.571   | 69.1353  |        66.9825         |
|           PegasusForCausalLM            | 32  | 5.8065  |  11.3396  | 58.0885  |        41.0024         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.5494  |  17.0981  | 55.2928  |        52.9736         |
|                 T5Small                 |  4  | 5.8093  |  12.9541  | 50.8883  |        48.0554         |
|       T5ForConditionalGeneration        |  4  | 5.7628  |  12.8339  |  50.305  |        48.3493         |
|     PLBartForConditionalGeneration      |  4  | 6.1755  |  13.566   | 48.9287  |        45.6913         |
|    LayoutLMForSequenceClassification    | 16  | 5.4622  |  11.2048  | 46.4164  |        44.9429         |
|       ElectraForQuestionAnswering       | 64  | 5.2645  |  11.0122  | 44.3736  |        43.0087         |
|            MBartForCausalLM             |  4  |  5.682  |  11.3014  | 41.0035  |        39.8847         |
|             BartForCausalLM             |  4  | 5.6838  |  11.0836  | 40.6328  |        36.9003         |
|           LayoutLMForMaskedLM           | 16  | 5.5567  |  11.3585  | 40.5977  |        39.1407         |
|         Speech2Text2ForCausalLM         | 256 | 3.3592  |  5.9986   | 39.9968  |        29.4694         |
|             BertForMaskedLM             | 16  | 5.2501  |  10.994   | 39.9454  |        39.2155         |
|        BertForQuestionAnswering         | 16  | 5.1898  |  10.9653  | 39.2547  |        37.3895         |
|             OPTForCausalLM              |  2  | 4.7289  |  10.4911  | 37.8108  |        35.9957         |
|            TrOCRForCausalLM             | 32  |  5.818  |  11.0759  | 37.5176  |        34.9944         |
|                CamemBert                | 16  | 5.2203  |  10.9141  | 37.3051  |        36.5362         |
|           RobertaForCausalLM            | 16  | 5.4302  |  10.9023  | 37.2851  |        37.3622         |
|            AlbertForMaskedLM            |  4  | 2.2677  |  8.2173   | 36.9639  |        36.9178         |
|       RobertaForQuestionAnswering       | 16  | 5.4436  |  10.8602  | 36.2036  |        34.6485         |
|      GPT2ForSequenceClassification      |  4  | 4.8135  |  10.061   | 35.8539  |        34.7445         |
|     DistilBertForQuestionAnswering      | 256 | 2.5106  |  5.2682   | 34.3256  |        34.8449         |
|       AlbertForQuestionAnswering        |  4  | 2.3493  |  8.0857   | 34.1305  |        33.4052         |
|          DistilBertForMaskedLM          | 128 | 2.5378  |  5.4452   | 33.9339  |        33.3091         |
|       BlenderbotSmallForCausalLM        | 64  | 3.7606  |  7.4749   | 28.4517  |         28.254         |
|               DistillGPT2               | 16  | 2.5475  |  5.1487   | 27.8654  |        25.8153         |
|            PLBartForCausalLM            |  8  | 3.1121  |  5.9286   | 25.7826  |        23.9795         |
|          AllenaiLongformerBase          |  4  | 11.5785 |  32.5477  |   nan    |        117.8472        |
|          DebertaV2ForMaskedLM           |  1  | 15.5613 |  27.4751  |   nan    |        69.7284         |
|      DebertaV2ForQuestionAnswering      |  2  | 15.3847 |   26.91   |   nan    |        68.3554         |
|          BlenderbotForCausalLM          |  4  | 11.2176 |  21.9637  |   nan    |         68.178         |
|           DebertaForMaskedLM            |  4  |  7.398  |  13.8488  |   nan    |        52.5569         |
|       DebertaForQuestionAnswering       |  8  | 7.3819  |  13.7427  |   nan    |        52.4203         |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  1.3156  |         1.3147         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  1.2697  |         1.268          |
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1962  |         1.195          |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.1782  |         1.1729         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.1778  |         1.1724         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1562  |         1.2307         |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|             OPTForCausalLM              |  2  | 0.9999 |  0.9165   |  1.0965  |         1.1346         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0902  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0902  |         1.1813         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0897  |         1.1368         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0605  |         1.1479         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0562  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.056   |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0532  |         1.0491         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  1.044   |         1.1152         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0104  |         1.0518         |
|     PegasusForConditionalGeneration     | 32  | 0.945  |  0.8957   |  1.0086  |         1.0074         |
|            YituTechConvBert             | 16  | 0.953  |  0.8732   |  0.9922  |         0.9905         |
|     PLBartForConditionalGeneration      |  4  | 1.0001 |  0.9301   |  0.9772  |         1.052          |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.971   |         1.0642         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.9653  |         1.0962         |
|     M2M100ForConditionalGeneration      | 16  | 0.9551 |  0.8773   |  0.9621  |         0.9607         |
|            MBartForCausalLM             |  4  |  1.0   |  0.8937   |  0.9444  |         0.9912         |
|           PegasusForCausalLM            | 32  | 0.9259 |  0.8407   |  0.9387  |         0.9368         |
|             XGLMForCausalLM             |  8  | 0.9432 |  0.8613   |  0.9344  |         0.933          |
|             BartForCausalLM             |  4  |  1.0   |  0.8936   |  0.9294  |         0.9749         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.9273  |         1.0307         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8677   |  0.9162  |         0.9886         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.9136  |         1.0139         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9127  |         1.0018         |
|           ElectraForCausalLM            | 32  | 0.9161 |   0.786   |  0.8953  |         0.8941         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8872  |         0.9624         |
|            TrOCRForCausalLM             | 32  |  1.0   |   0.873   |  0.8855  |         0.9583         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8749  |         0.9803         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8137   |  0.8215  |         0.9119         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.8112  |         1.016          |
|         Speech2Text2ForCausalLM         | 256 | 0.8885 |  0.7587   |  0.8111  |         0.8095         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6659  |         0.8392         |
|       DebertaForQuestionAnswering       |  8  | 0.9525 |  1.0537   |   nan    |         1.1527         |
|          BlenderbotForCausalLM          |  4  | 0.9978 |  0.9099   |   nan    |         0.999          |
|           DebertaForMaskedLM            |  4  | 0.9326 |  0.9156   |   nan    |         0.9978         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9763 |  0.9763   |   nan    |         0.9802         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |   nan    |         0.9665         |
|          AllenaiLongformerBase          |  4  | 0.8568 |  0.7887   |   nan    |         0.8742         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+-----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor  | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+-----------+------------------------+
|     M2M100ForConditionalGeneration      | 16  | 117.0955 | 135.8618  | 8770.0272 |        76.7032         |
|     PegasusForConditionalGeneration     | 32  | 156.0647 | 154.7232  | 6829.1368 |        107.8035        |
|             XGLMForCausalLM             |  8  | 102.0223 | 112.2916  | 5862.8669 |        74.1925         |
|            YituTechConvBert             | 16  | 127.6733 | 130.8609  | 4488.0309 |        83.9742         |
|           PegasusForCausalLM            | 32  | 75.3234  |  73.7359  | 3472.4636 |        63.1034         |
|           ElectraForCausalLM            | 32  | 89.6262  |  93.9682  | 3380.7617 |        48.4949         |
|         Speech2Text2ForCausalLM         | 256 | 54.2568  |  56.0077  | 2338.7069 |        34.9506         |
|            AlbertForMaskedLM            |  4  | 265.9277 | 300.4081  |  164.608  |        164.5245        |
|       AlbertForQuestionAnswering        |  4  | 263.8151 | 297.8247  | 161.3613  |        160.9262        |
|            XLNetLMHeadModel             |  8  | 280.8873 |  288.768  | 153.7397  |        153.258         |
|            TrOCRForCausalLM             | 32  | 138.9703 | 145.3835  | 108.7775  |        107.6508        |
|      MBartForConditionalGeneration      |  2  | 139.2461 | 144.0065  |  94.4357  |        101.8983        |
|      BartForConditionalGeneration       |  2  | 139.5761 |  143.679  |  91.5053  |        92.8748         |
|    MegatronBertForQuestionAnswering     |  8  | 144.4275 | 147.4522  |  85.7142  |        87.3864         |
| BlenderbotSmallForConditionalGeneration | 64  | 113.0605 | 120.9799  |  80.2047  |        79.4936         |
|     MobileBertForQuestionAnswering      | 128 | 170.8558 |  206.25   |  79.574   |        154.8844        |
|                CamemBert                | 16  | 119.6255 |  122.843  |  77.0499  |        77.0554         |
|          MobileBertForMaskedLM          | 64  | 177.5233 | 210.3432  |  75.4946  |        163.3882        |
|            MBartForCausalLM             |  4  | 115.0626 | 118.5231  |  74.6232  |        73.9665         |
|             BartForCausalLM             |  4  | 115.7442 | 118.3469  |  73.5234  |        73.9717         |
|     PLBartForConditionalGeneration      |  4  | 117.9852 | 126.1055  |  72.6031  |        71.9185         |
|     DistilBertForQuestionAnswering      | 256 | 103.9638 | 104.4623  |  71.0809  |        71.7025         |
|           LayoutLMForMaskedLM           | 16  | 113.8895 | 116.7906  |  70.8646  |        71.0647         |
|            PLBartForCausalLM            |  8  | 116.4171 | 120.4829  |  70.2269  |        69.5718         |
|          DistilBertForMaskedLM          | 128 | 85.2244  |  88.9362  |  69.569   |        68.5665         |
|             BertForMaskedLM             | 16  | 111.5041 |  114.367  |  69.3877  |        69.7909         |
|           RobertaForCausalLM            | 16  | 116.7294 |  119.307  |  69.1391  |        68.9431         |
|             OPTForCausalLM              |  2  | 172.8817 | 181.3302  |  69.0695  |        69.0616         |
|               DistillGPT2               | 16  | 106.9585 |  110.582  |  63.3768  |        62.1685         |
|                 T5Small                 |  4  | 106.6703 | 123.0724  |  60.195   |        60.4629         |
|       T5ForConditionalGeneration        |  4  | 106.6871 |  123.182  |  60.1543  |        60.4394         |
|         MegatronBertForCausalLM         |  4  |  88.323  |  95.114   |  56.9325  |        58.3138         |
|       ElectraForQuestionAnswering       | 64  | 117.6897 | 117.2778  |  54.5874  |        54.8109         |
|       RobertaForQuestionAnswering       | 16  | 97.4507  |  98.5751  |  54.0112  |        53.9797         |
|        BertForQuestionAnswering         | 16  | 96.6689  |  98.2212  |  53.9952  |         53.93          |
|    LayoutLMForSequenceClassification    | 16  | 99.0648  | 100.4693  |  53.6538  |        54.5441         |
|       BlenderbotSmallForCausalLM        | 64  | 58.8498  |  63.2946  |  47.3001  |        45.7893         |
|       MT5ForConditionalGeneration       | 16  | 93.0589  | 110.6594  |  42.2813  |        53.8061         |
|      GPT2ForSequenceClassification      |  4  | 93.5189  |  96.2232  |  39.7542  |        40.0916         |
|      DebertaV2ForQuestionAnswering      |  2  | 155.1777 | 199.9734  |    nan    |        158.1009        |
|          DebertaV2ForMaskedLM           |  1  | 150.8327 | 199.2512  |    nan    |        153.4279        |
|          AllenaiLongformerBase          |  4  | 206.4583 | 290.5165  |    nan    |        116.6212        |
|          BlenderbotForCausalLM          |  4  | 111.4155 |  128.393  |    nan    |        85.6337         |
|       DebertaForQuestionAnswering       |  8  | 93.7738  | 111.0113  |    nan    |        78.0306         |
|           DebertaForMaskedLM            |  4  | 87.6984  | 113.5597  |    nan    |        75.0404         |
+-----------------------------------------+-----+----------+-----------+-----------+------------------------+
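
In the table above, several models show inductor latencies in the thousands of milliseconds while inductor_no_cudagraphs stays below eager for the same models (for example M2M100ForConditionalGeneration: 8770.0272 vs 76.7032). A minimal sketch for picking such rows out of a pasted table follows. The helper names `parse_ascii_table`, `to_float`, and `flag_slow`, and the 2x threshold, are illustrative assumptions and not part of the benchmark scripts. The sample rows are copied from the table above.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the '+----+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def to_float(cell):
    """Convert a cell to float; 'nan' cells and unparsable text become NaN."""
    try:
        return float(cell)
    except ValueError:
        return math.nan


def flag_slow(rows, baseline="eager", backend="inductor", factor=2.0):
    """Return model names whose backend latency exceeds factor * baseline.

    The 2x default is an arbitrary illustrative threshold.
    """
    flagged = []
    for row in rows:
        base, cand = to_float(row[baseline]), to_float(row[backend])
        # Comparisons with NaN are False, so rows without data are skipped.
        if cand > factor * base:
            flagged.append(row["name"])
    return flagged


# Three rows copied from the "Absolute latency (ms)" table.
sample = """
+--------------------------------+----+----------+-----------+-----------+------------------------+
|              name              | bs |  eager   | aot_eager | inductor  | inductor_no_cudagraphs |
+--------------------------------+----+----------+-----------+-----------+------------------------+
| M2M100ForConditionalGeneration | 16 | 117.0955 | 135.8618  | 8770.0272 |        76.7032         |
|       AlbertForMaskedLM        | 4  | 265.9277 | 300.4081  |  164.608  |        164.5245        |
|       DebertaForMaskedLM       | 4  | 87.6984  | 113.5597  |    nan    |        75.0404         |
+--------------------------------+----+----------+-----------+-----------+------------------------+
"""

rows = parse_ascii_table(sample)
print(flag_slow(rows))
```

Passing `backend="inductor_no_cudagraphs"` runs the same check on the other column, which makes it easy to compare the two inductor configurations.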

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          mixer_b16_224          | 128 | 0.9972 |  1.0181   |  0.2683  |         1.3586         |
|        convmixer_768_32         | 32  | 0.9985 |  0.9646   |  0.2496  |         1.0018         |
|            pit_b_224            | 64  | 0.9947 |  0.9924   |  0.2332  |         1.4284         |
|        tnt_s_patch16_224        | 128 | 0.998  |  0.9971   |  0.2173  |         2.9688         |
|          gmlp_s16_224           | 128 | 0.9946 |  1.0821   |  0.2093  |         1.831          |
|          gmixer_24_224          | 128 | 0.9948 |  0.8893   |  0.1914  |         1.749          |
|           convit_base           | 64  | 0.9982 |  0.9975   |  0.1704  |         1.6124         |
|          resmlp_12_224          | 128 | 0.9926 |  0.8895   |  0.1393  |         1.2573         |
|           tf_mixnet_l           | 128 | 0.9766 |  0.8269   |  0.1252  |         1.1914         |
|       eca_botnext26ts_256       | 128 | 0.9738 |  0.7193   |  0.1214  |         1.424          |
|          botnet26t_256          | 128 | 0.973  |  0.8515   |   0.12   |         1.4231         |
|            mixnet_l             | 128 | 0.9767 |   0.821   |  0.1199  |         1.1819         |
|             dla102              | 128 | 0.9959 |  0.8153   |  0.1181  |         1.5232         |
|         coat_lite_mini          | 128 | 0.9968 |  0.9953   |  0.1153  |         1.9185         |
|        adv_inception_v3         | 128 | 0.9961 |  0.8602   |  0.1087  |         1.5213         |
|           dm_nfnet_f0           | 128 | 0.9868 |  0.9852   |  0.108   |         1.4283         |
|       gluon_inception_v3        | 128 | 0.9964 |  0.8649   |  0.1076  |         1.5215         |
|          inception_v3           | 128 | 0.9963 |  0.8649   |  0.1058  |         1.5206         |
|         visformer_small         | 128 | 0.9956 |  0.9451   |  0.1055  |         1.1654         |
|            nfnet_l0             | 128 | 0.9901 |  0.8137   |  0.1033  |         1.4317         |
|      vit_base_patch16_224       | 64  | 0.9962 |  0.9936   |  0.103   |         1.2359         |
| deit_base_distilled_patch16_224 | 64  | 0.9968 |  0.9938   |  0.1008  |         1.2554         |
|           volo_d1_224           | 64  | 0.9943 |  0.9733   |  0.0966  |         1.6672         |
|           res2next50            | 128 | 0.9988 |  0.8259   |  0.0954  |         1.3641         |
|      xcit_large_24_p8_224       |  5  |  0.99  |  0.8512   |  0.0925  |         1.583          |
|          convnext_base          | 64  | 0.9833 |   0.985   |  0.0905  |         1.4697         |
|  swin_base_patch4_window7_224   | 64  | 0.9911 |  0.9436   |  0.0866  |         1.6063         |
|     swsl_resnext101_32x16d      | 32  | 0.9978 |  0.8427   |  0.0845  |         1.0211         |
|      beit_base_patch16_224      | 64  | 0.9964 |  0.9667   |  0.0815  |         1.3503         |
|          cspdarknet53           | 64  | 0.9325 |  0.7865   |  0.0786  |         1.2612         |
|            repvgg_a2            | 128 | 0.9356 |  0.7557   |  0.0781  |         1.1196         |
|         poolformer_m36          | 64  | 0.9865 |  0.9834   |  0.0773  |         1.3186         |
|            gernet_l             | 128 | 0.935  |  0.7936   |  0.0754  |         1.0645         |
|           resnest101e           | 64  | 0.9938 |   0.867   |  0.0743  |         1.3532         |
|           selecsls42b           | 128 | 0.9985 |  0.8126   |  0.0741  |         1.4108         |
|       tf_efficientnet_b0        | 128 | 0.9598 |   0.681   |  0.0725  |         1.3834         |
|            hrnet_w18            | 128 | 0.9922 |  0.6437   |  0.0713  |         1.3522         |
|          pnasnet5large          | 16  | 0.9858 |  0.9136   |  0.0707  |         1.1292         |
|          jx_nest_base           | 32  | 0.9871 |  0.9854   |  0.0693  |         1.3587         |
|        res2net50_14w_8s         | 128 | 0.9989 |  0.7894   |  0.0658  |         1.3612         |
|          cait_m36_384           |  4  | 0.9965 |  0.9932   |  0.0654  |         1.3492         |
|         mobilenetv2_100         | 128 | 0.9494 |  0.7375   |  0.065   |         1.4458         |
|            fbnetv3_b            | 128 | 0.9496 |  0.7691   |  0.0631  |         1.3304         |
|             dpn107              | 32  | 0.9317 |  0.8077   |  0.063   |         1.1352         |
|          ghostnet_100           | 128 | 0.992  |  0.7645   |  0.0627  |         1.6332         |
|          spnasnet_100           | 128 | 0.9424 |  0.7387   |  0.0626  |         1.4201         |
|        twins_pcpvt_base         | 64  | 0.9965 |  0.8989   |  0.062   |         1.6596         |
|           rexnet_100            | 128 | 0.9521 |  0.7036   |  0.0617  |         1.3346         |
|        gluon_xception65         | 32  | 0.9923 |  0.8424   |  0.0593  |         1.0791         |
|      mobilenetv3_large_100      | 128 | 0.9498 |  0.7599   |  0.0575  |         1.4416         |
|            tinynet_a            | 128 | 0.9474 |  0.6781   |  0.0551  |         1.2631         |
|         crossvit_9_240          | 128 | 0.9901 |  0.7826   |   0.05   |         1.6155         |
|        ese_vovnet19b_dw         | 128 | 0.9586 |  0.8328   |  0.0431  |         1.3737         |
|        res2net101_26w_4s        | 64  | 1.0002 |  0.7928   |  0.0422  |         1.0871         |
|           regnety_002           | 128 | 0.9536 |  0.7124   |  0.0416  |         1.2392         |
|        sebotnet33ts_256         | 64  | 0.9576 |  0.7648   |  0.0402  |         1.5355         |
|            lcnet_050            | 128 | 0.9399 |  0.7352   |   0.04   |         1.4072         |
|           fbnetc_100            | 128 | 0.9501 |   0.739   |  0.0371  |         1.3932         |
|           mobilevit_s           | 64  | 0.9616 |  0.7315   |  0.0349  |         1.4424         |
|           mnasnet_100           | 128 | 0.9484 |  0.7413   |  0.0332  |         1.4967         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|            hrnet_w18            | 128 | 9.4501  |  36.7204  | 238.2311 |        244.1946        |
|           rexnet_100            | 128 | 5.5807  |  11.0687  | 228.688  |        287.994         |
|          ghostnet_100           | 128 | 7.5524  |  14.8026  | 202.2409 |        240.8781        |
|          pnasnet5large          | 16  |  8.222  |  26.0042  | 162.1515 |        160.1958        |
|           resnest101e           | 64  | 11.2022 |  24.4812  | 153.3509 |        163.9545        |
|            fbnetv3_b            | 128 | 8.3752  |  17.7839  | 148.9306 |        172.145         |
|           mobilevit_s           | 64  | 5.2961  |  11.2766  | 144.6891 |        155.8761        |
|        res2net101_26w_4s        | 64  | 11.1674 |  24.6313  | 141.7545 |        150.3364        |
|           tf_mixnet_l           | 128 | 9.0866  |  16.7921  | 138.6625 |        156.9757        |
|        twins_pcpvt_base         | 64  | 10.4062 |  24.7456  | 138.3742 |        143.3077        |
|            tinynet_a            | 128 | 5.9224  |  12.0883  | 137.7565 |        159.8587        |
|            mixnet_l             | 128 | 8.4716  |  16.032   | 136.6916 |        156.4049        |
|        adv_inception_v3         | 128 |  5.904  |  12.4937  | 134.6815 |        155.2397        |
|      xcit_large_24_p8_224       |  5  | 12.4736 |  28.0433  | 133.2655 |        127.496         |
|      mobilenetv3_large_100      | 128 | 4.2016  |  8.3432   | 129.7193 |        159.3328        |
|       gluon_inception_v3        | 128 | 5.6209  |  12.3748  | 129.714  |        158.4097        |
|          inception_v3           | 128 | 5.9454  |  12.4544  | 129.5733 |        157.1155        |
|       tf_efficientnet_b0        | 128 | 5.0623  |  10.3628  | 128.3209 |        149.9526        |
|        res2net50_14w_8s         | 128 | 8.9114  |  22.7874  | 119.4182 |        122.2097        |
|          cait_m36_384           |  4  | 14.2953 |  31.2188  | 118.342  |         114.01         |
|           fbnetc_100            | 128 | 5.0685  |  9.3748   | 116.6486 |        128.9753        |
|  swin_base_patch4_window7_224   | 64  | 8.8301  |  19.0459  | 111.6963 |        104.5595        |
|          spnasnet_100           | 128 | 4.9567  |  9.2325   | 111.3162 |        134.0569        |
|           mnasnet_100           | 128 |  3.998  |   7.548   | 105.9483 |        111.8459        |
|        sebotnet33ts_256         | 64  | 4.1609  |  8.8181   | 100.9745 |        103.3808        |
|         poolformer_m36          | 64  | 7.5861  |  13.6997  | 99.2376  |        97.8037         |
|         mobilenetv2_100         | 128 | 3.9497  |  7.8255   |  98.26   |        130.4401        |
|             dpn107              | 32  |  9.657  |  19.4156  | 97.0005  |        98.1417         |
|        gluon_xception65         | 32  | 7.8034  |  16.7533  | 90.1582  |        92.7962         |
|             dla102              | 128 | 6.2345  |  14.1148  | 89.8033  |        97.4587         |
|          cspdarknet53           | 64  | 6.0043  |  10.8169  | 87.8388  |        100.0436        |
|         coat_lite_mini          | 128 |  3.233  |  7.8801   | 87.3013  |        87.3649         |
|         crossvit_9_240          | 128 |  5.795  |  13.2443  | 86.7974  |        84.0604         |
|           regnety_002           | 128 | 4.8438  |  8.7924   | 86.7289  |        106.1638        |
|          jx_nest_base           | 32  | 6.6339  |  14.5606  | 85.5055  |        81.0744         |
|       eca_botnext26ts_256       | 128 | 3.0941  |  6.7182   | 83.5699  |        95.3219         |
|            lcnet_050            | 128 | 2.5284  |  4.9318   | 81.6904  |        93.9223         |
|           res2next50            | 128 | 4.9924  |  12.0164  | 81.5545  |        86.3643         |
|          botnet26t_256          | 128 | 3.0527  |  5.9162   | 79.5359  |        88.3505         |
|           selecsls42b           | 128 | 2.6257  |  5.3545   | 76.1452  |        88.8016         |
|           volo_d1_224           | 64  | 5.0095  |  12.3249  | 73.7629  |        72.6365         |
|            gernet_l             | 128 | 4.9328  |  8.7766   | 72.9767  |        80.2044         |
|        tnt_s_patch16_224        | 128 | 6.3902  |  15.8709  | 72.9296  |        66.6086         |
|            nfnet_l0             | 128 | 5.2914  |  10.8322  | 71.7571  |        77.6105         |
|           dm_nfnet_f0           | 128 | 5.9854  |  11.3359  | 69.4463  |        69.3032         |
|        ese_vovnet19b_dw         | 128 | 2.4982  |  4.7979   | 68.9108  |        74.9139         |
|     swsl_resnext101_32x16d      | 32  | 6.0451  |  13.4054  | 63.6882  |        60.0897         |
|         visformer_small         | 128 | 2.7373  |  5.9876   | 63.4818  |        64.0063         |
|          convnext_base          | 64  | 6.5916  |  12.5669  | 61.1343  |        57.6278         |
|          gmlp_s16_224           | 128 | 5.6006  |  11.8961  | 59.3451  |        56.3681         |
|            repvgg_a2            | 128 | 4.8304  |   8.597   | 57.0787  |        58.9892         |
|          gmixer_24_224          | 128 | 6.0045  |  12.7033  | 50.9265  |        48.3271         |
|           convit_base           | 64  |  3.427  |  8.5009   | 49.7348  |        47.2338         |
|            pit_b_224            | 64  | 3.4542  |  7.8401   | 46.2727  |        43.5528         |
| deit_base_distilled_patch16_224 | 64  | 3.0935  |  7.0117   |  41.883  |        41.0271         |
|      vit_base_patch16_224       | 64  | 3.0643  |  6.9207   | 41.8716  |        37.7582         |
|      beit_base_patch16_224      | 64  | 3.8785  |  8.5704   | 40.1535  |        35.1724         |
|        convmixer_768_32         | 32  | 1.6537  |   6.773   | 39.9485  |         36.245         |
|          resmlp_12_224          | 128 | 2.8165  |   5.348   | 39.7911  |        38.5522         |
|          mixer_b16_224          | 128 | 2.6583  |   5.795   | 33.3996  |        31.3397         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.2872  |         1.2836         |
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.2057  |         1.2049         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  1.1899  |         1.1871         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1607  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.1583  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.1215  |         1.1179         |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  1.1129  |         1.1115         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  1.089   |         1.0876         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.0875  |         1.0845         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  1.0758  |         1.0721         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  1.0757  |         1.0728         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  1.0696  |         1.0675         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  1.0556  |         1.0539         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  1.0512  |         1.0506         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  1.0494  |         1.0457         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0377  |         1.0351         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  1.0361  |         1.0328         |
|          convnext_base          | 64  | 1.001  |   0.924   |  1.0345  |         1.0338         |
|             dla102              | 128 | 0.9635 |  0.9155   |  1.0323  |         1.0326         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  1.0251  |         1.0242         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  1.021   |         1.0202         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  1.0203  |         1.0194         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  1.0082  |         1.0072         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  1.0071  |         1.0057         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9976  |         0.9952         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.9957  |         0.9948         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.9925  |          0.99          |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.9923  |         0.9902         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9917  |         0.9903         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.9912  |         0.9898         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9905  |         0.989          |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.9885  |         0.989          |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9864  |         0.9854         |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9821  |         0.9793         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.9793  |         0.9786         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.9793  |         0.977          |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.979   |         0.9779         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.9776  |         0.9732         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.9738  |         0.9706         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9732  |         0.9727         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.9714  |         0.9705         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.9702  |         0.9664         |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.966   |         0.9611         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.9646  |         0.9642         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.9637  |         0.9607         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.9611  |         0.9604         |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.9582  |         0.9535         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.9568  |         0.9547         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9562  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9537  |         0.9528         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.9509  |         0.9483         |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.9497  |         0.9451         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.9448  |         0.9403         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.9376  |         0.9361         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.9046  |         0.9045         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.901   |         0.8966         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.8898  |         0.884          |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+-----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor  | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+-----------+------------------------+
|            hrnet_w18            | 128 | 280.5116 | 432.4928  | 3947.2928 |        206.4242        |
|          pnasnet5large          | 16  |  198.59  | 214.1355  | 2800.9837 |        173.566         |
|          cait_m36_384           |  4  | 170.3616 |  168.036  | 2562.3871 |        123.5853        |
|        res2net101_26w_4s        | 64  | 105.3118 | 124.4041  | 2395.234  |        90.5963         |
|           mobilevit_s           | 64  | 84.5354  | 111.0516  | 2342.2311 |        56.4601         |
|           resnest101e           | 64  | 165.3454 | 189.0751  | 2210.1021 |        121.5753        |
|        res2net50_14w_8s         | 128 | 140.5749 | 178.0078  | 2146.5942 |        103.1618        |
|        twins_pcpvt_base         | 64  | 116.1019 | 137.9834  | 2132.9439 |        75.9475         |
|           fbnetc_100            | 128 | 82.7068  |  106.354  | 2127.718  |        56.5067         |
|        sebotnet33ts_256         | 64  | 80.4623  | 100.6287  | 1926.0071 |        50.1002         |
|         poolformer_m36          | 64  | 146.8045 | 147.2519  | 1878.4956 |        109.8679        |
|           mnasnet_100           | 128 | 64.1962  |  82.0834  | 1847.0488 |        40.7028         |
|            fbnetv3_b            | 128 | 115.1297 | 142.3403  | 1741.8177 |         82.231         |
|             dpn107              | 32  | 113.9536 | 131.2693  | 1696.7097 |        93.3659         |
|  swin_base_patch4_window7_224   | 64  | 147.2607 | 154.5879  | 1692.6992 |        90.7797         |
|        gluon_xception65         | 32  | 99.5394  | 117.1431  | 1677.6269 |        91.5521         |
|         crossvit_9_240          | 128 | 82.4214  | 104.3599  | 1644.0647 |        50.5808         |
|          inception_v3           | 128 | 160.6759 | 184.9495  | 1518.6141 |        105.2426        |
|           tf_mixnet_l           | 128 | 193.633  | 228.9245  | 1515.1613 |        158.8717        |
|            mixnet_l             | 128 | 185.1715 | 220.3968  | 1511.354  |        153.1728        |
|       gluon_inception_v3        | 128 | 160.8212 | 185.2985  | 1494.3638 |        105.3253        |
|        tnt_s_patch16_224        | 128 | 323.405  | 323.5032  | 1486.4575 |        108.6878        |
|        adv_inception_v3         | 128 | 160.6561 |  186.13   | 1474.8109 |        105.1887        |
|          jx_nest_base           | 32  | 101.6447 |  101.624  | 1469.2578 |        73.7874         |
|             dla102              | 128 | 172.5575 | 210.6917  | 1459.2045 |        112.7068        |
|        ese_vovnet19b_dw         | 128 | 64.4788  |  74.3443  | 1441.4345 |         45.02          |
|          ghostnet_100           | 128 | 90.6971  | 117.4236  | 1438.7733 |        54.9674         |
|     swsl_resnext101_32x16d      | 32  | 118.8709 | 140.3556  | 1407.0496 |        116.107         |
|      xcit_large_24_p8_224       |  5  | 122.6921 | 163.6331  | 1384.8318 |        88.4731         |
|          convnext_base          | 64  | 124.3639 | 123.9077  | 1357.2339 |        83.2585         |
|           res2next50            | 128 | 125.8756 | 152.2168  | 1323.2285 |        92.2582         |
|            tinynet_a            | 128 | 73.5088  | 102.4648  | 1271.9878 |        55.0579         |
|           volo_d1_224           | 64  | 121.0218 | 123.6951  | 1249.4809 |        72.1851         |
|      beit_base_patch16_224      | 64  | 101.4247 | 104.5784  | 1246.7488 |        74.9335         |
|           rexnet_100            | 128 | 79.8912  | 108.1556  | 1242.8728 |        57.1081         |
|        convmixer_768_32         | 32  | 300.3597 |  310.935  | 1206.8702 |        300.0332        |
|           dm_nfnet_f0           | 128 | 128.225  | 128.3738  | 1176.7982 |        88.8126         |
|          cspdarknet53           | 64  | 94.9032  | 112.4482  | 1132.0109 |         70.179         |
|       tf_efficientnet_b0        | 128 | 84.7015  | 119.5555  | 1127.4765 |        58.8514         |
|            nfnet_l0             | 128 | 112.4669 | 137.1731  | 1085.8368 |        77.9415         |
|          spnasnet_100           | 128 | 70.2821  |  89.659   | 1064.3375 |        46.6618         |
|      mobilenetv3_large_100      | 128 | 61.1901  |  76.5408  | 1017.6235 |        40.3894         |
|           regnety_002           | 128 | 40.9185  |  54.4644  | 991.3105  |        29.8532         |
|         coat_lite_mini          | 128 | 112.9755 | 113.1537  | 980.7592  |        58.7082         |
|            gernet_l             | 128 | 77.6884  |  91.584   |  968.186  |        68.2591         |
|         mobilenetv2_100         | 128 | 65.4264  |  84.2245  | 964.0516  |        43.0099         |
|           convit_base           | 64  | 163.1294 | 163.2402  | 958.1112  |        100.9129        |
|            repvgg_a2            | 128 | 77.6487  |   96.02   |  932.823  |        64.9045         |
|       eca_botnext26ts_256       | 128 | 108.6152 | 147.3359  | 874.0356  |         74.274         |
|         visformer_small         | 128 | 91.2768  |  96.1672  | 865.0752  |        77.9675         |
|      vit_base_patch16_224       | 64  | 86.9143  |  87.0147  | 843.3697  |        69.9502         |
| deit_base_distilled_patch16_224 | 64  | 84.7585  |  85.0116  | 842.1676  |        67.3179         |
|          botnet26t_256          | 128 | 101.873  | 116.4517  | 828.6003  |        69.6233         |
|           selecsls42b           | 128 | 60.0213  |  73.7344  | 812.4639  |        42.5188         |
|            lcnet_050            | 128 | 31.6745  |  40.5393  | 750.3981  |        21.2154         |
|          gmlp_s16_224           | 128 | 137.6257 | 126.2967  | 655.5738  |        74.6848         |
|          gmixer_24_224          | 128 | 118.0557 | 132.0863  |  615.921  |        67.0699         |
|            pit_b_224            | 64  | 118.7669 | 118.9782  | 506.8618  |        82.6348         |
|          mixer_b16_224          | 128 | 116.507  | 114.1452  |  435.071  |        85.6255         |
|          resmlp_12_224          | 128 | 53.4309  |  59.7144  | 382.1141  |        42.2646         |
+---------------------------------+-----+----------+-----------+-----------+------------------------+
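For readers who want to relate these latencies to the speedup columns elsewhere in the report, here is a minimal sketch. It assumes the `eager` column is a reasonable stand-in for the native PyTorch baseline, which the report does not state; the helper name `speedup` is hypothetical.

```python
def speedup(baseline_ms: float, compiled_ms: float) -> float:
    """Ratio > 1 means the compiled run is faster than the baseline."""
    if compiled_ms <= 0:
        raise ValueError("compiled latency must be positive")
    return baseline_ms / compiled_ms


# hrnet_w18 row from the table above (latencies in ms).
# Assumption: the eager column approximates the native PyTorch baseline.
eager_ms, inductor_ms, no_cudagraphs_ms = 280.5116, 3947.2928, 206.4242

print(f"inductor: {speedup(eager_ms, inductor_ms):.4f}x")
print(f"inductor_no_cudagraphs: {speedup(eager_ms, no_cudagraphs_ms):.4f}x")
```

The ratios land near, but not exactly on, the per-model speedups reported for hrnet_w18 in the warnings tables (0.0724 and 1.3531), which would be consistent with the reported speedup coming from a separate timing against native PyTorch.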

Performance graphs


bench_logs/torchbench_amp.png :

bench_logs/timm_models_amp.png :

bench_logs/huggingface_amp.png :

Build Summary


Run name

day_091_01_04_23_performance_amp_829

Commit hashes

pytorch commit: 92b4620
pytorch commit date: 2023-04-02 02:23:13+00:00
torchbench commit: ea7b71ead75529529d67ffd17541b1f203c49b83
torchbench commit date: 2023-03-31 18:05:58-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git92b4620

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

huydhn added a commit to pytorch/test-infra that referenced this issue Apr 3, 2023
@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

When measuring performance, compilation latency, and memory footprint reduction, we exclude the models that fail accuracy checks.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 80%, 48/60 | 84%, 38/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 96%, 43/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
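As a small illustration of how the cells above read, the sketch below renders a pass count as `NN%, passed/total`. The function name `format_passrate` is hypothetical and is not taken from the dashboard scripts.

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a pass rate in the 'NN%, passed/total' layout used in the table."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{passed / total:.0%}, {passed}/{total}"


# Counts taken from the Passrate table above.
print(format_passrate(53, 60))  # eager, torchbench
print(format_passrate(38, 45))  # inductor, huggingface
```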

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.00x    |    1.66x    |    1.00x    |
| inductor_no_cudagraphs |   1.27x    |    1.50x    |    1.39x    |
+------------------------+------------+-------------+-------------+
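A minimal sketch of a geometric mean over per-model speedups, for anyone reproducing these numbers by hand. The optional `floor` argument is an assumption: the 1.00x entries for inductor sit alongside per-model speedups well below 1 in the warnings tables, which suggests each speedup may be clipped to 1.0 before averaging, but the report does not say so.

```python
import math
from typing import Iterable, Optional


def geomean(speedups: Iterable[float], floor: Optional[float] = None) -> float:
    """Geometric mean of speedups, optionally clipping each value to `floor`.

    Assumption: `floor` models a possible clip-to-1.0 step; it is not
    confirmed by the report.
    """
    values = [s if floor is None else max(s, floor) for s in speedups]
    if not values:
        raise ValueError("need at least one speedup")
    if any(v <= 0 for v in values):
        raise ValueError("speedups must be positive (or clipped by a positive floor)")
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


# Three per-model inductor speedups from the timm_models warnings table.
sample = [0.3393, 0.2702, 0.2355]
print(f"unclipped: {geomean(sample):.2f}x")
print(f"clipped at 1.0: {geomean(sample, floor=1.0):.2f}x")
```

Using the mean of logarithms rather than multiplying the values directly avoids underflow when many small ratios are combined.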

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.80    |    7.23     |    5.92     |
|       aot_eager        |    9.38    |    15.90    |    13.17    |
|        inductor        |   59.50    |    59.88    |   102.12    |
| inductor_no_cudagraphs |   63.09    |    58.67    |   109.34    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.90x    |    0.88x    |
|        inductor        |   0.95x    |    0.99x    |    1.02x    |
| inductor_no_cudagraphs |   0.94x    |    1.05x    |    1.01x    |
+------------------------+------------+-------------+-------------+

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics of the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Previous report name: /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 82%, 49/60  | 80%, 48/60  |
|        inductor        | huggingface | 84%, 38/45  | 84%, 38/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 88%, 53/60  | 87%, 52/60  |
| inductor_no_cudagraphs | huggingface | 98%, 44/45  | 96%, 43/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.00x    |   1.00x   |
|        inductor        | huggingface |   1.53x    |   1.66x   |
|        inductor        | timm_models |   1.00x    |   1.00x   |
| inductor_no_cudagraphs | torchbench  |   1.27x    |   1.27x   |
| inductor_no_cudagraphs | huggingface |   1.50x    |   1.50x   |
| inductor_no_cudagraphs | timm_models |   1.39x    |   1.39x   |
+------------------------+-------------+------------+-----------+
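To pick out which rows actually moved between the two reports, something like the following sketch could be used. The helper names and the 0.005 tolerance are illustrative assumptions, not part of the dashboard tooling.

```python
def parse_speedup(cell: str) -> float:
    """Convert a table cell such as '1.53x' to a float."""
    return float(cell.strip().rstrip("x"))


def speedup_changes(rows, tol=0.005):
    """Return (compiler, suite, delta) for rows whose speedup moved by more than tol."""
    changed = []
    for compiler, suite, prev, cur in rows:
        delta = parse_speedup(cur) - parse_speedup(prev)
        if abs(delta) > tol:
            changed.append((compiler, suite, round(delta, 2)))
    return changed


# Rows copied from the "Geometric mean speedup diff" table above.
rows = [
    ("inductor", "torchbench", "1.00x", "1.00x"),
    ("inductor", "huggingface", "1.53x", "1.66x"),
    ("inductor", "timm_models", "1.00x", "1.00x"),
    ("inductor_no_cudagraphs", "torchbench", "1.27x", "1.27x"),
    ("inductor_no_cudagraphs", "huggingface", "1.50x", "1.50x"),
    ("inductor_no_cudagraphs", "timm_models", "1.39x", "1.39x"),
]
print(speedup_changes(rows))
```

On this data only the inductor/huggingface row changes, by +0.13.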

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
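The four rules above can be sketched as a simple check. The `ModelResult` type and the flag names are hypothetical; only the thresholds come from the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelResult:
    name: str
    accuracy: str        # e.g. "pass", "fail_accuracy", "fail_to_run"
    speedup: float       # relative to native PyTorch
    compile_sec: float   # compilation latency in seconds
    compression: float   # peak memory compression ratio


def warning_flags(r: ModelResult) -> List[str]:
    """Apply the four flagging rules; thresholds are taken from the list above."""
    flags = []
    if r.accuracy != "pass":
        flags.append("accuracy")
    if r.speedup < 0.95:
        flags.append("speedup")
    if r.compile_sec > 120:
        flags.append("compile_latency")
    if r.compression < 0.9:
        flags.append("compression")
    return flags


# hrnet_w18 under inductor, using values reported in this comment's tables.
hrnet = ModelResult("hrnet_w18", "pass", 0.0724, 241.07, 0.9925)
print(warning_flags(hrnet))
```

A `nan` compilation latency, as shown for some models in the latency table, compares as false against the threshold and so is not flagged by this sketch.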

Accuracy warnings

+-------------+----------------------------+------------------------+-----------------+
|    suite    |            name            | inductor_no_cudagraphs |    inductor     |
+-------------+----------------------------+------------------------+-----------------+
| torchbench  |            moco            |      fail_to_run       |   fail_to_run   |
| torchbench  |       hf_Longformer        |      fail_to_run       |   fail_to_run   |
| torchbench  |     mobilenet_v3_large     |          pass          |  fail_accuracy  |
| torchbench  |     Background_Matting     |    eager_variation     | eager_variation |
| torchbench  |      vision_maskrcnn       |    eager_variation     | eager_variation |
| torchbench  |         tacotron2          |         0.0000         |     0.0000      |
| torchbench  |            gat             |         0.0000         |     0.0000      |
| torchbench  |            gcn             |         0.0000         |     0.0000      |
| torchbench  |           llama            |         0.0000         |     0.0000      |
| torchbench  |            sage            |         0.0000         |     0.0000      |
| torchbench  |       torchrec_dlrm        |         0.0000         |     0.0000      |
| huggingface | AlbertForQuestionAnswering |     fail_accuracy      |  fail_accuracy  |
+-------------+----------------------------+------------------------+-----------------+

Performance speedup warnings

+-------------+-----------------------------------+------------------------+----------+
|    suite    |               name                | inductor_no_cudagraphs | inductor |
+-------------+-----------------------------------+------------------------+----------+
| torchbench  |               hf_T5               |         1.9844         |  0.1763  |
| torchbench  |               vgg16               |         1.2536         |  0.1496  |
| torchbench  |             hf_Albert             |         2.2931         |  0.1489  |
| torchbench  |        Background_Matting         |         1.2083         |  0.1212  |
| torchbench  |            timm_nfnet             |         1.4692         |  0.1018  |
| torchbench  |           hf_Bert_large           |         1.5649         |  0.0992  |
| torchbench  |           hf_GPT2_large           |         1.7362         |  0.0969  |
| torchbench  |              hf_Bert              |         1.5868         |  0.0871  |
| torchbench  |           pytorch_unet            |         1.3522         |  0.0722  |
| torchbench  |              hf_Bart              |         1.5821         |  0.071   |
| torchbench  |            hf_T5_large            |         1.8581         |   0.07   |
| torchbench  |           BERT_pytorch            |         2.0803         |  0.0629  |
| torchbench  |              yolov3               |         1.1984         |  0.0569  |
| torchbench  | attention_is_all_you_need_pytorch |         1.4918         |  0.056   |
| torchbench  |           mobilenet_v2            |         1.5229         |  0.0551  |
| torchbench  |              hf_GPT2              |         1.8264         |  0.0547  |
| torchbench  |            timm_regnet            |         1.0076         |  0.0498  |
| torchbench  |           hf_DistilBert           |         1.5576         |  0.0477  |
| torchbench  |              demucs               |         1.0384         |  0.041   |
| torchbench  |      timm_vision_transformer      |         1.3966         |  0.0397  |
| torchbench  |             resnet152             |         1.0157         |  0.0365  |
| torchbench  |        shufflenet_v2_x1_0         |         1.1916         |  0.0359  |
| torchbench  |           timm_resnest            |         1.498          |  0.0349  |
| torchbench  |          pytorch_stargan          |         1.253          |  0.0343  |
| torchbench  |            densenet121            |         1.0371         |  0.0336  |
| torchbench  |             resnet50              |         1.0564         |  0.033   |
| torchbench  |        mobilenet_v3_large         |         1.1896         |  0.0318  |
| torchbench  |            timm_vovnet            |         0.9283         |  0.0316  |
| torchbench  |         timm_efficientnet         |         1.0711         |  0.0313  |
| torchbench  |         phlippe_densenet          |         1.0071         |  0.0305  |
| torchbench  |   pytorch_CycleGAN_and_pix2pix    |         1.7675         |  0.0299  |
| torchbench  |            mnasnet1_0             |         1.0324         |  0.0283  |
| torchbench  |      nvidia_deeprecommender       |         1.0181         |  0.0269  |
| torchbench  |          resnext50_32x4d          |         0.9624         |  0.0267  |
| torchbench  |           squeezenet1_1           |         1.2746         |  0.0262  |
| torchbench  |          pytorch_struct           |         1.0978         |  0.0239  |
| torchbench  |              alexnet              |         1.1367         |  0.0237  |
| torchbench  |            tts_angular            |         0.9356         |  0.0232  |
| torchbench  |          phlippe_resnet           |         0.9995         |  0.0215  |
| torchbench  |       functorch_dp_cifar10        |         1.3476         |  0.0214  |
| torchbench  |             resnet18              |         0.9546         |  0.0208  |
| torchbench  |        speech_transformer         |         1.5675         |  0.019   |
| torchbench  |           fastNLP_Bert            |         1.4869         |  0.0172  |
| torchbench  |          LearningToPaint          |         1.0631         |  0.0161  |
| torchbench  |               dcgan               |         0.8192         |  0.0087  |
| torchbench  |            hf_Reformer            |         1.0657         |  0.0078  |
| torchbench  |           lennard_jones           |         0.8785         |  0.0072  |
| torchbench  |                drq                |         0.9333         |  0.005   |
| torchbench  |         soft_actor_critic         |         0.8875         |  0.0037  |
| torchbench  |                gat                |          0.0           |   0.0    |
| torchbench  |             tacotron2             |          0.0           |   0.0    |
| torchbench  |               sage                |          0.0           |   0.0    |
| torchbench  |                gcn                |          0.0           |   0.0    |
| torchbench  |            hf_BigBird             |         1.6534         |   0.0    |
| torchbench  |               moco                |          0.0           |   0.0    |
| torchbench  |           hf_Longformer           |          0.0           |   0.0    |
| torchbench  |               dlrm                |         1.1384         |   0.0    |
| torchbench  |   timm_vision_transformer_large   |         1.0816         |   0.0    |
| torchbench  |           torchrec_dlrm           |          0.0           |   0.0    |
| huggingface |       BlenderbotForCausalLM       |         1.2135         |   0.0    |
| huggingface |    DebertaForQuestionAnswering    |         0.9507         |   0.0    |
| huggingface |        DebertaForMaskedLM         |         0.8088         |   0.0    |
| huggingface |       DebertaV2ForMaskedLM        |         0.6796         |   0.0    |
| huggingface |   DebertaV2ForQuestionAnswering   |         0.662          |   0.0    |
| huggingface |       AllenaiLongformerBase       |          0.0           |   0.0    |
| timm_models |         convmixer_768_32          |         1.0027         |  0.3393  |
| timm_models |           mixer_b16_224           |         1.3605         |  0.2702  |
| timm_models |             pit_b_224             |         1.4284         |  0.2355  |
| timm_models |         tnt_s_patch16_224         |         2.9752         |  0.2137  |
| timm_models |           gmlp_s16_224            |         1.8317         |  0.2062  |
| timm_models |           gmixer_24_224           |         1.7455         |  0.1907  |
| timm_models |            convit_base            |         1.6096         |  0.1668  |
| timm_models |           resmlp_12_224           |         1.2561         |  0.1366  |
| timm_models |            tf_mixnet_l            |         1.1919         |  0.1242  |
| timm_models |        eca_botnext26ts_256        |         1.4236         |  0.1222  |
| timm_models |             mixnet_l              |         1.1813         |  0.1215  |
| timm_models |          coat_lite_mini           |         1.9194         |  0.1159  |
| timm_models |           botnet26t_256           |         1.4243         |  0.1146  |
| timm_models |              dla102               |         1.5228         |  0.1132  |
| timm_models |       beit_base_patch16_224       |         1.352          |  0.1105  |
| timm_models |          visformer_small          |         1.166          |  0.1088  |
| timm_models |        gluon_inception_v3         |         1.5207         |  0.108   |
| timm_models |         adv_inception_v3          |          1.52          |  0.1079  |
| timm_models |           inception_v3            |         1.518          |  0.1074  |
| timm_models |            dm_nfnet_f0            |         1.4297         |  0.1068  |
| timm_models |       vit_base_patch16_224        |         1.2364         |  0.1043  |
| timm_models |             nfnet_l0              |         1.4322         |  0.1027  |
| timm_models |  deit_base_distilled_patch16_224  |         1.2549         |  0.0984  |
| timm_models |            res2next50             |         1.3622         |  0.0963  |
| timm_models |            volo_d1_224            |         1.6659         |  0.0961  |
| timm_models |       xcit_large_24_p8_224        |         1.569          |  0.0931  |
| timm_models |   swin_base_patch4_window7_224    |         1.6064         |  0.0866  |
| timm_models |      swsl_resnext101_32x16d       |         1.0201         |  0.083   |
| timm_models |             repvgg_a2             |         1.1203         |  0.0786  |
| timm_models |           cspdarknet53            |         1.2614         |  0.0778  |
| timm_models |          poolformer_m36           |         1.3187         |  0.0777  |
| timm_models |             gernet_l              |         1.064          |  0.0768  |
| timm_models |            selecsls42b            |         1.4096         |  0.0736  |
| timm_models |            resnest101e            |         1.3571         |  0.0736  |
| timm_models |             hrnet_w18             |         1.3531         |  0.0724  |
| timm_models |        tf_efficientnet_b0         |         1.3845         |  0.0714  |
| timm_models |           pnasnet5large           |         1.1277         |   0.07   |
| timm_models |           jx_nest_base            |         1.359          |  0.0698  |
| timm_models |           convnext_base           |         1.4715         |  0.0677  |
| timm_models |          crossvit_9_240           |         1.6154         |  0.0663  |
| timm_models |             fbnetv3_b             |         1.3173         |  0.0656  |
| timm_models |          mobilenetv2_100          |         1.4443         |  0.0655  |
| timm_models |           cait_m36_384            |         1.3492         |  0.0652  |
| timm_models |         res2net50_14w_8s          |         1.3573         |  0.0651  |
| timm_models |           ghostnet_100            |         1.5792         |  0.0635  |
| timm_models |              dpn107               |         1.1365         |  0.0625  |
| timm_models |           spnasnet_100            |         1.4189         |  0.0622  |
| timm_models |            rexnet_100             |         1.333          |  0.0617  |
| timm_models |         gluon_xception65          |         1.0784         |  0.0592  |
| timm_models |         twins_pcpvt_base          |         1.6655         |  0.058   |
| timm_models |       mobilenetv3_large_100       |         1.4381         |  0.0576  |
| timm_models |             tinynet_a             |         1.2577         |  0.054   |
| timm_models |         ese_vovnet19b_dw          |         1.3696         |  0.0421  |
| timm_models |         res2net101_26w_4s         |         1.0624         |  0.042   |
| timm_models |            regnety_002            |         1.2267         |  0.0399  |
| timm_models |             lcnet_050             |         1.4641         |  0.0397  |
| timm_models |         sebotnet33ts_256          |         1.5321         |  0.0394  |
| timm_models |            fbnetc_100             |         1.404          |  0.0359  |
| timm_models |            mobilevit_s            |         1.4426         |  0.0345  |
| timm_models |            mnasnet_100            |         1.4957         |  0.033   |
+-------------+-----------------------------------+------------------------+----------+

Compilation latency (sec) warnings

+-------------+--------------------------------+------------------------+----------+
|    suite    |              name              | inductor_no_cudagraphs | inductor |
+-------------+--------------------------------+------------------------+----------+
| torchbench  |          hf_T5_large           |        172.9963        | 186.7371 |
| torchbench  |        phlippe_densenet        |        163.1234        | 133.6101 |
| torchbench  |          densenet121           |        136.2517        | 132.1669 |
| torchbench  |       timm_efficientnet        |        142.743         | 121.4614 |
| torchbench  |       mobilenet_v3_large       |        137.4203        | 115.8424 |
| torchbench  |          mobilenet_v2          |        129.5846        | 108.8431 |
| torchbench  |           hf_BigBird           |        128.1795        |   nan    |
| torchbench  | timm_vision_transformer_large  |        125.2624        |   nan    |
| huggingface |     MobileBertForMaskedLM      |        146.0299        | 150.1586 |
| huggingface | MobileBertForQuestionAnswering |        138.1819        | 142.4138 |
| huggingface | M2M100ForConditionalGeneration |        134.3756        | 137.4141 |
| huggingface |        XGLMForCausalLM         |        131.1014        | 131.0426 |
| huggingface |  MT5ForConditionalGeneration   |        132.809         | 127.1359 |
| timm_models |           hrnet_w18            |        249.5104        |  241.07  |
| timm_models |           rexnet_100           |        285.6805        | 231.1478 |
| timm_models |          ghostnet_100          |        234.8177        | 201.641  |
| timm_models |         pnasnet5large          |        165.2232        | 160.1059 |
| timm_models |          resnest101e           |        162.731         | 156.4844 |
| timm_models |           fbnetv3_b            |        171.645         | 151.987  |
| timm_models |          mobilevit_s           |        158.3839        | 148.4526 |
| timm_models |       res2net101_26w_4s        |        150.0525        | 147.9464 |
| timm_models |        twins_pcpvt_base        |        147.0902        | 145.2525 |
| timm_models |          tf_mixnet_l           |        160.9612        | 142.4357 |
| timm_models |        adv_inception_v3        |        159.3719        | 138.9232 |
| timm_models |            mixnet_l            |        161.0833        | 138.4925 |
| timm_models |          inception_v3          |        154.704         | 137.0328 |
| timm_models |       gluon_inception_v3       |        161.8541        | 136.8902 |
| timm_models |      xcit_large_24_p8_224      |        131.2005        | 136.6948 |
| timm_models |     mobilenetv3_large_100      |        159.1405        | 135.6454 |
| timm_models |           tinynet_a            |        161.4224        | 134.1131 |
| timm_models |       tf_efficientnet_b0       |        149.0824        | 130.4467 |
| timm_models |           fbnetc_100           |        136.4893        | 122.4206 |
| timm_models |        res2net50_14w_8s        |        122.5245        | 122.3563 |
| timm_models |          cait_m36_384          |        113.9006        | 121.5843 |
| timm_models |          spnasnet_100          |        139.1139        | 116.3327 |
| timm_models |        mobilenetv2_100         |        131.4004        | 112.2976 |
| timm_models |          mnasnet_100           |        122.8089        | 107.161  |
+-------------+--------------------------------+------------------------+----------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+------------------------+----------+
|    suite    |                  name                   | inductor_no_cudagraphs | inductor |
+-------------+-----------------------------------------+------------------------+----------+
| torchbench  |         nvidia_deeprecommender          |         0.8931         |  0.8951  |
| torchbench  |             pytorch_stargan             |         0.8893         |  0.8934  |
| torchbench  |                resnet50                 |         0.8854         |  0.8908  |
| torchbench  |               timm_vovnet               |         0.8869         |  0.889   |
| torchbench  |         timm_vision_transformer         |         0.8835         |  0.8873  |
| torchbench  |            phlippe_densenet             |         0.8659         |  0.8834  |
| torchbench  |           mobilenet_v3_large            |         0.8092         |  0.8794  |
| torchbench  |           speech_transformer            |         0.869          |  0.8694  |
| torchbench  |               densenet121               |         0.7998         |  0.8268  |
| torchbench  |               mnasnet1_0                |         0.8071         |  0.8154  |
| torchbench  |               hf_Reformer               |         0.8022         |  0.8064  |
| torchbench  |             resnext50_32x4d             |         0.7753         |  0.7792  |
| torchbench  |             LearningToPaint             |         0.7463         |  0.7552  |
| torchbench  |             pytorch_struct              |         0.7362         |  0.7428  |
| torchbench  |                resnet18                 |         0.6097         |  0.619   |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |         0.6004         |  0.6035  |
| torchbench  |          functorch_dp_cifar10           |         0.4424         |  0.451   |
| torchbench  |             phlippe_resnet              |         0.3395         |  0.3554  |
| huggingface |          DistilBertForMaskedLM          |         0.9624         |  0.8872  |
| huggingface |            TrOCRForCausalLM             |         0.9583         |  0.8855  |
| huggingface |     M2M100ForConditionalGeneration      |         0.9908         |  0.8808  |
| huggingface | BlenderbotSmallForConditionalGeneration |         0.9803         |  0.8749  |
| huggingface |             XGLMForCausalLM             |         0.9792         |  0.8421  |
| huggingface |       BlenderbotSmallForCausalLM        |         0.9119         |  0.8215  |
| huggingface |          MobileBertForMaskedLM          |         1.016          |  0.8112  |
| huggingface |         Speech2Text2ForCausalLM         |         0.8779         |  0.7921  |
| huggingface |     MobileBertForQuestionAnswering      |         0.8392         |  0.6659  |
| timm_models |               regnety_002               |         0.8966         |  0.901   |
| timm_models |                lcnet_050                |         0.884          |  0.8898  |
+-------------+-----------------------------------------+------------------------+----------+

Metrics over time

Plots: bench_logs/geomean_over_time.png, bench_logs/passrate_over_time.png, bench_logs/comp_time_over_time.png, bench_logs/memory_over_time.png

Recent Regressions

For each relevant compiler, we compare the two most recent reports (that actually ran the compiler) to find models that were not flagged previously but are now flagged as problematic (according to the 'Warnings' section).
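
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual script: the function name `find_new_flags`, the `is_flagged` predicate, and the 0.95 speedup cutoff are all assumptions made for the example. The `drq` and `hf_Longformer` values are taken from the torchbench speedup regression table below; the previous `dcgan` value is made up to show that a model already flagged in the earlier report is not reported again.

```python
from typing import Callable, Dict, List, Tuple


def find_new_flags(
    prev: Dict[str, float],
    cur: Dict[str, float],
    is_flagged: Callable[[float], bool],
) -> List[Tuple[str, float, float]]:
    """Return (name, prev_value, cur_value) for models flagged now but not before."""
    regressions = []
    for name, cur_value in cur.items():
        if name not in prev:
            # No earlier result to compare against, so nothing to report.
            continue
        prev_value = prev[name]
        if is_flagged(cur_value) and not is_flagged(prev_value):
            regressions.append((name, prev_value, cur_value))
    return regressions


# Hypothetical cutoff: treat a speedup below 0.95 as a warning.
SPEEDUP_THRESHOLD = 0.95

# drq and hf_Longformer come from the report; the previous dcgan value is made up.
prev_speedup = {"drq": 1.0116, "hf_Longformer": 1.3008, "dcgan": 0.80}
cur_speedup = {"drq": 0.9333, "hf_Longformer": 0.0, "dcgan": 0.8192}

new_flags = find_new_flags(prev_speedup, cur_speedup, lambda s: s < SPEEDUP_THRESHOLD)
for name, before, after in new_flags:
    print(f"{name}: {before} -> {after}")
```

The same helper would apply to the other metrics by swapping the predicate, for example flagging compilation latency above some limit or a memory compression ratio below some limit; those limits are likewise not stated in this report.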

Regressions for torchbench

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Accuracy regressions

+------------------------+--------------------+-------------+---------------+
|        compiler        |        name        | prev_status |  cur_status   |
+------------------------+--------------------+-------------+---------------+
| inductor_no_cudagraphs |   hf_Longformer    |    pass     |  fail_to_run  |
|        inductor        |   hf_Longformer    |    pass     |  fail_to_run  |
|        inductor        | mobilenet_v3_large |    pass     | fail_accuracy |
+------------------------+--------------------+-------------+---------------+

Performance speedup regressions

+------------------------+---------------+-------------+------------+
|        compiler        |     name      | prev_status | cur_status |
+------------------------+---------------+-------------+------------+
| inductor_no_cudagraphs |      drq      |   1.0116    |   0.9333   |
| inductor_no_cudagraphs | hf_Longformer |   1.3008    |    0.0     |
+------------------------+---------------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Performance speedup regressions

+------------------------+-----------------------+-------------+------------+
|        compiler        |         name          | prev_status | cur_status |
+------------------------+-----------------------+-------------+------------+
| inductor_no_cudagraphs | AllenaiLongformerBase |    1.559    |    0.0     |
+------------------------+-----------------------+-------------+------------+

Peak Memory Compression Ratio regressions

+----------+--------------------------------+-------------+------------+
| compiler |              name              | prev_status | cur_status |
+----------+--------------------------------+-------------+------------+
| inductor | M2M100ForConditionalGeneration |   0.9621    |   0.8808   |
| inductor |        XGLMForCausalLM         |   0.9344    |   0.8421   |
+----------+--------------------------------+-------------+------------+

Regressions for timm_models

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Compilation latency (sec) regressions

+------------------------+------------------+-------------+------------+
|        compiler        |       name       | prev_status | cur_status |
+------------------------+------------------+-------------+------------+
| inductor_no_cudagraphs |   mnasnet_100    |  111.8459   |  122.8089  |
|        inductor        |    fbnetc_100    |  116.6486   |  122.4206  |
|        inductor        | res2net50_14w_8s |  119.4182   |  122.3563  |
|        inductor        |   cait_m36_384   |   118.342   |  121.5843  |
+------------------------+------------------+-------------+------------+

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 0.9974 |  0.1778   |  1.2318  |         1.2339         |
|               hf_T5               |  8   | 0.9851 |  0.8539   |  0.1763  |         1.9844         |
|               vgg16               |  64  | 0.9993 |  0.9985   |  0.1496  |         1.2536         |
|             hf_Albert             |  8   | 0.9947 |  0.9594   |  0.1489  |         2.2931         |
|        Background_Matting         |  4   | 0.9986 |  0.1372   |  0.1212  |         1.2083         |
|            timm_nfnet             | 128  | 0.9863 |  0.9843   |  0.1018  |         1.4692         |
|           hf_Bert_large           |  4   | 0.9914 |   0.868   |  0.0992  |         1.5649         |
|           hf_GPT2_large           |  4   | 0.983  |  0.9712   |  0.0969  |         1.7362         |
|              hf_Bert              |  4   | 0.9943 |  0.8385   |  0.0871  |         1.5868         |
|           pytorch_unet            |  1   | 0.9962 |  0.2048   |  0.0722  |         1.3522         |
|              hf_Bart              |  4   | 0.9867 |  0.8259   |  0.071   |         1.5821         |
|            hf_T5_large            |  2   | 0.9748 |  0.8102   |   0.07   |         1.8581         |
|           BERT_pytorch            |  16  | 0.9871 |  0.8036   |  0.0629  |         2.0803         |
|              yolov3               |  16  | 0.9963 |  0.8064   |  0.0569  |         1.1984         |
| attention_is_all_you_need_pytorch | 256  | 0.9883 |  0.9123   |  0.056   |         1.4918         |
|           mobilenet_v2            |  96  | 0.997  |  0.7773   |  0.0551  |         1.5229         |
|              hf_GPT2              |  4   | 0.9932 |  0.9536   |  0.0547  |         1.8264         |
|            timm_regnet            |  32  | 0.9168 |  0.7921   |  0.0498  |         1.0076         |
|           hf_DistilBert           |  8   | 0.9807 |  0.9372   |  0.0477  |         1.5576         |
|              demucs               |  4   | 1.0003 |  1.0011   |  0.041   |         1.0384         |
|      timm_vision_transformer      |  32  |  0.99  |  0.8465   |  0.0397  |         1.3966         |
|             resnet152             |  32  | 0.9957 |  0.7496   |  0.0365  |         1.0157         |
|        shufflenet_v2_x1_0         | 128  | 0.9958 |  0.7508   |  0.0359  |         1.1916         |
|           timm_resnest            |  32  | 0.9925 |  0.8519   |  0.0349  |         1.498          |
|          pytorch_stargan          |  16  | 0.9845 |  0.7983   |  0.0343  |         1.253          |
|            densenet121            |  4   | 0.9854 |  0.7199   |  0.0336  |         1.0371         |
|             resnet50              |  32  | 0.9962 |   0.779   |  0.033   |         1.0564         |
|        mobilenet_v3_large         |  32  | 0.9975 |  0.7857   |  0.0318  |         1.1896         |
|            timm_vovnet            |  32  | 0.8484 |  0.7054   |  0.0316  |         0.9283         |
|         timm_efficientnet         |  32  | 0.9358 |  0.6261   |  0.0313  |         1.0711         |
|         phlippe_densenet          | 128  | 0.9851 |  0.7659   |  0.0305  |         1.0071         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9673 |  0.8848   |  0.0299  |         1.7675         |
|            mnasnet1_0             |  32  | 0.9886 |   0.735   |  0.0283  |         1.0324         |
|      nvidia_deeprecommender       | 256  | 0.9985 |  0.9984   |  0.0269  |         1.0181         |
|          resnext50_32x4d          |  8   | 0.9828 |  0.7163   |  0.0267  |         0.9624         |
|           squeezenet1_1           |  32  | 0.9805 |   0.938   |  0.0262  |         1.2746         |
|          pytorch_struct           | 200  | 0.9186 |  0.8282   |  0.0239  |         1.0978         |
|              alexnet              | 128  | 0.9994 |  0.9967   |  0.0237  |         1.1367         |
|            tts_angular            |  64  | 0.9135 |  0.8856   |  0.0232  |         0.9356         |
|          phlippe_resnet           | 128  | 0.9883 |  0.7559   |  0.0215  |         0.9995         |
|       functorch_dp_cifar10        |  64  | 0.9625 |  0.9159   |  0.0214  |         1.3476         |
|             resnet18              |  16  | 0.9883 |  0.7533   |  0.0208  |         0.9546         |
|        speech_transformer         |  32  | 0.9776 |  0.8242   |  0.019   |         1.5675         |
|           fastNLP_Bert            |  6   | 0.9934 |  0.8558   |  0.0172  |         1.4869         |
|          LearningToPaint          |  96  | 0.9892 |  0.7633   |  0.0161  |         1.0631         |
|               dcgan               |  32  | 0.8584 |  0.6855   |  0.0087  |         0.8192         |
|            hf_Reformer            |  4   | 0.9856 |  0.9636   |  0.0078  |         1.0657         |
|           lennard_jones           | 1000 | 0.8395 |   0.744   |  0.0072  |         0.8785         |
|                drq                |  1   | 0.9544 |  0.7487   |  0.005   |         0.9333         |
|         soft_actor_critic         | 256  | 0.838  |  0.6303   |  0.0037  |         0.8875         |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|            hf_BigBird             |  2   | 0.9506 |  0.7776   |   0.0    |         1.6534         |
|               moco                |  32  | 0.9394 |    0.0    |   0.0    |          0.0           |
|           hf_Longformer           |  2   | 1.0113 |  0.6916   |   0.0    |          0.0           |
|               dlrm                | 1024 | 0.9397 |  0.8437   |   0.0    |         1.1384         |
|   timm_vision_transformer_large   |  32  | 0.998  |    0.0    |   0.0    |         1.0816         |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|           hf_Longformer           |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |  fail_accuracy   |          pass          |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 27.1839 |  56.3713  | 186.7371 |        172.9963        |
|         phlippe_densenet          | 128  | 3.2436  |  7.0681   | 133.6101 |        163.1234        |
|            densenet121            |  4   | 7.5098  |  18.0935  | 132.1669 |        136.2517        |
|         timm_efficientnet         |  32  | 5.0084  |  10.2512  | 121.4614 |        142.743         |
|        mobilenet_v3_large         |  32  | 3.4693  |  7.6682   | 115.8424 |        137.4203        |
|           hf_GPT2_large           |  4   | 14.7498 |  30.4518  | 114.2755 |        105.5657        |
|           mobilenet_v2            |  96  |  3.138  |  7.0279   | 108.8431 |        129.5846        |
|              yolov3               |  16  | 5.0099  |  10.6699  | 106.2531 |        117.6742        |
|             resnet152             |  32  | 9.1595  |  20.3786  | 102.4439 |        106.6659        |
|            mnasnet1_0             |  32  | 3.1666  |   6.794   | 94.0186  |        105.2359        |
|            hf_Reformer            |  4   |  4.185  |  6.0833   | 91.4675  |        40.9656         |
|        speech_transformer         |  32  | 6.0776  |  14.0098  | 90.2559  |        77.4673         |
|           timm_resnest            |  32  | 1.8533  |  3.9321   | 82.5072  |         99.349         |
| attention_is_all_you_need_pytorch | 256  | 4.4658  |  11.0627  | 78.8062  |        75.0016         |
|        shufflenet_v2_x1_0         | 128  | 3.5266  |  7.7656   | 73.8642  |        82.8947         |
|            timm_regnet            |  32  | 6.8133  |  12.5215  | 71.7385  |        70.7055         |
|           BERT_pytorch            |  16  | 4.8873  |  11.5934  | 71.2112  |        68.9023         |
|            timm_nfnet             | 128  | 5.7613  |  11.0551  | 70.5161  |        72.9713         |
|           hf_Bert_large           |  4   | 10.2769 |  21.1756  | 67.6799  |        64.4897         |
|        Background_Matting         |  4   | 3.0926  |  11.5915  | 64.5946  |        68.4243         |
|           fastNLP_Bert            |  6   | 5.1938  |  11.2226  |  62.264  |        50.4297         |
|             resnet50              |  32  | 3.2284  |  6.9724   | 59.8809  |        63.4379         |
|            timm_vovnet            |  32  | 3.6238  |   6.435   |  57.825  |         60.961         |
|               hf_T5               |  8   | 5.6924  |  13.5262  |  54.196  |        50.4546         |
|              hf_Bart              |  4   |  6.145  |  13.7219  | 54.1296  |         49.799         |
|           pytorch_unet            |  1   | 1.5282  |  4.4303   | 52.5243  |         59.85          |
|      timm_vision_transformer      |  32  | 3.3653  |  7.6768   | 51.5196  |        50.2232         |
|          resnext50_32x4d          |  8   |  3.252  |  7.0832   |  51.321  |        53.2267         |
|       functorch_dp_cifar10        |  64  |  1.211  |  2.3995   | 47.5232  |        56.3704         |
|              hf_GPT2              |  4   | 4.6983  |  10.2813  | 44.9956  |        41.4933         |
|            Super_SloMo            |  6   | 2.7944  |   9.799   | 43.4266  |        43.2896         |
|          LearningToPaint          |  96  | 1.4403  |  3.0398   | 43.0288  |        44.7587         |
|             hf_Albert             |  8   | 2.4908  |  8.6044   |  42.345  |        38.5278         |
|              hf_Bert              |  4   | 5.0519  |  10.4672  | 41.1991  |        39.0286         |
|          pytorch_stargan          |  16  | 1.2424  |  3.2385   | 40.5932  |        45.8186         |
|             resnet18              |  16  | 1.3525  |   2.871   | 38.5659  |         44.663         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.222  |  2.9695   | 34.2582  |        37.3867         |
|              demucs               |  4   | 1.4295  |  2.1853   | 33.8911  |        30.9682         |
|           hf_DistilBert           |  8   | 2.3764  |  5.6567   | 31.9699  |        31.3645         |
|          phlippe_resnet           | 128  | 1.3352  |  2.8721   | 30.3376  |        32.7975         |
|           squeezenet1_1           |  32  | 1.0591  |   1.773   | 23.6046  |        25.4622         |
|          pytorch_struct           | 200  | 0.7514  |  1.4579   | 22.2458  |        20.0937         |
|              alexnet              | 128  | 0.4928  |  0.8052   |  17.159  |        16.0938         |
|               vgg16               |  64  | 0.6313  |  1.1274   | 17.0607  |        15.6467         |
|                drq                |  1   | 0.6593  |  1.0207   | 14.5975  |        10.6084         |
|      nvidia_deeprecommender       | 256  | 0.4811  |   0.765   | 11.5678  |         9.6057         |
|         soft_actor_critic         | 256  | 0.4353  |  0.6132   | 10.8198  |         7.4349         |
|               dcgan               |  32  |  0.431  |  0.7128   | 10.1987  |         7.8768         |
|            tts_angular            |  64  | 0.4524  |   0.513   |  7.655   |         5.9045         |
|           lennard_jones           | 1000 | 0.3947  |  0.6039   |  7.5789  |         6.0867         |
|            hf_BigBird             |  2   | 12.9783 |  37.4888  |   nan    |        128.1795        |
|   timm_vision_transformer_large   |  32  | 9.4853  |    nan    |   nan    |        125.2624        |
|               dlrm                | 1024 | 0.3698  |  0.7803   |   nan    |         7.5179         |
|           hf_Longformer           |  2   | 9.5111  |  30.8277  |   nan    |          nan           |
|               moco                |  32  | 33.1483 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.2585  |         1.2557         |
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.208   |         1.2082         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  1.193   |         1.1717         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.1751  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.1728  |         1.1719         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  1.1687  |         1.168          |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  1.1296  |         1.1266         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  1.1278  |         1.128          |
|           mobilenet_v2            |  96  | 0.9857 |  0.7655   |  1.1085  |         1.1017         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  1.1053  |         0.9973         |
|            timm_nfnet             | 128  | 0.9071 |   0.875   |  1.0766  |         1.0734         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  1.0737  |         1.0725         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  1.0736  |         1.0713         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  1.0687  |         0.9997         |
|                drq                |  1   | 0.9877 |  0.8852   |  1.0607  |         0.9573         |
|        Background_Matting         |  4   | 1.0127 |  0.6487   |  1.0421  |         1.0403         |
|              yolov3               |  16  | 0.9838 |  0.8253   |  1.037   |         1.0113         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  1.0344  |         1.0258         |
|            tts_angular            |  64  | 0.9983 |  0.9983   |  1.0198  |         0.9983         |
|         timm_efficientnet         |  32  | 0.9866 |  0.8182   |  1.0128  |         1.006          |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  1.0011  |         0.9945         |
|               vgg16               |  64  | 0.9923 |  0.7245   |  0.9823  |         0.9808         |
|        shufflenet_v2_x1_0         | 128  | 0.9551 |  0.8397   |  0.9736  |         0.9666         |
|              demucs               |  4   | 0.966  |  0.9657   |  0.9675  |         0.9656         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.9645  |         0.9645         |
|           timm_resnest            |  32  | 0.9887 |  0.8826   |  0.953   |         0.9675         |
|            timm_regnet            |  32  | 0.9913 |  0.8509   |  0.9527  |         0.9496         |
|             resnet152             |  32  | 0.9958 |  0.8946   |  0.9445  |         0.9398         |
|              alexnet              | 128  | 0.9454 |  0.7939   |  0.9434  |         0.939          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.9306  |         0.9308         |
|              hf_Bart              |  4   | 0.9087 |  0.7521   |  0.9236  |         0.9173         |
|           squeezenet1_1           |  32  | 0.9695 |  0.9291   |  0.909   |         0.9087         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.8951  |         0.8931         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.8934  |         0.8893         |
|             resnet50              |  32  | 0.9914 |  0.8624   |  0.8908  |         0.8854         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.889   |         0.8869         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8873  |         0.8835         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8834  |         0.8659         |
|        mobilenet_v3_large         |  32  | 0.9765 |  0.8395   |  0.8794  |         0.8092         |
|        speech_transformer         |  32  | 0.9915 |   0.901   |  0.8694  |         0.869          |
|            densenet121            |  4   | 0.994  |  0.9824   |  0.8268  |         0.7998         |
|            mnasnet1_0             |  32  | 0.9757 |  0.8641   |  0.8154  |         0.8071         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.8064  |         0.8022         |
|          resnext50_32x4d          |  8   | 0.9968 |  0.8409   |  0.7792  |         0.7753         |
|          LearningToPaint          |  96  | 0.9192 |  0.7116   |  0.7552  |         0.7463         |
|          pytorch_struct           | 200  | 0.9992 |  0.5106   |  0.7428  |         0.7362         |
|             resnet18              |  16  | 0.9753 |  0.7978   |  0.619   |         0.6097         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9965 |  0.8594   |  0.6035  |         0.6004         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.451   |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3554  |         0.3395         |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |   nan    |         1.1191         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |   nan    |         1.0009         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.9511 |  0.8932   |   nan    |          nan           |
|               moco                |  32  | 0.9965 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+-----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor  | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+-----------+------------------------+
|            hf_Reformer            |  4   | 82.1504  |  83.9974  | 10561.488 |        76.0269         |
|            hf_T5_large            |  2   | 224.2818 | 274.4786  | 3363.9972 |        120.3875        |
|           fastNLP_Bert            |  6   | 52.9575  |  60.981   | 3177.2196 |         35.389         |
|        speech_transformer         |  32  | 65.1804  |  76.2014  | 3146.9017 |        37.0366         |
|           hf_GPT2_large           |  4   | 212.326  | 215.3078  | 2182.5044 |        120.4589        |
|             resnet152             |  32  | 64.6135  |  88.1384  | 1969.1733 |         63.414         |
|            densenet121            |  4   | 53.5913  |  72.9229  | 1709.5963 |        52.4059         |
|              demucs               |  4   |  53.775  |  53.6312  | 1311.6276 |        51.5601         |
|            timm_regnet            |  32  | 60.6424  |  70.1136  | 1239.5178 |        58.4967         |
|              yolov3               |  16  | 68.6624  |  84.9074  | 1213.8618 |        57.1448         |
|              hf_Bart              |  4   | 62.9388  |  91.7965  | 1212.8274 |        36.4932         |
|            timm_nfnet             | 128  | 120.4276 | 120.1556  | 1166.9975 |        80.2167         |
|         timm_efficientnet         |  32  | 34.2847  |  51.4408  | 1110.8592 |        30.2873         |
| attention_is_all_you_need_pytorch | 256  | 55.3348  |  59.0987  | 1100.2679 |        36.4712         |
|        Background_Matting         |  4   | 126.0116 | 917.2602  | 1040.7658 |        104.1258        |
|               hf_T5               |  8   | 181.7596 | 212.4158  | 1025.9797 |        90.5604         |
|                drq                |  1   |  3.473   |  4.3829   | 944.1049  |         4.1854         |
|              hf_GPT2              |  4   | 49.2215  |  55.8285  | 917.9442  |        29.0662         |
|        mobilenet_v3_large         |  32  | 26.9093  |  33.3058  | 905.8193  |        22.1227         |
|           BERT_pytorch            |  16  | 53.5978  |  65.9784  | 899.7848  |        27.5932         |
|        shufflenet_v2_x1_0         | 128  | 30.7228  |  42.171   | 899.1365  |        26.8651         |
|           hf_Bert_large           |  4   | 82.7608  |  94.2839  | 862.5685  |        53.2408         |
|           mobilenet_v2            |  96  |  47.127  |  60.3529  | 858.9875  |        30.8587         |
|      timm_vision_transformer      |  32  | 31.9889  |  37.492   | 851.2357  |        20.2943         |
|            mnasnet1_0             |  32  | 22.4674  |  31.6197  | 842.9537  |        22.6156         |
|             resnet50              |  32  |  26.565  |  33.1821  | 840.9291  |        25.0952         |
|         phlippe_densenet          | 128  | 23.3706  |  30.4315  | 838.2019  |        23.4355         |
|            timm_vovnet            |  32  | 28.8926  |  34.9797  | 831.6207  |        26.8291         |
|          resnext50_32x4d          |  8   | 20.5913  |  27.5347  | 815.3013  |        20.9679         |
|          LearningToPaint          |  96  | 11.9977  |  15.6397  | 782.2721  |        10.7184         |
|         soft_actor_critic         | 256  |  1.9746  |  2.4544   | 721.6813  |         2.5391         |
|           timm_resnest            |  32  |  24.361  |  28.238   | 695.9714  |         16.152         |
|           hf_DistilBert           |  8   | 32.0219  |  35.6536  | 679.4642  |        21.6204         |
|       functorch_dp_cifar10        |  64  |  10.313  |  11.1181  | 558.3025  |         7.5329         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 14.4317  |  17.0534  |  555.74   |         7.8987         |
|           pytorch_unet            |  1   | 39.9863  | 194.2452  | 555.6841  |        29.4556         |
|             resnet18              |  16  |  9.3066  |  12.8052  | 541.7002  |         9.7367         |
|              hf_Bert              |  4   | 40.1099  |  47.4841  | 505.3826  |        25.8377         |
|          pytorch_stargan          |  16  | 15.5692  |  18.283   | 496.1288  |         11.834         |
|          phlippe_resnet           | 128  |  8.857   |  11.9125  | 480.5143  |         8.9496         |
|           squeezenet1_1           |  32  | 12.0728  |  11.6035  | 473.3581  |         8.7205         |
|             hf_Albert             |  8   | 68.5637  |  72.5135  | 471.9266  |        29.7128         |
|               vgg16               |  64  | 66.2419  |  66.2603  | 444.6913  |        52.8905         |
|              alexnet              | 128  |  9.833   |  9.8603   | 417.9721  |         8.633          |
|      nvidia_deeprecommender       | 256  |  10.233  |  10.2473  | 383.1601  |        10.0392         |
|               dcgan               |  32  |  2.3552  |  3.0733   |  376.119  |         2.5602         |
|            tts_angular            |  64  |  6.7913  |  7.0291   | 324.2659  |         6.6776         |
|           lennard_jones           | 1000 |  1.8078  |  2.4578   | 322.6668  |         1.7734         |
|          pytorch_struct           | 200  |  5.2823  |  7.6992   |  223.877  |         4.2655         |
|            Super_SloMo            |  6   | 79.6119  | 446.0152  |  64.378   |        64.4296         |
|   timm_vision_transformer_large   |  32  | 465.1787 |    nan    |    nan    |        428.5721        |
|            hf_BigBird             |  2   | 205.1698 | 280.9342  |    nan    |        116.0408        |
|               dlrm                | 1024 |  4.3701  |  4.8816   |    nan    |         3.6244         |
|           hf_Longformer           |  2   | 112.9296 | 163.8085  |    nan    |          nan           |
|               moco                |  32  | 53.4647  |    nan    |    nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |    nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |    nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |    nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |    nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |    nan    |          nan           |
+-----------------------------------+------+----------+-----------+-----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|             OPTForCausalLM              |  2  | 0.9913 |  0.9092   |  2.4697  |         2.4464         |
|      GPT2ForSequenceClassification      |  4  | 0.9817 |  0.9514   |  2.2959  |         2.288          |
|          MobileBertForMaskedLM          | 64  | 0.9456 |  0.8021   |  2.2922  |         1.0777         |
|             XGLMForCausalLM             |  8  | 0.9963 |  0.8175   |  2.1908  |         1.4762         |
|       MT5ForConditionalGeneration       | 16  | 0.9887 |  0.8428   |  2.1647  |         1.8413         |
|     MobileBertForQuestionAnswering      | 128 | 0.9519 |  0.8056   |  2.125   |         1.0812         |
|       ElectraForQuestionAnswering       | 64  | 0.9871 |  0.9768   |  2.1011  |         2.0924         |
|           ElectraForCausalLM            | 32  | 0.9825 |  0.9369   |  1.8285  |         1.8196         |
|    LayoutLMForSequenceClassification    | 16  | 0.9843 |  0.9712   |  1.8201  |         1.7899         |
|            XLNetLMHeadModel             |  8  | 0.9955 |  0.9669   |  1.8149  |         1.8139         |
|       RobertaForQuestionAnswering       | 16  | 0.9837 |   0.97    |  1.7714  |         1.7574         |
|        BertForQuestionAnswering         | 16  | 0.9844 |   0.97    |  1.7609  |         1.7626         |
|       T5ForConditionalGeneration        |  4  | 0.9807 |  0.8506   |  1.7472  |         1.7299         |
|                 T5Small                 |  4  | 0.9769 |  0.8488   |  1.7416  |         1.7303         |
|               DistillGPT2               | 16  | 0.9879 |  0.9549   |  1.6667  |         1.6984         |
|           RobertaForCausalLM            | 16  | 0.9865 |  0.9624   |  1.6653  |         1.6649         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9803 |  0.9599   |  1.6509  |         1.628          |
|            PLBartForCausalLM            |  8  | 0.9905 |  0.9615   |  1.6508  |         1.6814         |
|       AlbertForQuestionAnswering        |  4  | 1.0001 |   0.885   |  1.6472  |         1.6498         |
|            AlbertForMaskedLM            |  4  | 0.9997 |  0.8849   |  1.6451  |         1.6459         |
|     PLBartForConditionalGeneration      |  4  | 0.9875 |  0.9481   |  1.6404  |         1.6434         |
|           LayoutLMForMaskedLM           | 16  | 0.9864 |  0.9615   |  1.5878  |         1.5974         |
|             BertForMaskedLM             | 16  | 0.9859 |  0.9611   |  1.586   |         1.5838         |
|         Speech2Text2ForCausalLM         | 256 | 0.9774 |  0.9243   |  1.5447  |         1.5811         |
|             BartForCausalLM             |  4  | 0.986  |  0.9567   |  1.5431  |         1.5495         |
|     M2M100ForConditionalGeneration      | 16  | 0.9931 |   0.838   |  1.5394  |         1.4684         |
|                CamemBert                | 16  | 0.9876 |  0.9634   |  1.5378  |         1.5334         |
|         MegatronBertForCausalLM         |  4  | 0.9871 |  0.9185   |  1.533   |         1.4945         |
|            MBartForCausalLM             |  4  | 0.9851 |  0.9548   |  1.5273  |         1.5386         |
|            YituTechConvBert             | 16  | 0.9856 |  0.9581   |  1.4989  |         1.4928         |
|      BartForConditionalGeneration       |  2  | 1.0003 |  0.9452   |  1.4985  |         1.4509         |
|      MBartForConditionalGeneration      |  2  | 0.9996 |  0.9665   |  1.4843  |         1.4717         |
|     PegasusForConditionalGeneration     | 32  | 0.9977 |  0.9237   |  1.475   |         1.3258         |
|     DistilBertForQuestionAnswering      | 256 | 0.9938 |  0.9875   |  1.4588  |         1.4486         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.993  |  0.9179   |  1.3862  |         1.3897         |
|            TrOCRForCausalLM             | 32  | 0.9888 |  0.9489   |  1.2662  |         1.2846         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9819 |  0.9146   |  1.2401  |         1.2613         |
|          DistilBertForMaskedLM          | 128 | 0.9915 |  0.9506   |  1.2143  |         1.2335         |
|           PegasusForCausalLM            | 32  | 0.9855 |  0.9287   |  1.2037  |         1.2737         |
|          BlenderbotForCausalLM          |  4  | 0.9757 |  0.8356   |   0.0    |         1.2135         |
|       DebertaForQuestionAnswering       |  8  | 0.8082 |  0.6999   |   0.0    |         0.9507         |
|           DebertaForMaskedLM            |  4  | 0.7409 |  0.5748   |   0.0    |         0.8088         |
|          DebertaV2ForMaskedLM           |  1  | 0.6819 |  0.5218   |   0.0    |         0.6796         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.6902 |  0.5251   |   0.0    |         0.662          |
|          AllenaiLongformerBase          |  4  | 1.0051 |  0.6705   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
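
For readers who want to summarize a table like the one above, here is a minimal sketch that parses the ASCII layout and computes a geometric-mean speedup per backend. The helper names `parse_ascii_table` and `geomean_speedup` are hypothetical and not part of the benchmark scripts. Treating `0.0` and `nan` entries as failed runs and excluding them is an assumption of this sketch, not something the dashboard states.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geomean_speedup(rows, column):
    """Geometric mean of a column, skipping non-finite and non-positive values.

    Assumption: 0.0 and nan mark runs that did not complete.
    """
    vals = []
    for row in rows:
        v = float(row[column])
        if math.isfinite(v) and v > 0.0:
            vals.append(v)
    if not vals:
        return float("nan")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


# Three rows copied from the speedup table above.
sample = """
+-----------------------+----+--------+-----------+----------+------------------------+
|         name          | bs | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------+----+--------+-----------+----------+------------------------+
|    OPTForCausalLM     | 2  | 0.9913 |  0.9092   |  2.4697  |         2.4464         |
|    BertForMaskedLM    | 16 | 0.9859 |  0.9611   |  1.586   |         1.5838         |
| BlenderbotForCausalLM | 4  | 0.9757 |  0.8356   |   0.0    |         1.2135         |
+-----------------------+----+--------+-----------+----------+------------------------+
"""

rows = parse_ascii_table(sample)
for backend in ("eager", "aot_eager", "inductor", "inductor_no_cudagraphs"):
    print(backend, round(geomean_speedup(rows, backend), 4))
```

Because failed entries are skipped, a backend's geometric mean only covers the models it ran successfully, so it should be read alongside the number of excluded rows.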

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 17.5661 |  40.7808  | 150.1586 |        146.0299        |
|     MobileBertForQuestionAnswering      | 128 | 17.528  |  40.5419  | 142.4138 |        138.1819        |
|     M2M100ForConditionalGeneration      | 16  | 11.7904 |  27.0495  | 137.4141 |        134.3756        |
|             XGLMForCausalLM             |  8  |  9.58   |  20.9459  | 131.0426 |        131.1014        |
|       MT5ForConditionalGeneration       | 16  | 8.0609  |  18.6218  | 127.1359 |        132.809         |
|            XLNetLMHeadModel             |  8  | 10.5764 |  27.8804  | 95.0995  |        92.3463         |
|      MBartForConditionalGeneration      |  2  | 11.9053 |  26.3129  | 82.6644  |        77.7257         |
|      BartForConditionalGeneration       |  2  | 11.6664 |  26.3982  |  78.76   |        74.9006         |
|     PegasusForConditionalGeneration     | 32  | 5.4436  |  19.4239  | 71.9856  |        66.2475         |
|    MegatronBertForQuestionAnswering     |  8  | 10.3503 |  21.4856  |  68.795  |        66.9115         |
|         MegatronBertForCausalLM         |  4  | 10.4109 |  21.3972  |  68.613  |        66.4986         |
|            YituTechConvBert             | 16  | 7.2884  |  16.7359  | 66.5627  |        67.3416         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.7254  |  17.3808  | 58.1085  |        54.2726         |
|       T5ForConditionalGeneration        |  4  | 5.6311  |  12.705   | 52.4382  |        50.0187         |
|                 T5Small                 |  4  | 5.6785  |  12.8488  | 51.9715  |        49.7149         |
|           ElectraForCausalLM            | 32  | 5.2403  |  11.0789  | 51.3684  |        51.4084         |
|     PLBartForConditionalGeneration      |  4  | 6.1853  |  13.4055  | 50.1438  |        46.9464         |
|    LayoutLMForSequenceClassification    | 16  | 5.4919  |  11.2769  | 47.7962  |         45.784         |
|       ElectraForQuestionAnswering       | 64  |  5.232  |  10.8441  |  44.228  |        44.8725         |
|            MBartForCausalLM             |  4  | 5.5821  |  11.063   | 42.7734  |        40.9388         |
|           LayoutLMForMaskedLM           | 16  | 5.5417  |  11.3085  | 42.0613  |        39.9985         |
|             BartForCausalLM             |  4  |  5.554  |  10.9576  | 41.0576  |        39.0823         |
|             BertForMaskedLM             | 16  | 5.3705  |  10.8349  | 40.4476  |        39.8994         |
|           PegasusForCausalLM            | 32  | 5.7043  |  11.1692  | 39.6115  |        37.6959         |
|        BertForQuestionAnswering         | 16  | 5.1458  |  10.7396  | 39.3084  |        40.2795         |
|                CamemBert                | 16  | 5.1586  |  10.9326  |  39.15   |        37.6415         |
|            TrOCRForCausalLM             | 32  | 5.6998  |  11.1867  | 38.9148  |        37.1302         |
|             OPTForCausalLM              |  2  |  4.938  |  10.3307  | 38.8359  |        37.9769         |
|           RobertaForCausalLM            | 16  | 5.2823  |  11.0827  | 38.6419  |        37.3913         |
|      GPT2ForSequenceClassification      |  4  | 4.8864  |  9.9926   |  38.367  |        34.8988         |
|            AlbertForMaskedLM            |  4  | 2.3534  |  8.1908   | 37.5107  |         37.837         |
|       RobertaForQuestionAnswering       | 16  | 5.2696  |  11.1054  |  37.482  |        36.2571         |
|     DistilBertForQuestionAnswering      | 256 | 2.4935  |  5.3775   | 36.1821  |        35.7403         |
|          DistilBertForMaskedLM          | 128 | 2.4784  |  5.4571   | 34.5465  |        34.0197         |
|       AlbertForQuestionAnswering        |  4  |  2.182  |  8.1893   | 34.0462  |         34.162         |
|               DistillGPT2               | 16  | 2.4927  |    5.2    | 30.6764  |        27.4641         |
|       BlenderbotSmallForCausalLM        | 64  | 3.7724  |  7.5447   | 29.8954  |        28.7029         |
|            PLBartForCausalLM            |  8  | 3.0117  |  5.9858   | 26.9461  |        25.2946         |
|         Speech2Text2ForCausalLM         | 256 | 3.0097  |  5.7817   |  26.237  |         24.579         |
|          DebertaV2ForMaskedLM           |  1  | 15.4062 |  27.6759  |   nan    |        71.1275         |
|          BlenderbotForCausalLM          |  4  | 11.0686 |  22.1774  |   nan    |        68.3431         |
|      DebertaV2ForQuestionAnswering      |  2  | 15.2714 |  27.2242  |   nan    |        67.7838         |
|       DebertaForQuestionAnswering       |  8  |  7.153  |  13.5063  |   nan    |        52.7257         |
|           DebertaForMaskedLM            |  4  | 7.5387  |  13.9941  |   nan    |        52.3108         |
|          AllenaiLongformerBase          |  4  | 9.7349  |  31.3115  |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  1.3156  |         1.3147         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  1.2697  |         1.268          |
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1962  |         1.195          |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.1782  |         1.1729         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.1778  |         1.1724         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1562  |         1.2307         |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|             OPTForCausalLM              |  2  | 0.9999 |  0.9165   |  1.0965  |         1.1346         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0902  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0902  |         1.1813         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0897  |         1.1368         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0605  |         1.1479         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0562  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.056   |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0532  |         1.0491         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  1.044   |         1.1152         |
|            YituTechConvBert             | 16  | 0.9999 |  0.9143   |  1.043   |         1.0411         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0104  |         1.0518         |
|     PLBartForConditionalGeneration      |  4  | 1.0001 |  0.9301   |  0.9772  |         1.052          |
|           ElectraForCausalLM            | 32  |  1.0   |  0.8475   |  0.9753  |         0.9739         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.971   |         1.0642         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.9653  |         1.0962         |
|            MBartForCausalLM             |  4  |  1.0   |  0.8937   |  0.9444  |         0.9912         |
|             BartForCausalLM             |  4  |  1.0   |  0.8936   |  0.9294  |         0.9749         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.9273  |         1.0307         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8677   |  0.9162  |         0.9886         |
|           PegasusForCausalLM            | 32  |  1.0   |  0.8822   |  0.9161  |         0.9864         |
|     PegasusForConditionalGeneration     | 32  |  1.0   |   0.91    |  0.9157  |         1.0689         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.9136  |         1.0139         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9127  |         1.0018         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8872  |         0.9624         |
|            TrOCRForCausalLM             | 32  |  1.0   |   0.873   |  0.8855  |         0.9583         |
|     M2M100ForConditionalGeneration      | 16  |  1.0   |  0.8084   |  0.8808  |         0.9908         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8749  |         0.9803         |
|             XGLMForCausalLM             |  8  |  1.0   |  0.7834   |  0.8421  |         0.9792         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8137   |  0.8215  |         0.9119         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.8112  |         1.016          |
|         Speech2Text2ForCausalLM         | 256 |  1.0   |  0.8183   |  0.7921  |         0.8779         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6659  |         0.8392         |
|       DebertaForQuestionAnswering       |  8  | 0.9525 |  1.0537   |   nan    |         1.1526         |
|          BlenderbotForCausalLM          |  4  | 0.9978 |  0.9099   |   nan    |         0.999          |
|           DebertaForMaskedLM            |  4  | 0.9316 |  0.9156   |   nan    |         0.9978         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9762 |  0.9764   |   nan    |         0.9797         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |   nan    |         0.9665         |
|          AllenaiLongformerBase          |  4  | 0.9508 |  0.8684   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 266.192  |  300.661  | 161.7439 |         161.53         |
|       AlbertForQuestionAnswering        |  4  | 263.9086 | 297.9911  | 160.2391 |        160.0212        |
|            XLNetLMHeadModel             |  8  | 281.4392 | 289.0657  | 154.5077 |        154.8445        |
|     PegasusForConditionalGeneration     | 32  | 139.2523 | 153.1417  | 111.8818 |        113.8316        |
|            TrOCRForCausalLM             | 32  | 138.8396 | 145.1014  | 108.8789 |        107.7146        |
|      MBartForConditionalGeneration      |  2  | 139.9491 | 143.9759  | 92.4361  |        93.7768         |
|      BartForConditionalGeneration       |  2  | 138.6718 | 156.4122  | 91.5705  |        98.9476         |
|    MegatronBertForQuestionAnswering     |  8  | 144.7797 | 147.6723  | 85.8577  |        87.1054         |
|            YituTechConvBert             | 16  | 127.284  | 130.8946  | 83.6041  |        84.0858         |
| BlenderbotSmallForConditionalGeneration | 64  | 112.8435 | 121.2541  | 80.3085  |        79.3339         |
|     MobileBertForQuestionAnswering      | 128 | 182.009  | 208.2018  | 80.0827  |        158.9923        |
|                CamemBert                | 16  | 119.9005 | 122.7511  | 77.0794  |        77.1728         |
|          MobileBertForMaskedLM          | 64  | 187.5047 | 211.9443  | 75.7901  |        161.3235        |
|            MBartForCausalLM             |  4  | 114.8787 | 118.6657  | 74.9266  |        74.8001         |
|             BartForCausalLM             |  4  | 115.1753 | 118.3037  |  73.647  |        73.1877         |
|     M2M100ForConditionalGeneration      | 16  | 125.2828 | 138.2315  | 72.5234  |         93.181         |
|     PLBartForConditionalGeneration      |  4  | 118.1756 | 123.0701  | 71.8043  |         71.419         |
|     DistilBertForQuestionAnswering      | 256 | 103.8852 | 104.5268  | 70.9255  |        71.3799         |
|           LayoutLMForMaskedLM           | 16  | 114.0295 | 116.9497  | 70.7904  |        70.4928         |
|            PLBartForCausalLM            |  8  | 116.3301 | 120.8456  | 70.4136  |        69.3209         |
|          DistilBertForMaskedLM          | 128 | 85.2708  |  89.0095  | 69.6351  |        68.5966         |
|             BertForMaskedLM             | 16  | 111.5229 | 114.2942  | 69.4011  |        69.6059         |
|             OPTForCausalLM              |  2  | 166.976  | 181.2522  | 69.2077  |        68.7734         |
|           RobertaForCausalLM            | 16  | 116.738  | 119.3711  |  69.09   |        69.0237         |
|               DistillGPT2               | 16  | 107.0698 | 110.5314  | 63.3978  |        62.2103         |
|       T5ForConditionalGeneration        |  4  | 106.5398 | 122.7264  | 60.1424  |        60.5216         |
|                 T5Small                 |  4  | 106.6542 | 123.0605  | 60.0665  |        60.4739         |
|           PegasusForCausalLM            | 32  | 71.0717  |  74.3411  | 57.6327  |        58.5804         |
|         MegatronBertForCausalLM         |  4  | 88.5252  |  93.8424  |  56.813  |        58.2671         |
|       ElectraForQuestionAnswering       | 64  | 116.072  | 117.1593  | 54.5465  |        54.7236         |
|        BertForQuestionAnswering         | 16  | 96.7417  |  97.8799  |  54.05   |        54.0273         |
|       RobertaForQuestionAnswering       | 16  |  97.253  |  98.3208  | 53.9874  |        54.3691         |
|    LayoutLMForSequenceClassification    | 16  | 99.1035  | 100.5917  | 53.6883  |        54.5383         |
|             XGLMForCausalLM             |  8  | 88.6869  | 110.5493  | 51.9949  |        80.0639         |
|           ElectraForCausalLM            | 32  | 89.6033  |  93.8731  | 48.1051  |        48.3335         |
|       BlenderbotSmallForCausalLM        | 64  | 58.9336  |  63.2927  | 46.6732  |        45.8814         |
|       MT5ForConditionalGeneration       | 16  |  92.691  | 108.6286  | 42.2554  |        50.0218         |
|      GPT2ForSequenceClassification      |  4  | 94.8344  |  96.2385  | 39.7866  |        40.1155         |
|         Speech2Text2ForCausalLM         | 256 | 54.3935  |  56.3026  | 35.4545  |        34.7608         |
|          DebertaV2ForMaskedLM           |  1  | 148.8464 | 197.9223  |   nan    |        170.6135        |
|      DebertaV2ForQuestionAnswering      |  2  | 150.5936 | 200.3212  |   nan    |        158.6319        |
|          BlenderbotForCausalLM          |  4  | 109.5908 | 128.8757  |   nan    |        88.9777         |
|       DebertaForQuestionAnswering       |  8  | 93.8209  | 108.1873  |   nan    |        79.5942         |
|           DebertaForMaskedLM            |  4  | 93.2457  | 120.7483  |   nan    |         75.725         |
|          AllenaiLongformerBase          |  4  | 180.9782 | 271.6727  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
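For anyone who wants to compare backends directly from the absolute latency table, here is a minimal sketch that parses rows in this layout and divides the `eager` column by another backend's column. The helper names `parse_rows` and `ratio_vs_eager` are hypothetical and are not part of the benchmark scripts. Using the `eager` column as the reference is an assumption, so the result may not match the dashboard's own speedup numbers exactly. Rows reported as `nan` are treated as "no result".

```python
import math

# Two rows copied from the absolute latency (ms) table above.
TABLE = """\
|            AlbertForMaskedLM            |  4  | 266.192  |  300.661  | 161.7439 |         161.53         |
|          DebertaV2ForMaskedLM           |  1  | 148.8464 | 197.9223  |   nan    |        170.6135        |
"""

COLUMNS = ["name", "bs", "eager", "aot_eager", "inductor", "inductor_no_cudagraphs"]


def parse_rows(text):
    """Parse '|'-delimited data rows into dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        if not line.startswith("|"):
            continue  # skip '+----+' borders and blank lines
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != len(COLUMNS) or cells[0] == "name":
            continue  # skip the header row and malformed lines
        row = dict(zip(COLUMNS, cells))
        row["bs"] = int(row["bs"])
        for key in COLUMNS[2:]:
            row[key] = float(row[key])  # float("nan") parses the 'nan' cells
        rows.append(row)
    return rows


def ratio_vs_eager(row, backend):
    """Return eager latency / backend latency, or None when the backend has no result."""
    value = row[backend]
    if math.isnan(value):
        return None
    return row["eager"] / value


rows = parse_rows(TABLE)
for row in rows:
    print(row["name"], ratio_vs_eager(row, "inductor"))
```

A ratio above 1 means the backend was faster than the `eager` column for that model and batch size.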

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 0.9985 |  0.9646   |  0.3393  |         1.0027         |
|          mixer_b16_224          | 128 | 0.9973 |  1.0161   |  0.2702  |         1.3605         |
|            pit_b_224            | 64  | 0.9946 |  0.9927   |  0.2355  |         1.4284         |
|        tnt_s_patch16_224        | 128 | 0.9991 |  0.9966   |  0.2137  |         2.9752         |
|          gmlp_s16_224           | 128 | 0.9949 |  1.0829   |  0.2062  |         1.8317         |
|          gmixer_24_224          | 128 | 0.9954 |  0.8897   |  0.1907  |         1.7455         |
|           convit_base           | 64  | 0.9981 |  0.9978   |  0.1668  |         1.6096         |
|          resmlp_12_224          | 128 | 0.9934 |  0.8899   |  0.1366  |         1.2561         |
|           tf_mixnet_l           | 128 | 0.9762 |  0.8266   |  0.1242  |         1.1919         |
|       eca_botnext26ts_256       | 128 | 0.9742 |  0.7189   |  0.1222  |         1.4236         |
|            mixnet_l             | 128 | 0.9761 |   0.821   |  0.1215  |         1.1813         |
|         coat_lite_mini          | 128 | 0.9971 |  0.9958   |  0.1159  |         1.9194         |
|          botnet26t_256          | 128 | 0.9734 |  0.8509   |  0.1146  |         1.4243         |
|             dla102              | 128 | 0.9961 |  0.8154   |  0.1132  |         1.5228         |
|      beit_base_patch16_224      | 64  | 0.9965 |  0.9645   |  0.1105  |         1.352          |
|         visformer_small         | 128 | 0.9958 |  0.9449   |  0.1088  |         1.166          |
|       gluon_inception_v3        | 128 | 0.9963 |  0.8645   |  0.108   |         1.5207         |
|        adv_inception_v3         | 128 | 0.9962 |  0.8609   |  0.1079  |          1.52          |
|          inception_v3           | 128 | 0.9956 |  0.8646   |  0.1074  |         1.518          |
|           dm_nfnet_f0           | 128 | 0.9868 |  0.9855   |  0.1068  |         1.4297         |
|      vit_base_patch16_224       | 64  | 0.9962 |  0.9938   |  0.1043  |         1.2364         |
|            nfnet_l0             | 128 | 0.9897 |  0.8144   |  0.1027  |         1.4322         |
| deit_base_distilled_patch16_224 | 64  | 0.9963 |   0.994   |  0.0984  |         1.2549         |
|           res2next50            | 128 | 0.999  |  0.8251   |  0.0963  |         1.3622         |
|           volo_d1_224           | 64  | 0.9941 |  0.9736   |  0.0961  |         1.6659         |
|      xcit_large_24_p8_224       |  5  | 0.9932 |  0.8715   |  0.0931  |         1.569          |
|  swin_base_patch4_window7_224   | 64  | 0.9913 |  0.9544   |  0.0866  |         1.6064         |
|     swsl_resnext101_32x16d      | 32  | 0.9977 |  0.8405   |  0.083   |         1.0201         |
|            repvgg_a2            | 128 | 0.9363 |  0.7553   |  0.0786  |         1.1203         |
|          cspdarknet53           | 64  | 0.9314 |  0.7849   |  0.0778  |         1.2614         |
|         poolformer_m36          | 64  | 0.9863 |   0.983   |  0.0777  |         1.3187         |
|            gernet_l             | 128 | 0.9352 |  0.7937   |  0.0768  |         1.064          |
|           selecsls42b           | 128 | 0.9979 |  0.8115   |  0.0736  |         1.4096         |
|           resnest101e           | 64  | 0.9946 |  0.8671   |  0.0736  |         1.3571         |
|            hrnet_w18            | 128 | 0.9926 |  0.6364   |  0.0724  |         1.3531         |
|       tf_efficientnet_b0        | 128 | 0.9603 |  0.6818   |  0.0714  |         1.3845         |
|          pnasnet5large          | 16  | 0.986  |  0.9187   |   0.07   |         1.1277         |
|          jx_nest_base           | 32  | 0.9878 |  0.9859   |  0.0698  |         1.359          |
|          convnext_base          | 64  | 0.9836 |  0.9846   |  0.0677  |         1.4715         |
|         crossvit_9_240          | 128 | 0.9905 |  0.7828   |  0.0663  |         1.6154         |
|            fbnetv3_b            | 128 | 0.9488 |   0.769   |  0.0656  |         1.3173         |
|         mobilenetv2_100         | 128 | 0.9497 |  0.7372   |  0.0655  |         1.4443         |
|          cait_m36_384           |  4  | 0.9947 |   0.993   |  0.0652  |         1.3492         |
|        res2net50_14w_8s         | 128 | 0.9991 |  0.7905   |  0.0651  |         1.3573         |
|          ghostnet_100           | 128 | 0.992  |  0.7639   |  0.0635  |         1.5792         |
|             dpn107              | 32  | 0.9329 |  0.8081   |  0.0625  |         1.1365         |
|          spnasnet_100           | 128 | 0.9406 |  0.7386   |  0.0622  |         1.4189         |
|           rexnet_100            | 128 | 0.9521 |   0.703   |  0.0617  |         1.333          |
|        gluon_xception65         | 32  | 0.992  |  0.8413   |  0.0592  |         1.0784         |
|        twins_pcpvt_base         | 64  | 0.9944 |  0.9076   |  0.058   |         1.6655         |
|      mobilenetv3_large_100      | 128 | 0.9495 |  0.7603   |  0.0576  |         1.4381         |
|            tinynet_a            | 128 | 0.9459 |  0.6783   |  0.054   |         1.2577         |
|        ese_vovnet19b_dw         | 128 | 0.958  |  0.8322   |  0.0421  |         1.3696         |
|        res2net101_26w_4s        | 64  | 0.9995 |  0.7893   |  0.042   |         1.0624         |
|           regnety_002           | 128 |  0.95  |  0.7077   |  0.0399  |         1.2267         |
|            lcnet_050            | 128 | 0.9421 |  0.7365   |  0.0397  |         1.4641         |
|        sebotnet33ts_256         | 64  | 0.9577 |  0.7648   |  0.0394  |         1.5321         |
|           fbnetc_100            | 128 | 0.9498 |  0.7387   |  0.0359  |         1.404          |
|           mobilevit_s           | 64  | 0.9612 |  0.7316   |  0.0345  |         1.4426         |
|           mnasnet_100           | 128 | 0.9483 |  0.7407   |  0.033   |         1.4957         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|            hrnet_w18            | 128 | 9.6561  |  36.2197  |  241.07  |        249.5104        |
|           rexnet_100            | 128 | 5.6519  |  11.1755  | 231.1478 |        285.6805        |
|          ghostnet_100           | 128 | 7.6257  |  14.9647  | 201.641  |        234.8177        |
|          pnasnet5large          | 16  |  8.098  |  25.9353  | 160.1059 |        165.2232        |
|           resnest101e           | 64  | 11.103  |  24.1575  | 156.4844 |        162.731         |
|            fbnetv3_b            | 128 | 8.5622  |  16.9496  | 151.987  |        171.645         |
|           mobilevit_s           | 64  | 5.3374  |  11.3167  | 148.4526 |        158.3839        |
|        res2net101_26w_4s        | 64  | 10.7703 |  24.9149  | 147.9464 |        150.0525        |
|        twins_pcpvt_base         | 64  | 10.5111 |  23.0249  | 145.2525 |        147.0902        |
|           tf_mixnet_l           | 128 | 9.1465  |  16.7727  | 142.4357 |        160.9612        |
|        adv_inception_v3         | 128 | 5.6491  |  12.406   | 138.9232 |        159.3719        |
|            mixnet_l             | 128 | 8.4358  |  16.1646  | 138.4925 |        161.0833        |
|          inception_v3           | 128 | 5.7392  |  12.627   | 137.0328 |        154.704         |
|       gluon_inception_v3        | 128 | 5.7168  |  13.2092  | 136.8902 |        161.8541        |
|      xcit_large_24_p8_224       |  5  | 12.8135 |  28.6779  | 136.6948 |        131.2005        |
|      mobilenetv3_large_100      | 128 | 4.2418  |  8.3537   | 135.6454 |        159.1405        |
|            tinynet_a            | 128 | 6.0252  |  12.1851  | 134.1131 |        161.4224        |
|       tf_efficientnet_b0        | 128 | 5.1678  |  11.0443  | 130.4467 |        149.0824        |
|           fbnetc_100            | 128 | 4.9224  |  9.3382   | 122.4206 |        136.4893        |
|        res2net50_14w_8s         | 128 | 9.1005  |  22.259   | 122.3563 |        122.5245        |
|          cait_m36_384           |  4  | 13.7044 |  31.2429  | 121.5843 |        113.9006        |
|          spnasnet_100           | 128 | 5.0519  |  9.2159   | 116.3327 |        139.1139        |
|  swin_base_patch4_window7_224   | 64  | 8.5157  |  20.1709  | 113.8579 |        106.7069        |
|         mobilenetv2_100         | 128 | 4.0613  |  7.7847   | 112.2976 |        131.4004        |
|           mnasnet_100           | 128 | 4.0213  |   7.548   | 107.161  |        122.8089        |
|        sebotnet33ts_256         | 64  | 4.2282  |  8.8577   | 101.8107 |        108.4677        |
|         poolformer_m36          | 64  | 7.6863  |  13.7327  | 101.222  |        100.2883        |
|             dpn107              | 32  | 9.8232  |  19.2683  | 98.0487  |        100.9677        |
|        gluon_xception65         | 32  |  7.818  |  16.798   | 93.5592  |        96.0985         |
|           regnety_002           | 128 | 4.9114  |  9.1591   | 93.5154  |        106.8212        |
|             dla102              | 128 |  6.254  |  14.7159  | 93.1607  |        97.8443         |
|         coat_lite_mini          | 128 | 3.2308  |  7.8275   | 88.7031  |        88.1557         |
|          cspdarknet53           | 64  | 5.8461  |  10.8214  |  88.533  |        100.149         |
|          jx_nest_base           | 32  | 6.7474  |  14.3677  | 87.8927  |        82.8484         |
|         crossvit_9_240          | 128 | 5.8414  |  14.0077  | 87.1136  |        88.4144         |
|       eca_botnext26ts_256       | 128 | 3.0791  |  7.1727   | 84.4885  |        97.0277         |
|          botnet26t_256          | 128 | 2.9274  |  6.2258   | 83.9127  |        87.7853         |
|           res2next50            | 128 | 5.1058  |  11.9502  | 83.1947  |         87.188         |
|            lcnet_050            | 128 | 2.5247  |  5.0191   | 78.7988  |        99.7701         |
|           selecsls42b           | 128 | 2.4833  |  5.3594   | 78.1689  |        87.5077         |
|           volo_d1_224           | 64  | 5.1005  |  11.8616  | 75.4666  |        73.2499         |
|        tnt_s_patch16_224        | 128 | 6.5627  |  17.0153  | 74.7295  |        69.7394         |
|            nfnet_l0             | 128 | 5.3692  |  10.8786  | 73.8593  |        76.5888         |
|            gernet_l             | 128 | 5.0071  |  8.8503   | 73.3036  |        82.1011         |
|        ese_vovnet19b_dw         | 128 | 2.5929  |   4.566   | 71.6939  |        75.6467         |
|           dm_nfnet_f0           | 128 | 6.1449  |  11.3517  | 71.1326  |        73.4606         |
|          convnext_base          | 64  | 6.6566  |  12.4965  | 66.3668  |        58.3006         |
|         visformer_small         | 128 | 2.6368  |  6.0849   | 66.2539  |        67.2607         |
|     swsl_resnext101_32x16d      | 32  | 6.2116  |  13.5853  | 66.0539  |        62.5789         |
|          gmlp_s16_224           | 128 | 5.6388  |  11.9119  | 60.7029  |        60.0121         |
|            repvgg_a2            | 128 | 4.8709  |  8.6764   | 58.3885  |        59.0082         |
|          gmixer_24_224          | 128 |  5.718  |  12.7939  | 53.1998  |        53.0118         |
|           convit_base           | 64  | 3.5301  |  8.5498   | 52.1415  |        49.0148         |
|            pit_b_224            | 64  | 3.5063  |  7.8884   | 48.0108  |        46.1204         |
| deit_base_distilled_patch16_224 | 64  | 3.1116  |  7.0093   | 46.0638  |        43.3687         |
|      vit_base_patch16_224       | 64  | 3.0583  |   7.422   | 42.7212  |        39.0513         |
|          resmlp_12_224          | 128 | 2.8308  |  5.2342   | 42.7069  |        40.3159         |
|      beit_base_patch16_224      | 64  | 3.9286  |  9.1004   | 38.8582  |        34.7414         |
|        convmixer_768_32         | 32  | 1.7035  |  6.8316   | 37.9631  |        37.2696         |
|          mixer_b16_224          | 128 | 2.6446  |  5.8903   | 34.8958  |        33.3646         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.2872  |         1.2836         |
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.2057  |         1.2049         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  1.1899  |         1.1871         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1607  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.1583  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.1215  |         1.1179         |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  1.1129  |         1.1115         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  1.089   |         1.0876         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.0875  |         1.0845         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  1.0758  |         1.0721         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  1.0757  |         1.0728         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  1.0696  |         1.0675         |
|        twins_pcpvt_base         | 64  | 0.996  |  0.9235   |  1.0556  |         1.0539         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  1.0512  |         1.0506         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  1.0494  |         1.0457         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0377  |         1.0351         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  1.0361  |         1.0328         |
|          convnext_base          | 64  | 1.001  |   0.924   |  1.0345  |         1.0338         |
|             dla102              | 128 | 0.9634 |  0.9155   |  1.0323  |         1.0326         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  1.0251  |         1.0242         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  1.021   |         1.0202         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  1.0203  |         1.0194         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  1.0082  |         1.0072         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  1.0071  |         1.0057         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9976  |         0.9952         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.9957  |         0.9948         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.9925  |          0.99          |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.9923  |         0.9902         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9917  |         0.9903         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.9912  |         0.9898         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9905  |         0.989          |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.9885  |         0.989          |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9864  |         0.9854         |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9821  |         0.9793         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.9793  |         0.9786         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.9793  |         0.977          |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.979   |         0.9779         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.9776  |         0.9732         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.9738  |         0.9706         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9732  |         0.9727         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.9714  |         0.9705         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.9702  |         0.9664         |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.966   |         0.9611         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.9646  |         0.9642         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.9637  |         0.9607         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.9611  |         0.9604         |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.9582  |         0.9535         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.9568  |         0.9547         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9562  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9537  |         0.9528         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.9509  |         0.9483         |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.9497  |         0.9451         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.9448  |         0.9403         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.9376  |         0.9361         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.9046  |         0.9045         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.901   |         0.8966         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.8898  |         0.884          |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+-----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor  | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+-----------+------------------------+
|            hrnet_w18            | 128 | 281.5678 | 439.0538  | 3894.0004 |        206.3261        |
|          pnasnet5large          | 16  | 198.8187 | 212.8243  | 2829.504  |        174.0852        |
|          cait_m36_384           |  4  | 168.3334 |  167.814  | 2576.1756 |        123.6328        |
|        res2net101_26w_4s        | 64  | 99.9607  | 125.0446  | 2412.7024 |        93.7608         |
|           mobilevit_s           | 64  | 84.6242  | 111.3099  | 2372.3196 |        56.3724         |
|           resnest101e           | 64  | 165.3009 |  189.195  | 2235.7466 |        120.7226        |
|           fbnetc_100            | 128 | 82.8699  |  106.428  | 2203.7189 |        56.0102         |
|        res2net50_14w_8s         | 128 | 140.5367 | 177.5161  | 2172.3422 |        103.6699        |
|        twins_pcpvt_base         | 64  |  119.01  | 127.0446  | 2105.0079 |        69.6814         |
|        sebotnet33ts_256         | 64  | 80.4751  |  100.681  | 1963.0799 |        50.2763         |
|         poolformer_m36          | 64  | 146.8155 | 147.0494  | 1872.6823 |        109.7158        |
|           mnasnet_100           | 128 | 64.3803  |  82.2834  | 1854.3566 |        40.7755         |
|          convnext_base          | 64  | 124.4454 | 124.0273  | 1816.1637 |         83.042         |
|             dpn107              | 32  | 113.7044 | 131.2898  | 1710.0033 |        93.2511         |
|  swin_base_patch4_window7_224   | 64  | 147.2048 | 153.0247  | 1693.6682 |        91.1138         |
|        gluon_xception65         | 32  | 99.7125  | 117.5622  | 1682.216  |        91.6012         |
|            fbnetv3_b            | 128 | 115.6315 | 142.0907  | 1674.6531 |        83.1381         |
|           tf_mixnet_l           | 128 | 193.9521 |  228.984  | 1526.5816 |        158.7589        |
|             dla102              | 128 | 172.4621 | 210.6174  | 1523.3892 |        112.8289        |
|        tnt_s_patch16_224        | 128 | 323.4379 | 323.7833  | 1513.3426 |        108.4598        |
|          inception_v3           | 128 | 160.8381 | 185.0728  | 1495.201  |        105.7363        |
|            mixnet_l             | 128 | 185.7194 | 220.1815  | 1493.3697 |        153.256         |
|       gluon_inception_v3        | 128 | 160.743  | 185.5039  | 1488.3527 |        105.3243        |
|        adv_inception_v3         | 128 | 160.7878 | 185.8095  | 1486.6009 |        105.2991        |
|        ese_vovnet19b_dw         | 128 | 64.6311  |  74.3839  | 1475.8197 |        45.2062         |
|          jx_nest_base           | 32  | 101.526  | 101.3963  | 1444.1781 |        73.8238         |
|     swsl_resnext101_32x16d      | 32  | 118.607  |  141.048  | 1431.6615 |        116.3421        |
|          ghostnet_100           | 128 | 90.8267  |  117.658  | 1421.339  |        56.9956         |
|      xcit_large_24_p8_224       |  5  | 121.8544 | 141.1579  | 1404.5831 |        78.9704         |
|           res2next50            | 128 | 126.1954 | 152.6521  | 1313.0345 |        92.4866         |
|            tinynet_a            | 128 | 73.7801  | 102.6453  | 1298.1995 |        55.3114         |
|           volo_d1_224           | 64  | 120.8699 | 123.3153  | 1257.6367 |        72.1821         |
|           rexnet_100            | 128 |  80.12   | 108.4358  | 1244.3547 |        57.2076         |
|         crossvit_9_240          | 128 | 82.4594  | 104.6193  | 1239.6505 |        50.6321         |
|           dm_nfnet_f0           | 128 | 128.744  | 128.3765  | 1189.8238 |        88.7534         |
|       tf_efficientnet_b0        | 128 | 84.7651  | 119.8714  | 1145.3037 |        58.7826         |
|          cspdarknet53           | 64  | 95.0737  | 112.7797  | 1144.9722 |        70.3038         |
|            nfnet_l0             | 128 | 112.7198 | 136.7281  | 1094.2138 |        77.7959         |
|          spnasnet_100           | 128 | 70.5402  |  89.7049  | 1070.2216 |         46.761         |
|      mobilenetv3_large_100      | 128 | 61.4397  |  76.5427  | 1014.5535 |        40.4728         |
|           regnety_002           | 128 | 42.8374  |  56.8823  | 990.1442  |        29.9658         |
|           convit_base           | 64  | 163.2175 | 163.0129  | 978.9271  |        101.1057        |
|         coat_lite_mini          | 128 | 113.0291 | 113.1834  |  976.199  |         58.769         |
|         mobilenetv2_100         | 128 | 65.4947  |  84.388   | 955.0764  |        43.0699         |
|            gernet_l             | 128 |  77.648  |  91.5375  | 950.6863  |        68.3778         |
|            repvgg_a2            | 128 | 77.4754  |  96.1145  | 927.7921  |        64.8056         |
|      beit_base_patch16_224      | 64  | 101.4776 | 104.7822  | 920.5623  |        74.8479         |
|        convmixer_768_32         | 32  | 300.3915 | 311.0162  | 886.2581  |        299.2877        |
|       eca_botnext26ts_256       | 128 | 108.7311 |  147.221  | 868.8366  |        74.3539         |
|          botnet26t_256          | 128 | 101.7438 | 116.4666  | 868.0187  |         69.642         |
| deit_base_distilled_patch16_224 | 64  | 84.8465  |  84.9808  | 864.0672  |        67.3412         |
|         visformer_small         | 128 | 91.2747  |  96.0869  | 838.9885  |        77.9978         |
|      vit_base_patch16_224       | 64  | 86.7979  |  87.1971  | 834.5492  |         70.003         |
|           selecsls42b           | 128 |  60.044  |  73.7309  | 818.0287  |        42.6055         |
|            lcnet_050            | 128 |  31.694  |  40.4537  | 757.6308  |        20.4032         |
|          gmlp_s16_224           | 128 | 137.9542 | 126.2828  | 665.8973  |        74.6709         |
|          gmixer_24_224          | 128 | 117.9611 | 132.1317  | 618.4647  |        67.5194         |
|            pit_b_224            | 64  | 118.7668 | 118.9732  | 502.7092  |        82.6953         |
|          mixer_b16_224          | 128 | 116.5593 |  114.578  | 431.5792  |        86.1252         |
|          resmlp_12_224          | 128 | 53.6175  |  59.7108  | 390.1145  |        42.3084         |
+---------------------------------+-----+----------+-----------+-----------+------------------------+

Performance graphs

bench_logs/torchbench_amp.png :

bench_logs/timm_models_amp.png :

bench_logs/huggingface_amp.png :

Build Summary

Run name

day_092_02_04_23_performance_amp_540

Commit hashes

pytorch commit: 5d62d12
pytorch commit date: 2023-04-03 02:02:15+00:00
torchbench commit: ea7b71ead75529529d67ffd17541b1f203c49b83
torchbench commit date: 2023-03-31 18:05:58-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git5d62d12

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on A100 GPUs. Each experiment runs one iteration: a forward and backward pass for training, or a forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 85%, 51/60 | 91%, 41/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 96%, 43/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
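Each passrate cell pairs a percentage with the raw pass count. A minimal sketch of that formatting, assuming the percentage is rounded to the nearest whole number (this matches the cells above, but the helper name `format_passrate` is hypothetical and not taken from the benchmark scripts):

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a passrate cell such as '88%, 53/60'.

    Assumption: the percentage is rounded to the nearest integer.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{passed / total:.0%}, {passed}/{total}"


# Counts taken from the eager/torchbench and inductor/huggingface cells.
print(format_passrate(53, 60))
print(format_passrate(41, 45))
```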

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.58x    |    1.60x    |    1.41x    |
| inductor_no_cudagraphs |   1.25x    |    1.49x    |    1.39x    |
+------------------------+------------+-------------+-------------+
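For readers who want to reproduce a number like the ones in this table, here is a rough sketch of a geometric mean over per-model speedups. It assumes that models failing the accuracy check are dropped first (as stated above) and that non-positive speedups are also skipped; the report does not say how 0.0 entries are treated, so that second filter is a guess. The function name `geomean_speedup` is hypothetical.

```python
import math


def geomean_speedup(speedups: dict, passed: dict) -> float:
    """Geometric mean of speedups over models that passed accuracy.

    Assumption: entries with speedup <= 0 (failed perf runs) are skipped,
    since the logarithm is undefined for them.
    """
    kept = [
        s for name, s in speedups.items()
        if passed.get(name, False) and s > 0
    ]
    if not kept:
        return float("nan")
    return math.exp(sum(math.log(s) for s in kept) / len(kept))


# Illustrative subset only; a real run would cover the whole suite.
speedups = {"resnet18": 1.584, "dcgan": 1.4457, "hf_Longformer": 0.0}
passed = {"resnet18": True, "dcgan": True, "hf_Longformer": False}
print(f"{geomean_speedup(speedups, passed):.2f}x")
```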

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.84    |    7.33     |    5.95     |
|       aot_eager        |    9.34    |    15.90    |    13.13    |
|        inductor        |   63.99    |    62.66    |   111.62    |
| inductor_no_cudagraphs |   64.29    |    59.57    |   110.74    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.90x    |    0.88x    |
|        inductor        |   0.79x    |    0.91x    |    0.91x    |
| inductor_no_cudagraphs |   0.94x    |    1.05x    |    1.01x    |
+------------------------+------------+-------------+-------------+

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

Previous report name: /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 80%, 48/60  | 85%, 51/60  |
|        inductor        | huggingface | 84%, 38/45  | 91%, 41/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60  | 87%, 52/60  |
| inductor_no_cudagraphs | huggingface | 96%, 43/45  | 96%, 43/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.00x    |   1.58x   |
|        inductor        | huggingface |   1.66x    |   1.60x   |
|        inductor        | timm_models |   1.00x    |   1.41x   |
| inductor_no_cudagraphs | torchbench  |   1.27x    |   1.25x   |
| inductor_no_cudagraphs | huggingface |   1.50x    |   1.49x   |
| inductor_no_cudagraphs | timm_models |   1.39x    |   1.39x   |
+------------------------+-------------+------------+-----------+

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
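The four criteria above can be sketched as a small check. The thresholds come from the list; the `ModelResult` type, its field names, and the flag labels are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    # Hypothetical container; field names are not from the benchmark scripts.
    accuracy_pass: bool
    speedup: float                # relative to native PyTorch
    compilation_latency_s: float  # seconds
    compression_ratio: float      # peak memory ratio, higher is better


def warning_flags(r: ModelResult) -> list:
    """Return the warning categories that apply to one model."""
    flags = []
    if not r.accuracy_pass:
        flags.append("accuracy")
    if r.speedup < 0.95:
        flags.append("speedup")
    # A NaN latency compares False here, so it is not flagged by this sketch.
    if r.compilation_latency_s > 120:
        flags.append("compilation_latency")
    if r.compression_ratio < 0.9:
        flags.append("compression_ratio")
    return flags


# Inductor values for DebertaV2ForMaskedLM as listed in the warning tables.
deberta_v2 = ModelResult(True, 0.8762, 137.2045, 0.5197)
print(warning_flags(deberta_v2))
```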

Accuracy warnings

+-------------+-------------------------------+-----------------+------------------------+
|    suite    |             name              |    inductor     | inductor_no_cudagraphs |
+-------------+-------------------------------+-----------------+------------------------+
| torchbench  |         hf_Longformer         |   fail_to_run   |      fail_to_run       |
| torchbench  |             moco              |   fail_to_run   |      fail_to_run       |
| torchbench  |      Background_Matting       | eager_variation |    eager_variation     |
| torchbench  |        vision_maskrcnn        | eager_variation |    eager_variation     |
| torchbench  |           tacotron2           |     0.0000      |         0.0000         |
| torchbench  |              gat              |     0.0000      |         0.0000         |
| torchbench  |              gcn              |     0.0000      |         0.0000         |
| torchbench  |             llama             |     0.0000      |         0.0000         |
| torchbench  |             sage              |     0.0000      |         0.0000         |
| torchbench  |         torchrec_dlrm         |     0.0000      |         0.0000         |
| huggingface | DebertaV2ForQuestionAnswering |   fail_to_run   |          pass          |
| huggingface |  AlbertForQuestionAnswering   |  fail_accuracy  |     fail_accuracy      |
+-------------+-------------------------------+-----------------+------------------------+

Performance speedup warnings

+-------------+-------------------------------+----------+------------------------+
|    suite    |             name              | inductor | inductor_no_cudagraphs |
+-------------+-------------------------------+----------+------------------------+
| torchbench  |           resnet18            |  1.584   |         0.9331         |
| torchbench  |             dcgan             |  1.4457  |         0.7866         |
| torchbench  |         lennard_jones         |  1.3512  |         0.8464         |
| torchbench  |       soft_actor_critic       |  1.1792  |         0.7225         |
| torchbench  |          tts_angular          |  0.9495  |         0.9507         |
| torchbench  |          timm_vovnet          |  0.9473  |         0.9146         |
| torchbench  |    nvidia_deeprecommender     |  0.8721  |         1.0188         |
| torchbench  | timm_vision_transformer_large |   0.0    |         1.0814         |
| torchbench  |         hf_Longformer         |   0.0    |          0.0           |
| torchbench  |             moco              |   0.0    |          0.0           |
| torchbench  |              gat              |   0.0    |          0.0           |
| torchbench  |              gcn              |   0.0    |          0.0           |
| torchbench  |             sage              |   0.0    |          0.0           |
| torchbench  |           tacotron2           |   0.0    |          0.0           |
| torchbench  |         torchrec_dlrm         |   0.0    |          0.0           |
| huggingface |  DebertaForQuestionAnswering  |  1.0607  |         0.9463         |
| huggingface |      DebertaForMaskedLM       |  0.9599  |         0.8065         |
| huggingface |     DebertaV2ForMaskedLM      |  0.8762  |         0.6445         |
| huggingface | DebertaV2ForQuestionAnswering |  0.8346  |         0.6461         |
| huggingface |     BlenderbotForCausalLM     |   0.0    |         1.2533         |
| huggingface |     AllenaiLongformerBase     |   0.0    |          0.0           |
+-------------+-------------------------------+----------+------------------------+

Compilation latency (sec) warnings

+-------------+--------------------------------+----------+------------------------+
|    suite    |              name              | inductor | inductor_no_cudagraphs |
+-------------+--------------------------------+----------+------------------------+
| torchbench  |          hf_T5_large           | 173.7048 |        173.6517        |
| torchbench  |        phlippe_densenet        | 172.8243 |        167.2448        |
| torchbench  |       timm_efficientnet        | 150.2811 |        149.0838        |
| torchbench  |           hf_BigBird           | 149.4919 |        130.2542        |
| torchbench  |          densenet121           | 137.9448 |        137.5183        |
| torchbench  |       mobilenet_v3_large       | 136.3897 |        138.8646        |
| torchbench  |          mobilenet_v2          | 128.245  |        129.6864        |
| torchbench  |             yolov3             | 122.739  |        122.3116        |
| torchbench  | timm_vision_transformer_large  |   nan    |        126.3869        |
| huggingface |     MobileBertForMaskedLM      | 147.3095 |        146.1162        |
| huggingface | MobileBertForQuestionAnswering | 141.616  |        137.7687        |
| huggingface |      DebertaV2ForMaskedLM      | 137.2045 |        75.9579         |
| huggingface | M2M100ForConditionalGeneration | 135.1352 |        134.7893        |
| huggingface | DebertaV2ForQuestionAnswering  | 134.6896 |        71.8413         |
| huggingface |  MT5ForConditionalGeneration   | 134.4762 |        133.2813        |
| huggingface |        XGLMForCausalLM         | 133.7804 |        133.5987        |
| timm_models |           rexnet_100           | 281.387  |        283.5166        |
| timm_models |           hrnet_w18            | 251.4005 |        248.2693        |
| timm_models |          ghostnet_100          | 236.9784 |        235.9516        |
| timm_models |           fbnetv3_b            | 171.6077 |        171.8795        |
| timm_models |          resnest101e           | 169.9194 |        173.0331        |
| timm_models |          mobilevit_s           | 168.8541 |        167.7505        |
| timm_models |           tinynet_a            | 167.6446 |        165.2419        |
| timm_models |         pnasnet5large          | 167.5939 |        166.725         |
| timm_models |          tf_mixnet_l           | 166.758  |        159.7839        |
| timm_models |            mixnet_l            | 165.8352 |        159.4478        |
| timm_models |     mobilenetv3_large_100      | 158.8865 |        158.5923        |
| timm_models |       tf_efficientnet_b0       | 158.6848 |        158.6533        |
| timm_models |        adv_inception_v3        | 158.015  |        159.2732        |
| timm_models |          inception_v3          | 157.8166 |        157.7184        |
| timm_models |       gluon_inception_v3       | 157.3344 |        153.768         |
| timm_models |       res2net101_26w_4s        | 153.5111 |        154.8713        |
| timm_models |        twins_pcpvt_base        | 150.1201 |        151.6307        |
| timm_models |          spnasnet_100          | 139.5335 |        140.3958        |
| timm_models |           fbnetc_100           | 139.1382 |        135.1571        |
| timm_models |        mobilenetv2_100         | 138.708  |        136.0327        |
| timm_models |      xcit_large_24_p8_224      | 134.6173 |        134.8888        |
| timm_models |        res2net50_14w_8s        | 126.9463 |        126.6219        |
| timm_models |          mnasnet_100           | 122.5232 |        125.4858        |
+-------------+--------------------------------+----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+----------+------------------------+
|    suite    |                  name                   | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------------+----------+------------------------+
| torchbench  |                 yolov3                  |  0.8919  |         1.0111         |
| torchbench  |              hf_GPT2_large              |  0.8904  |         1.128          |
| torchbench  |            timm_efficientnet            |  0.8704  |         1.0062         |
| torchbench  |           speech_transformer            |  0.8651  |         0.869          |
| torchbench  |           shufflenet_v2_x1_0            |  0.8613  |         0.9649         |
| torchbench  |              timm_resnest               |  0.8604  |         0.966          |
| torchbench  |         timm_vision_transformer         |  0.8593  |         0.8835         |
| torchbench  |                resnet152                |   0.85   |         0.9421         |
| torchbench  |               timm_regnet               |  0.8493  |         0.9506         |
| torchbench  |           Background_Matting            |  0.8485  |         1.0406         |
| torchbench  |              hf_DistilBert              |  0.8476  |         0.9945         |
| torchbench  |               hf_T5_large               |  0.8201  |         1.168          |
| torchbench  |              pytorch_unet               |  0.8134  |         0.9308         |
| torchbench  |            phlippe_densenet             |  0.8058  |         0.8659         |
| torchbench  |                 hf_Bart                 |  0.7933  |         0.9173         |
| torchbench  |                resnet50                 |  0.7831  |         0.8851         |
| torchbench  |                  dcgan                  |  0.7821  |         0.9645         |
| torchbench  |                 demucs                  |  0.7731  |         0.9656         |
| torchbench  |              squeezenet1_1              |  0.7722  |         0.908          |
| torchbench  |             pytorch_stargan             |  0.7715  |         0.8893         |
| torchbench  |               timm_vovnet               |  0.7529  |         0.8869         |
| torchbench  |             pytorch_struct              |  0.7277  |         0.7362         |
| torchbench  |           mobilenet_v3_large            |  0.7275  |         0.8715         |
| torchbench  |                  vgg16                  |  0.7227  |         0.9808         |
| torchbench  |               mnasnet1_0                |  0.7144  |         0.8074         |
| torchbench  |               densenet121               |  0.7107  |         0.7979         |
| torchbench  |                 alexnet                 |  0.7091  |         0.9384         |
| torchbench  |               hf_BigBird                |  0.6968  |         1.1191         |
| torchbench  |             resnext50_32x4d             |  0.6674  |         0.772          |
| torchbench  |         nvidia_deeprecommender          |  0.6585  |         0.8931         |
| torchbench  |                   drq                   |  0.6379  |         0.9573         |
| torchbench  |            soft_actor_critic            |  0.6066  |         0.9973         |
| torchbench  |             LearningToPaint             |  0.5925  |         0.7463         |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |  0.5904  |         0.6004         |
| torchbench  |                resnet18                 |  0.5395  |         0.6089         |
| torchbench  |              lennard_jones              |  0.5317  |         0.9997         |
| torchbench  |               hf_Reformer               |  0.4538  |         0.8022         |
| torchbench  |          functorch_dp_cifar10           |  0.3991  |         0.4424         |
| torchbench  |             phlippe_resnet              |  0.3169  |         0.3395         |
| huggingface |           PegasusForCausalLM            |  0.893   |         0.9864         |
| huggingface |          DistilBertForMaskedLM          |  0.8849  |         0.9624         |
| huggingface |            TrOCRForCausalLM             |  0.8836  |         0.9583         |
| huggingface | BlenderbotSmallForConditionalGeneration |  0.8729  |         0.9803         |
| huggingface |     PegasusForConditionalGeneration     |  0.8689  |         1.0689         |
| huggingface |      MBartForConditionalGeneration      |  0.8672  |         1.0307         |
| huggingface |      BartForConditionalGeneration       |  0.8456  |         1.0139         |
| huggingface |         MegatronBertForCausalLM         |  0.845   |         1.0962         |
| huggingface |       BlenderbotSmallForCausalLM        |  0.8184  |         0.9119         |
| huggingface |         Speech2Text2ForCausalLM         |  0.789   |         0.8779         |
| huggingface |     M2M100ForConditionalGeneration      |  0.7651  |         0.9908         |
| huggingface |          MobileBertForMaskedLM          |  0.7473  |         1.016          |
| huggingface |             XGLMForCausalLM             |  0.7117  |         0.9792         |
| huggingface |     MobileBertForQuestionAnswering      |  0.6569  |         0.8392         |
| huggingface |           DebertaForMaskedLM            |  0.5501  |         0.9978         |
| huggingface |          DebertaV2ForMaskedLM           |  0.5197  |         0.9665         |
| huggingface |      DebertaV2ForQuestionAnswering      |  0.487   |         0.9802         |
| huggingface |       DebertaForQuestionAnswering       |  0.4601  |         1.1526         |
| timm_models |                hrnet_w18                |  0.8918  |          0.99          |
| timm_models |            sebotnet33ts_256             |  0.891   |         1.1115         |
| timm_models |              inception_v3               |  0.8904  |         1.0171         |
| timm_models |           gluon_inception_v3            |  0.8904  |         1.0171         |
| timm_models |            adv_inception_v3             |  0.8904  |         1.0171         |
| timm_models |                 dpn107                  |  0.8833  |         0.9642         |
| timm_models |            gluon_xception65             |  0.8831  |         0.9705         |
| timm_models |              ghostnet_100               |  0.8807  |         0.977          |
| timm_models |              spnasnet_100               |  0.8786  |         0.9451         |
| timm_models |          mobilenetv3_large_100          |  0.877   |         0.9361         |
| timm_models |             poolformer_m36              |  0.8768  |         1.1871         |
| timm_models |           eca_botnext26ts_256           |  0.8738  |         1.0072         |
| timm_models |          xcit_large_24_p8_224           |  0.8721  |         0.9732         |
| timm_models |            res2net50_14w_8s             |  0.8712  |         0.9607         |
| timm_models |            res2net101_26w_4s            |  0.871   |         0.9483         |
| timm_models |                mixnet_l                 |  0.8687  |         0.9902         |
| timm_models |               mnasnet_100               |  0.8683  |         0.9403         |
| timm_models |               res2next50                |  0.866   |         0.9547         |
| timm_models |              cait_m36_384               |  0.8632  |         0.989          |
| timm_models |               fbnetc_100                |  0.8596  |         0.9535         |
| timm_models |                pit_b_224                |  0.8578  |         1.0242         |
| timm_models |               selecsls42b               |  0.8576  |         0.9664         |
| timm_models |              convnext_base              |  0.8505  |         1.0338         |
| timm_models |                gernet_l                 |  0.8499  |         0.9706         |
| timm_models |         swsl_resnext101_32x16d          |  0.8461  |         0.9786         |
| timm_models |             coat_lite_mini              |  0.8402  |         1.0202         |
| timm_models |              botnet26t_256              |  0.8239  |         0.9779         |
| timm_models |                lcnet_050                |  0.805   |         0.884          |
| timm_models |                repvgg_a2                |  0.7738  |         0.9611         |
| timm_models |               regnety_002               |  0.7602  |         0.8966         |
| timm_models |             crossvit_9_240              |  0.7526  |         0.9898         |
| timm_models |      swin_base_patch4_window7_224       |  0.7214  |         0.9045         |
| timm_models |              jx_nest_base               |  0.6693  |         0.9604         |
+-------------+-----------------------------------------+----------+------------------------+

Metrics over time


bench_logs/passrate_over_time.png :

bench_logs/memory_over_time.png :

bench_logs/geomean_over_time.png :

bench_logs/comp_time_over_time.png :

Recent Regressions

For each relevant compiler, we compare the two most recent reports in which that compiler actually ran, to find models that were not flagged previously but are now flagged as problematic (according to the 'Warnings' section).
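The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual code: the function names and the `THRESHOLDS` table are hypothetical, and the cutoff values (speedup below 0.95, compilation latency above 120 seconds, compression ratio below 0.9) are assumptions inferred from the regression tables in this report.

```python
# Hypothetical flagging rules; the cutoffs are inferred from the tables in
# this report and are NOT taken from the dashboard's source.
THRESHOLDS = {
    "speedup": ("lt", 0.95),               # flagged when below 0.95
    "compilation_latency": ("gt", 120.0),  # flagged when above 120 seconds
    "compression_ratio": ("lt", 0.9),      # flagged when below 0.9
}


def is_flagged(metric, value):
    """Return True if `value` is on the problematic side of the metric's cutoff."""
    op, limit = THRESHOLDS[metric]
    return value < limit if op == "lt" else value > limit


def find_regressions(metric, prev, cur):
    """Return (name, prev_value, cur_value) for models flagged now but not before.

    `prev` and `cur` map model name -> metric value for one compiler,
    taken from the previous and the current report respectively.
    """
    regressions = []
    for name, cur_value in cur.items():
        if name not in prev:
            continue  # no earlier measurement to compare against
        prev_value = prev[name]
        if is_flagged(metric, cur_value) and not is_flagged(metric, prev_value):
            regressions.append((name, prev_value, cur_value))
    return regressions


# Values copied from the torchbench speedup regression table below.
print(find_regressions("speedup", {"resnet18": 0.9546}, {"resnet18": 0.9331}))
```

Under these assumed cutoffs, a model that was already flagged in the previous report is not listed again, which matches the "previously unflagged" wording.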

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Performance speedup regressions

+------------------------+----------+-------------+------------+
|        compiler        |   name   | prev_status | cur_status |
+------------------------+----------+-------------+------------+
| inductor_no_cudagraphs | resnet18 |   0.9546    |   0.9331   |
+------------------------+----------+-------------+------------+

Compilation latency (sec) regressions

+------------------------+--------------------+-------------+------------+
|        compiler        |        name        | prev_status | cur_status |
+------------------------+--------------------+-------------+------------+
|        inductor        | mobilenet_v3_large |  115.8424   |  136.3897  |
|        inductor        |    mobilenet_v2    |  108.8431   |  128.245   |
|        inductor        |       yolov3       |  106.2531   |  122.739   |
| inductor_no_cudagraphs |       yolov3       |  117.6742   |  122.3116  |
+------------------------+--------------------+-------------+------------+

Peak Memory Compression Ratio regressions

+----------+--------------------+-------------+------------+
| compiler |        name        | prev_status | cur_status |
+----------+--------------------+-------------+------------+
| inductor |       yolov3       |    1.037    |   0.8919   |
| inductor |   hf_GPT2_large    |   1.1278    |   0.8904   |
| inductor | timm_efficientnet  |   1.0128    |   0.8704   |
| inductor | shufflenet_v2_x1_0 |   0.9736    |   0.8613   |
| inductor |    timm_resnest    |    0.953    |   0.8604   |
| inductor |     resnet152      |   0.9445    |    0.85    |
| inductor |    timm_regnet     |   0.9527    |   0.8493   |
| inductor | Background_Matting |   1.0421    |   0.8485   |
| inductor |   hf_DistilBert    |   1.0011    |   0.8476   |
| inductor |    hf_T5_large     |   1.1687    |   0.8201   |
| inductor |    pytorch_unet    |   0.9306    |   0.8134   |
| inductor |      hf_Bart       |   0.9236    |   0.7933   |
| inductor |       dcgan        |   0.9645    |   0.7821   |
| inductor |       demucs       |   0.9675    |   0.7731   |
| inductor |   squeezenet1_1    |    0.909    |   0.7722   |
| inductor |       vgg16        |   0.9823    |   0.7227   |
| inductor |      alexnet       |   0.9434    |   0.7091   |
| inductor |        drq         |   1.0607    |   0.6379   |
| inductor | soft_actor_critic  |   1.1053    |   0.6066   |
| inductor |   lennard_jones    |   1.0687    |   0.5317   |
+----------+--------------------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Accuracy regressions

+----------+-------------------------------+-------------+-------------+
| compiler |             name              | prev_status | cur_status  |
+----------+-------------------------------+-------------+-------------+
| inductor | DebertaV2ForQuestionAnswering |    pass     | fail_to_run |
+----------+-------------------------------+-------------+-------------+

Performance speedup regressions

+------------------------+-----------------------------+-------------+------------+
|        compiler        |            name             | prev_status | cur_status |
+------------------------+-----------------------------+-------------+------------+
| inductor_no_cudagraphs | DebertaForQuestionAnswering |   0.9507    |   0.9463   |
+------------------------+-----------------------------+-------------+------------+

Peak Memory Compression Ratio regressions

+----------+---------------------------------+-------------+------------+
| compiler |              name               | prev_status | cur_status |
+----------+---------------------------------+-------------+------------+
| inductor |       PegasusForCausalLM        |   0.9161    |   0.893    |
| inductor | PegasusForConditionalGeneration |   0.9157    |   0.8689   |
| inductor |  MBartForConditionalGeneration  |   0.9273    |   0.8672   |
| inductor |  BartForConditionalGeneration   |   0.9136    |   0.8456   |
| inductor |     MegatronBertForCausalLM     |   0.9653    |   0.845    |
+----------+---------------------------------+-------------+------------+

Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Compilation latency (sec) regressions

+----------+-----------------+-------------+------------+
| compiler |      name       | prev_status | cur_status |
+----------+-----------------+-------------+------------+
| inductor |  spnasnet_100   |  116.3327   |  139.5335  |
| inductor | mobilenetv2_100 |  112.2976   |  138.708   |
| inductor |   mnasnet_100   |   107.161   |  122.5232  |
+----------+-----------------+-------------+------------+

Peak Memory Compression Ratio regressions

+----------+------------------------------+-------------+------------+
| compiler |             name             | prev_status | cur_status |
+----------+------------------------------+-------------+------------+
| inductor |          hrnet_w18           |   0.9925    |   0.8918   |
| inductor |       sebotnet33ts_256       |   1.1129    |   0.891    |
| inductor |      gluon_inception_v3      |   1.0193    |   0.8904   |
| inductor |         inception_v3         |   1.0193    |   0.8904   |
| inductor |       adv_inception_v3       |   1.0193    |   0.8904   |
| inductor |            dpn107            |   0.9646    |   0.8833   |
| inductor |       gluon_xception65       |   0.9714    |   0.8831   |
| inductor |         ghostnet_100         |   0.9793    |   0.8807   |
| inductor |         spnasnet_100         |   0.9497    |   0.8786   |
| inductor |    mobilenetv3_large_100     |   0.9376    |   0.877    |
| inductor |        poolformer_m36        |   1.1899    |   0.8768   |
| inductor |     eca_botnext26ts_256      |   1.0082    |   0.8738   |
| inductor |     xcit_large_24_p8_224     |   0.9776    |   0.8721   |
| inductor |       res2net50_14w_8s       |   0.9637    |   0.8712   |
| inductor |      res2net101_26w_4s       |   0.9509    |   0.871    |
| inductor |           mixnet_l           |   0.9923    |   0.8687   |
| inductor |         mnasnet_100          |   0.9448    |   0.8683   |
| inductor |          res2next50          |   0.9568    |   0.866    |
| inductor |         cait_m36_384         |   0.9885    |   0.8632   |
| inductor |          fbnetc_100          |   0.9582    |   0.8596   |
| inductor |          pit_b_224           |   1.0251    |   0.8578   |
| inductor |         selecsls42b          |   0.9702    |   0.8576   |
| inductor |        convnext_base         |   1.0345    |   0.8505   |
| inductor |           gernet_l           |   0.9738    |   0.8499   |
| inductor |    swsl_resnext101_32x16d    |   0.9793    |   0.8461   |
| inductor |        coat_lite_mini        |    1.021    |   0.8402   |
| inductor |        botnet26t_256         |    0.979    |   0.8239   |
| inductor |          repvgg_a2           |    0.966    |   0.7738   |
| inductor |         regnety_002          |    0.901    |   0.7602   |
| inductor |        crossvit_9_240        |   0.9912    |   0.7526   |
| inductor | swin_base_patch4_window7_224 |   0.9046    |   0.7214   |
| inductor |         jx_nest_base         |   0.9611    |   0.6693   |
+----------+------------------------------+-------------+------------+

torchbench suite with amp precision


Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9673 |  0.9093   |  3.6599  |         1.2882         |
|           BERT_pytorch            |  16  | 0.9908 |   0.806   |  3.0906  |         1.9625         |
|            densenet121            |  4   | 0.9868 |  0.7186   |  2.7545  |         1.016          |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9669 |  0.9076   |  2.4982  |         1.7258         |
|            hf_BigBird             |  2   | 0.9556 |  0.7752   |  2.4879  |         1.6312         |
|             hf_Albert             |  8   | 0.9949 |  0.9559   |  2.3394  |         2.2929         |
|            hf_T5_large            |  2   | 0.9764 |  0.8092   |  2.2468  |         1.7322         |
|               dlrm                | 1024 | 0.9456 |  0.8435   |  2.2137  |         1.1573         |
|         phlippe_densenet          | 128  | 0.985  |  0.7771   |  2.108   |         0.9671         |
|        mobilenet_v3_large         |  32  | 0.9888 |  0.7854   |  2.0508  |         1.1466         |
|              hf_Bert              |  4   | 0.9961 |  0.8391   |   2.0    |         1.4825         |
|              hf_Bart              |  4   | 0.9834 |   0.838   |  1.983   |         1.5279         |
|               hf_T5               |  8   | 0.9837 |  0.8499   |  1.9333  |         1.9826         |
|          phlippe_resnet           | 128  | 0.9811 |   0.761   |  1.8544  |         0.9518         |
|           squeezenet1_1           |  32  | 0.9897 |  0.9335   |  1.8542  |         1.266          |
|              hf_GPT2              |  4   | 0.9957 |  0.9552   |  1.7874  |         1.766          |
|          resnext50_32x4d          |  8   | 0.9817 |   0.718   |   1.75   |         0.9549         |
|            mnasnet1_0             |  32  | 0.9885 |  0.7358   |  1.7015  |         1.0504         |
|           hf_GPT2_large           |  4   | 0.9824 |  0.9717   |  1.6777  |         1.7379         |
|        shufflenet_v2_x1_0         | 128  | 0.9932 |  0.7529   |  1.6415  |         1.1503         |
|           hf_Bert_large           |  4   | 1.0001 |  0.8711   |  1.5969  |         1.5064         |
|        speech_transformer         |  32  | 0.9782 |  0.8268   |  1.5859  |         1.4748         |
|             resnet18              |  16  | 0.9867 |  0.7676   |  1.584   |         0.9331         |
|           timm_resnest            |  32  | 0.9932 |  0.8491   |  1.5696  |         1.4775         |
|      timm_vision_transformer      |  32  | 0.9838 |  0.8486   |  1.5503  |         1.3437         |
|           fastNLP_Bert            |  6   | 0.9899 |  0.8359   |  1.5447  |         1.4899         |
|            timm_nfnet             | 128  | 0.9859 |  0.9847   |  1.5412  |         1.4733         |
|           mobilenet_v2            |  96  | 0.9968 |  0.7772   |  1.5241  |         1.5023         |
| attention_is_all_you_need_pytorch | 256  | 0.9896 |  0.9115   |  1.5101  |         1.4886         |
|           hf_DistilBert           |  8   | 0.9807 |  0.9538   |  1.4803  |         1.4716         |
|          pytorch_struct           | 200  | 0.9369 |  0.7628   |  1.4726  |         1.0624         |
|               dcgan               |  32  | 0.8596 |  0.6945   |  1.4457  |         0.7866         |
|         timm_efficientnet         |  32  | 0.9385 |  0.6282   |  1.4299  |         1.0439         |
|           pytorch_unet            |  1   | 0.9961 |   0.205   |  1.3574  |         1.3527         |
|           lennard_jones           | 1000 | 0.8245 |  0.7252   |  1.3512  |         0.8464         |
|                drq                |  1   | 0.928  |  0.7474   |  1.3455  |         1.0184         |
|          LearningToPaint          |  96  | 0.9904 |  0.7735   |  1.317   |         1.0456         |
|          pytorch_stargan          |  16  | 0.9924 |  0.7785   |  1.2803  |         1.2375         |
|               vgg16               |  64  | 0.9994 |  0.9986   |  1.2407  |         1.2545         |
|            Super_SloMo            |  6   | 0.9972 |  0.1793   |  1.2328  |         1.2311         |
|             resnet152             |  32  | 0.9951 |  0.7662   |  1.226   |         0.9886         |
|        Background_Matting         |  4   | 0.9987 |  0.1366   |  1.2123  |         1.2083         |
|              yolov3               |  16  | 0.9961 |  0.8064   |  1.1967  |         1.1991         |
|             resnet50              |  32  | 0.9958 |  0.7735   |  1.1952  |         1.0467         |
|         soft_actor_critic         | 256  | 0.8426 |  0.6286   |  1.1792  |         0.7225         |
|            hf_Reformer            |  4   | 0.9859 |  0.9639   |  1.1387  |         1.057          |
|              alexnet              | 128  | 0.9985 |  0.9973   |  1.0869  |         1.135          |
|              demucs               |  4   | 0.9987 |  1.0006   |  1.0356  |         1.0386         |
|            timm_regnet            |  32  | 0.9166 |   0.771   |  1.0183  |         0.9681         |
|            tts_angular            |  64  | 0.9172 |  0.8841   |  0.9495  |         0.9507         |
|            timm_vovnet            |  32  | 0.8591 |  0.7044   |  0.9473  |         0.9146         |
|      nvidia_deeprecommender       | 256  | 0.999  |  0.9981   |  0.8721  |         1.0188         |
|   timm_vision_transformer_large   |  32  | 0.998  |    0.0    |   0.0    |         1.0814         |
|           hf_Longformer           |  2   | 1.0113 |  0.6871   |   0.0    |          0.0           |
|               moco                |  32  | 0.9374 |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+
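Because the tables in this report are plain ASCII, they can be post-processed directly, for example to summarize a speedup column with a geometric mean. The sketch below is only an illustration: `parse_ascii_table` and `geomean` are hypothetical helpers, and treating `0.0` entries as failed runs to be excluded is an assumption, not something this report states.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts (all values as str)."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the '+-----+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean(values):
    """Geometric mean of the positive values; returns nan if there are none."""
    # Assumption: non-positive entries (e.g. 0.0) mark failed runs and are skipped.
    positive = [v for v in values if v > 0]
    if not positive:
        return float("nan")
    return math.exp(sum(math.log(v) for v in positive) / len(positive))


# A few rows copied from the table above.
table = """
+-----------------+----+--------+-----------+----------+------------------------+
|      name       | bs | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------+----+--------+-----------+----------+------------------------+
|     hf_Bert     | 4  | 0.9961 |  0.8391   |   2.0    |         1.4825         |
| resnext50_32x4d | 8  | 0.9817 |   0.718   |   1.75   |         0.9549         |
|  hf_Longformer  | 2  | 1.0113 |  0.6871   |   0.0    |          0.0           |
+-----------------+----+--------+-----------+----------+------------------------+
"""
rows = parse_ascii_table(table)
print(geomean(float(row["inductor"]) for row in rows))
```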

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 27.108  |  55.6962  | 173.7048 |        173.6517        |
|         phlippe_densenet          | 128  | 3.2826  |  7.0259   | 172.8243 |        167.2448        |
|         timm_efficientnet         |  32  | 4.9802  |  10.1623  | 150.2811 |        149.0838        |
|            hf_BigBird             |  2   | 13.0359 |  37.643   | 149.4919 |        130.2542        |
|            densenet121            |  4   | 7.7749  |  18.5416  | 137.9448 |        137.5183        |
|        mobilenet_v3_large         |  32  | 3.4991  |  7.7369   | 136.3897 |        138.8646        |
|           mobilenet_v2            |  96  | 3.1576  |  7.0171   | 128.245  |        129.6864        |
|              yolov3               |  16  | 4.9404  |  10.8113  | 122.739  |        122.3116        |
|            mnasnet1_0             |  32  | 3.1626  |  6.8423   | 110.2663 |        109.8701        |
|             resnet152             |  32  | 9.1917  |  20.2961  | 108.1269 |        105.6766        |
|           hf_GPT2_large           |  4   | 14.8753 |  30.2002  | 107.1334 |        106.7978        |
|           timm_resnest            |  32  | 1.8227  |  3.9454   | 100.9493 |        102.112         |
|        shufflenet_v2_x1_0         | 128  | 3.5317  |   7.79    | 82.5748  |        82.4949         |
|        speech_transformer         |  32  | 6.1568  |   13.91   | 77.5739  |        78.5083         |
|            timm_regnet            |  32  | 6.9363  |  12.4135  | 76.5302  |        73.5564         |
|            timm_nfnet             | 128  | 5.9236  |  11.2534  | 75.2696  |         75.207         |
| attention_is_all_you_need_pytorch | 256  | 4.4356  |  11.1114  | 74.6453  |        75.2495         |
|        Background_Matting         |  4   | 3.0096  |  11.3561  | 70.6691  |        70.8666         |
|           BERT_pytorch            |  16  | 4.9719  |  11.7199  | 69.7203  |        69.3842         |
|             resnet50              |  32  | 3.2457  |  7.0952   | 66.6599  |        65.9858         |
|           hf_Bert_large           |  4   | 10.3195 |  21.502   | 65.9636  |        65.4373         |
|            timm_vovnet            |  32  | 3.6908  |  6.5586   | 65.4469  |        64.7981         |
|           pytorch_unet            |  1   | 1.5575  |  4.3987   | 63.1401  |        60.5167         |
|       functorch_dp_cifar10        |  64  | 1.2236  |  2.4236   | 56.1746  |        57.3213         |
|          resnext50_32x4d          |  8   | 3.2553  |  7.0413   |  54.718  |        53.2778         |
|      timm_vision_transformer      |  32  | 3.3773  |  7.3371   | 53.5898  |        52.9388         |
|               hf_T5               |  8   | 5.7483  |  12.8408  | 51.1359  |         50.761         |
|            hf_Reformer            |  4   | 4.1743  |  6.0613   | 49.5868  |        44.7414         |
|              hf_Bart              |  4   | 6.1721  |  13.7508  | 49.1359  |        50.4798         |
|           fastNLP_Bert            |  6   | 5.2879  |  11.3418  | 48.3336  |         49.879         |
|          pytorch_stargan          |  16  | 1.2213  |  3.2508   |  47.742  |        45.9781         |
|          LearningToPaint          |  96  | 1.4111  |  2.9205   | 46.4467  |        45.8793         |
|            Super_SloMo            |  6   | 2.7432  |  9.8242   | 45.3348  |         45.236         |
|             resnet18              |  16  | 1.3696  |  2.8994   | 44.7977  |        44.2685         |
|              hf_GPT2              |  4   | 4.7402  |  9.5623   | 41.4425  |        42.0879         |
|              hf_Bert              |  4   | 5.1001  |  10.6534  | 39.3797  |        40.3765         |
|             hf_Albert             |  8   |  2.486  |  8.0383   | 39.2451  |        41.0514         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2136  |  2.9839   | 38.2521  |        35.8077         |
|          phlippe_resnet           | 128  | 1.3737  |  2.8481   | 33.6472  |        31.5959         |
|              demucs               |  4   | 1.4506  |  2.1854   | 32.5397  |        32.1505         |
|           hf_DistilBert           |  8   |  2.384  |  5.2764   | 31.5893  |        31.6892         |
|           squeezenet1_1           |  32  | 1.0707  |  1.7571   | 25.1384  |        24.9218         |
|          pytorch_struct           | 200  | 0.7502  |  1.3273   | 22.5763  |        20.6858         |
|               vgg16               |  64  | 0.6341  |  1.1299   | 17.9347  |        16.8421         |
|              alexnet              | 128  | 0.4868  |  0.7824   | 16.4443  |        15.9499         |
|                drq                |  1   | 0.6699  |  1.0183   | 11.4835  |        10.7938         |
|      nvidia_deeprecommender       | 256  | 0.4902  |  0.7644   | 11.3753  |         9.369          |
|               dcgan               |  32  | 0.4357  |   0.72    |  9.4082  |         8.9342         |
|         soft_actor_critic         | 256  | 0.4327  |  0.6129   |  8.5122  |         8.3882         |
|               dlrm                | 1024 |  0.379  |  0.7909   |  7.7453  |         8.4434         |
|           lennard_jones           | 1000 | 0.3939  |  0.5957   |  7.1429  |         5.7335         |
|            tts_angular            |  64  |  0.452  |  0.5155   |  6.9091  |         6.9628         |
|   timm_vision_transformer_large   |  32  | 9.5837  |    nan    |   nan    |        126.3869        |
|           hf_Longformer           |  2   | 9.5919  |  30.8672  |   nan    |          nan           |
|               moco                |  32  | 33.6948 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.208   |         1.208          |
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0863  |         1.2557         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.0496  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.0163  |         1.1719         |
|           mobilenet_v2            |  96  | 0.9866 |  0.7652   |   1.01   |         1.1012         |
|            tts_angular            |  64  | 0.9983 |  0.9983   |  0.9895  |         0.9983         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9689  |         1.1266         |
|            timm_nfnet             | 128  | 0.907  |  0.8747   |  0.9686  |         1.0728         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |  0.952   |         1.0009         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         1.1717         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  0.9422  |         1.0258         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.9402  |         1.0725         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9321  |         1.0713         |
|              yolov3               |  16  | 0.9838 |   0.846   |  0.8919  |         1.0111         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  0.8904  |         1.128          |
|         timm_efficientnet         |  32  | 0.9858 |  0.8201   |  0.8704  |         1.0062         |
|        speech_transformer         |  32  | 0.9914 |   0.901   |  0.8651  |         0.869          |
|        shufflenet_v2_x1_0         | 128  | 0.955  |  0.8387   |  0.8613  |         0.9649         |
|           timm_resnest            |  32  | 0.9857 |  0.8935   |  0.8604  |         0.966          |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8593  |         0.8835         |
|             resnet152             |  32  | 0.9953 |  0.8944   |   0.85   |         0.9421         |
|            timm_regnet            |  32  | 0.9902 |  0.8514   |  0.8493  |         0.9506         |
|        Background_Matting         |  4   | 1.0127 |  0.6489   |  0.8485  |         1.0406         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8476  |         0.9945         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         1.168          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.8134  |         0.9308         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8058  |         0.8659         |
|              hf_Bart              |  4   | 0.9087 |  0.7521   |  0.7933  |         0.9173         |
|             resnet50              |  32  | 0.9928 |  0.8617   |  0.7831  |         0.8851         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.7821  |         0.9645         |
|              demucs               |  4   | 0.9661 |  0.9659   |  0.7731  |         0.9656         |
|           squeezenet1_1           |  32  | 0.9666 |  0.9321   |  0.7722  |         0.908          |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.8893         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7529  |         0.8869         |
|          pytorch_struct           | 200  | 0.9992 |  0.5106   |  0.7277  |         0.7362         |
|        mobilenet_v3_large         |  32  | 0.9765 |  0.8395   |  0.7275  |         0.8715         |
|               vgg16               |  64  | 0.9923 |  0.7245   |  0.7227  |         0.9808         |
|            mnasnet1_0             |  32  | 0.9753 |  0.8651   |  0.7144  |         0.8074         |
|            densenet121            |  4   | 0.994  |   0.98    |  0.7107  |         0.7979         |
|              alexnet              | 128  | 0.9454 |  0.7939   |  0.7091  |         0.9384         |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |  0.6968  |         1.1191         |
|          resnext50_32x4d          |  8   | 0.995  |  0.8424   |  0.6674  |         0.772          |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.8931         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.6379  |         0.9573         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  0.6066  |         0.9973         |
|          LearningToPaint          |  96  | 0.9202 |  0.7116   |  0.5925  |         0.7463         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9966 |  0.8568   |  0.5904  |         0.6004         |
|             resnet18              |  16  | 0.9753 |  0.7978   |  0.5395  |         0.6089         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.5317  |         0.9997         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.4538  |         0.8022         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3169  |         0.3395         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.9511 |   0.893   |   nan    |          nan           |
|               moco                |  32  | 0.9982 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_GPT2_large           |  4   | 212.8823 | 214.8084  | 124.6871 |        120.2118        |
|        Background_Matting         |  4   | 125.7389 | 919.3502  | 103.7394 |        103.9751        |
|            hf_T5_large            |  2   | 228.6345 |  275.016  | 101.7038 |        128.4146        |
|               hf_T5               |  8   | 181.9615 | 210.8074  | 93.9856  |        90.6495         |
|            timm_nfnet             | 128  | 119.9943 |  119.725  | 77.0516  |        80.3277         |
|            hf_BigBird             |  2   | 203.6832 | 251.7438  | 76.9383  |        120.658         |
|            hf_Reformer            |  4   | 82.0085  |  83.8966  |  71.004  |        76.6255         |
|            Super_SloMo            |  6   | 79.6755  | 443.5296  | 64.3412  |        64.4258         |
|              yolov3               |  16  | 68.7152  |  84.7362  | 57.1879  |        57.2268         |
|            timm_regnet            |  32  |  60.76   |  72.245   | 54.7574  |        57.7424         |
|               vgg16               |  64  | 66.2552  |  66.2583  | 53.3772  |        52.7964         |
|             resnet152             |  32  | 65.1844  |  83.8808  | 52.7869  |        70.4311         |
|           hf_Bert_large           |  4   | 82.8117  |  94.602   | 51.9431  |        55.5269         |
|              demucs               |  4   | 53.7216  |  53.3854  | 51.7245  |        51.6898         |
|        speech_transformer         |  32  | 66.0306  |  76.6459  | 46.3395  |        38.7323         |
|           fastNLP_Bert            |  6   |  54.56   |  62.895   | 36.7595  |        34.8994         |
| attention_is_all_you_need_pytorch | 256  | 55.8857  |  59.4564  | 36.1971  |        36.4787         |
|              hf_Bart              |  4   | 59.6665  |  70.0876  | 36.0931  |        37.8921         |
|           mobilenet_v2            |  96  | 47.1488  |  60.3547  | 30.7958  |        31.3324         |
|           pytorch_unet            |  1   | 39.9686  | 194.1787  | 29.3306  |        29.4115         |
|             hf_Albert             |  8   |  68.598  |  71.4819  | 29.1418  |         29.719         |
|              hf_GPT2              |  4   | 49.8868  |  50.3557  | 27.1906  |        27.6258         |
|            timm_vovnet            |  32  | 28.8574  |  35.9814  | 26.2594  |        26.8713         |
|              hf_Bert              |  4   | 41.1579  |  48.6898  | 22.5903  |        27.8524         |
|         timm_efficientnet         |  32  | 34.0671  |  50.6208  |   22.1   |        30.6613         |
|             resnet50              |  32  | 26.8274  |  34.2949  | 22.0293  |        25.3257         |
|           hf_DistilBert           |  8   | 32.0874  |  32.7852  | 21.5919  |        21.3191         |
|            densenet121            |  4   | 55.7531  |  91.7288  | 19.8369  |        54.6562         |
|        shufflenet_v2_x1_0         | 128  | 31.1064  |  40.8479  |  18.696  |        26.7172         |
|      timm_vision_transformer      |  32  |  29.831  |  34.4799  | 18.2555  |        21.2657         |
|           BERT_pytorch            |  16  | 57.3307  |  68.3942  | 17.7195  |        28.2045         |
|           timm_resnest            |  32  | 24.2188  |  28.3605  | 15.2566  |        16.2303         |
|            mnasnet1_0             |  32  | 22.7923  |  30.416   | 13.2507  |        20.7518         |
|        mobilenet_v3_large         |  32  | 28.6209  |  34.3221  | 13.0973  |        22.9508         |
|          resnext50_32x4d          |  8   | 21.0155  |  28.4556  | 11.8228  |        21.5592         |
|      nvidia_deeprecommender       | 256  |  10.217  |  10.2251  | 11.7006  |        10.0206         |
|          pytorch_stargan          |  16  | 15.0921  |  19.5954  | 11.6217  |        11.8958         |
|         phlippe_densenet          | 128  | 23.4226  |  29.8528  | 11.2471  |        24.3261         |
|              alexnet              | 128  |  9.8254  |  9.8493   |  9.0197  |         8.6431         |
|          LearningToPaint          |  96  | 11.5386  |  14.5768  |  8.5812  |        11.0022         |
|            tts_angular            |  64  |  6.8169  |  7.0812   |  6.4972  |         6.6601         |
|             resnet18              |  16  |  9.4913  |  12.1286  |  5.7877  |         9.9391         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 14.4984  |  18.7592  |  5.7524  |         8.7429         |
|           squeezenet1_1           |  32  | 12.0773  |  11.0602  |  5.3988  |         8.1757         |
|          phlippe_resnet           | 128  |  9.2098  |  11.7681  |  5.0631  |         9.6862         |
|          pytorch_struct           | 200  |  5.088   |   6.031   |  3.2284  |         4.4245         |
|       functorch_dp_cifar10        |  64  | 10.4867  |  11.0252  |  2.8361  |         7.951          |
|                drq                |  1   |  3.8045  |  4.3993   |  2.8072  |         3.3186         |
|               dlrm                | 1024 |  4.3363  |  4.9434   |  2.1579  |         3.6478         |
|               dcgan               |  32  |  2.4174  |  3.0627   |  1.4717  |         2.7099         |
|         soft_actor_critic         | 256  |  1.8347  |  2.5021   |  1.348   |         3.1701         |
|           lennard_jones           | 1000 |  1.8066  |  2.1293   |  1.1367  |         1.825          |
|   timm_vision_transformer_large   |  32  | 464.6951 |    nan    |   nan    |        428.469         |
|           hf_Longformer           |  2   | 113.2762 | 165.5889  |   nan    |          nan           |
|               moco                |  32  | 54.4848  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|             OPTForCausalLM              |  2  | 0.9876 |  0.9275   |  2.4643  |         2.4921         |
|          MobileBertForMaskedLM          | 64  | 0.9506 |  0.8064   |  2.3677  |         1.0135         |
|      GPT2ForSequenceClassification      |  4  | 0.9746 |  0.9498   |  2.2556  |         2.2832         |
|             XGLMForCausalLM             |  8  | 0.9881 |  0.8436   |  2.1222  |         1.381          |
|       ElectraForQuestionAnswering       | 64  | 0.9866 |  0.9768   |  2.1196  |         2.0866         |
|       MT5ForConditionalGeneration       | 16  | 0.9873 |  0.8369   |  2.1107  |         1.8064         |
|     MobileBertForQuestionAnswering      | 128 | 0.9567 |  0.8028   |  2.105   |         1.0164         |
|     M2M100ForConditionalGeneration      | 16  | 0.9845 |  0.8175   |  1.9317  |         1.5061         |
|           ElectraForCausalLM            | 32  | 0.9813 |  0.9367   |  1.8486  |         1.8147         |
|            XLNetLMHeadModel             |  8  | 0.9954 |  0.9637   |  1.8041  |         1.8084         |
|    LayoutLMForSequenceClassification    | 16  | 0.9835 |  0.9701   |  1.7889  |         1.7876         |
|       RobertaForQuestionAnswering       | 16  | 0.9843 |  0.9694   |  1.7871  |         1.7565         |
|        BertForQuestionAnswering         | 16  | 0.9843 |  0.9698   |  1.7753  |         1.761          |
|           RobertaForCausalLM            | 16  | 0.9865 |  0.9624   |  1.6797  |         1.6672         |
|               DistillGPT2               | 16  | 0.9865 |  0.9542   |  1.6583  |         1.7004         |
|            PLBartForCausalLM            |  8  | 0.9854 |  0.9604   |  1.6572  |         1.6379         |
|       AlbertForQuestionAnswering        |  4  | 0.9999 |  0.8856   |  1.648   |         1.6441         |
|            AlbertForMaskedLM            |  4  | 0.9996 |  0.8846   |  1.6393  |         1.6376         |
|       T5ForConditionalGeneration        |  4  | 0.9795 |  0.8461   |  1.6296  |         1.7213         |
|                 T5Small                 |  4  | 0.9771 |   0.846   |  1.6294  |         1.7202         |
|     PLBartForConditionalGeneration      |  4  | 0.9828 |  0.9462   |  1.6245  |         1.6514         |
|    MegatronBertForQuestionAnswering     |  8  |  0.98  |  0.9606   |  1.6043  |         1.6299         |
|             BertForMaskedLM             | 16  | 0.9858 |  0.9612   |  1.5928  |         1.5838         |
|           LayoutLMForMaskedLM           | 16  | 0.9856 |  0.9617   |  1.581   |         1.604          |
|                CamemBert                | 16  | 0.9872 |  0.9628   |  1.5456  |         1.5339         |
|         Speech2Text2ForCausalLM         | 256 | 0.9798 |  0.9136   |  1.5206  |         1.5598         |
|      BartForConditionalGeneration       |  2  | 0.9961 |  0.9621   |  1.5156  |         1.4899         |
|             BartForCausalLM             |  4  | 0.9779 |  0.9468   |   1.51   |         1.5431         |
|            YituTechConvBert             | 16  | 0.9858 |  0.9584   |  1.5091  |         1.4913         |
|            MBartForCausalLM             |  4  | 0.9765 |  0.9532   |  1.5088  |         1.5428         |
|      MBartForConditionalGeneration      |  2  | 1.0024 |  0.9665   |  1.4877  |         1.4651         |
|         MegatronBertForCausalLM         |  4  | 0.9871 |   0.915   |  1.4693  |         1.4713         |
|     DistilBertForQuestionAnswering      | 256 | 0.9936 |  0.9872   |  1.447   |         1.4465         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9944 |  0.9037   |  1.3946  |         1.4008         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9786 |  0.9102   |  1.3841  |         1.2639         |
|     PegasusForConditionalGeneration     | 32  | 0.9998 |  0.9306   |  1.3048  |         1.2796         |
|            TrOCRForCausalLM             | 32  | 0.9852 |  0.9535   |  1.2582  |         1.2856         |
|          DistilBertForMaskedLM          | 128 | 0.9919 |  0.9503   |  1.2081  |         1.2331         |
|           PegasusForCausalLM            | 32  | 0.9764 |  0.9265   |  1.2025  |         1.2087         |
|       DebertaForQuestionAnswering       |  8  | 0.7969 |   0.669   |  1.0607  |         0.9463         |
|           DebertaForMaskedLM            |  4  | 0.7394 |  0.5542   |  0.9599  |         0.8065         |
|          DebertaV2ForMaskedLM           |  1  | 0.6814 |  0.5215   |  0.8762  |         0.6445         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.6846 |   0.52    |  0.8346  |         0.6461         |
|          BlenderbotForCausalLM          |  4  | 0.9725 |  0.8413   |   0.0    |         1.2533         |
|          AllenaiLongformerBase          |  4  | 1.0044 |  0.6723   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
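To summarize a speedup table like the one above, a reader could parse the pipe-delimited rows and take a geometric mean per backend. The following is a minimal sketch, not the dashboard's own tooling: the helper names `parse_table` and `geomean` are hypothetical, and excluding `0.0` and `nan` entries as failed runs is an assumption (the compilation latency table reports `nan` for the same model/backend pairs).

```python
import math

def parse_table(text):
    """Parse an ASCII table with '|'-separated cells into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+---+' border lines and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows

def geomean(values):
    """Geometric mean, ignoring 0.0 and nan (assumed to mark failed runs)."""
    vals = [v for v in values if v > 0 and not math.isnan(v)]
    if not vals:
        return float("nan")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))

# A few rows copied from the speedup table above.
SAMPLE = """
+-----------------------+----+--------+-----------+----------+------------------------+
|         name          | bs | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------+----+--------+-----------+----------+------------------------+
|    OPTForCausalLM     | 2  | 0.9876 |  0.9275   |  2.4643  |         2.4921         |
| BlenderbotForCausalLM | 4  | 0.9725 |  0.8413   |   0.0    |         1.2533         |
| AllenaiLongformerBase | 4  | 1.0044 |  0.6723   |   0.0    |          0.0           |
+-----------------------+----+--------+-----------+----------+------------------------+
"""

if __name__ == "__main__":
    rows = parse_table(SAMPLE)
    for backend in ("eager", "aot_eager", "inductor", "inductor_no_cudagraphs"):
        mean = geomean(float(r[backend]) for r in rows)
        print(f"{backend}: {mean:.4f}")
```

A geometric mean is used here because the values are ratios; an arithmetic mean would overweight the few models with very large speedups.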

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
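The accuracy table is easier to read as per-backend counts of each status. The sketch below tallies them with `collections.Counter`; it is an illustration only, the helper names `parse_rows` and `tally_status` are hypothetical, and the status strings are taken as they appear in the table above.

```python
from collections import Counter

BACKENDS = ("eager", "aot_eager", "inductor", "inductor_no_cudagraphs")

def parse_rows(text):
    """Turn '|'-delimited table lines into dicts keyed by the header row."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # border or blank line
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows

def tally_status(rows, backends=BACKENDS):
    """Count how often each status string occurs for each backend."""
    return {b: Counter(r[b] for r in rows) for b in backends}

# A few rows copied from the accuracy table above.
SAMPLE = """
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
"""

if __name__ == "__main__":
    counts = tally_status(parse_rows(SAMPLE))
    for backend, counter in counts.items():
        print(backend, dict(counter))
```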

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 17.4182 |  40.6649  | 147.3095 |        146.1162        |
|     MobileBertForQuestionAnswering      | 128 | 17.9642 |  40.2628  | 141.616  |        137.7687        |
|          DebertaV2ForMaskedLM           |  1  | 15.5026 |  27.4442  | 137.2045 |        75.9579         |
|     M2M100ForConditionalGeneration      | 16  | 12.823  |  26.6944  | 135.1352 |        134.7893        |
|      DebertaV2ForQuestionAnswering      |  2  | 15.5118 |  27.5733  | 134.6896 |        71.8413         |
|       MT5ForConditionalGeneration       | 16  | 8.1895  |  18.9708  | 134.4762 |        133.2813        |
|             XGLMForCausalLM             |  8  | 9.6905  |  20.8229  | 133.7804 |        133.5987        |
|            XLNetLMHeadModel             |  8  | 10.5825 |  27.7726  | 95.1735  |        94.0771         |
|       DebertaForQuestionAnswering       |  8  | 7.2795  |  13.7628  | 87.9834  |        52.9927         |
|      MBartForConditionalGeneration      |  2  | 11.7855 |  26.3408  | 79.7574  |        78.8409         |
|           DebertaForMaskedLM            |  4  | 7.3794  |  13.8882  | 79.5636  |        56.4173         |
|      BartForConditionalGeneration       |  2  | 11.8419 |  26.2361  | 74.4089  |        74.1057         |
|            YituTechConvBert             | 16  | 7.2304  |  15.9927  | 69.9848  |        68.1703         |
|     PegasusForConditionalGeneration     | 32  | 5.4617  |  19.4886  | 68.7984  |        67.6025         |
|         MegatronBertForCausalLM         |  4  | 10.6092 |  22.0004  | 67.9794  |         66.984         |
|    MegatronBertForQuestionAnswering     |  8  | 10.5375 |  22.0103  | 67.7764  |        67.4939         |
| BlenderbotSmallForConditionalGeneration | 64  |  8.164  |  17.4603  | 54.6984  |         54.658         |
|           ElectraForCausalLM            | 32  | 5.3267  |  11.0248  | 54.2442  |        52.5886         |
|       T5ForConditionalGeneration        |  4  | 5.6478  |  12.7409  | 51.5118  |        50.5222         |
|                 T5Small                 |  4  |  5.691  |  13.0159  | 51.3691  |        50.6253         |
|     PLBartForConditionalGeneration      |  4  | 6.2278  |  13.3847  | 49.4083  |        47.3389         |
|    LayoutLMForSequenceClassification    | 16  | 5.5582  |  11.2692  | 48.9066  |        47.1646         |
|       ElectraForQuestionAnswering       | 64  | 5.2761  |  10.9929  | 48.0001  |         44.596         |
|           LayoutLMForMaskedLM           | 16  | 5.6428  |  11.2675  | 42.4938  |        41.9945         |
|             BertForMaskedLM             | 16  | 5.3076  |  10.9149  | 41.8548  |        38.5289         |
|            MBartForCausalLM             |  4  | 5.6886  |  11.2335  | 40.8688  |        41.3408         |
|             OPTForCausalLM              |  2  | 4.8297  |  10.2349  | 38.9356  |        38.0661         |
|        BertForQuestionAnswering         | 16  | 5.2776  |  10.8427  | 38.8189  |        39.1384         |
|           PegasusForCausalLM            | 32  | 5.8173  |  11.0712  | 38.4069  |         38.167         |
|           RobertaForCausalLM            | 16  | 5.2323  |  10.9524  |  38.368  |        38.0287         |
|     DistilBertForQuestionAnswering      | 256 | 2.4947  |  5.3088   | 38.3324  |         36.505         |
|            TrOCRForCausalLM             | 32  | 5.5925  |  11.0819  | 38.0217  |        38.5092         |
|             BartForCausalLM             |  4  | 5.8097  |  11.1903  | 38.0108  |        38.0653         |
|                CamemBert                | 16  |  5.291  |  11.1073  | 37.7499  |        38.3678         |
|            AlbertForMaskedLM            |  4  | 2.2981  |  8.2482   | 37.4323  |        37.5439         |
|          DistilBertForMaskedLM          | 128 | 2.5461  |  5.5533   | 37.3067  |        35.3514         |
|       RobertaForQuestionAnswering       | 16  | 5.4478  |  10.7212  | 37.0719  |        36.8198         |
|      GPT2ForSequenceClassification      |  4  | 4.8716  |  9.9616   |  35.141  |        35.9794         |
|       AlbertForQuestionAnswering        |  4  | 2.3662  |  8.1812   | 33.5503  |         33.576         |
|       BlenderbotSmallForCausalLM        | 64  | 3.9288  |  7.6079   | 29.6073  |        29.8003         |
|               DistillGPT2               | 16  | 2.5637  |  5.1419   | 27.6486  |        28.9747         |
|         Speech2Text2ForCausalLM         | 256 | 3.1544  |  5.7332   | 27.3912  |        26.7527         |
|            PLBartForCausalLM            |  8  | 3.1462  |  5.8942   | 26.6379  |        27.2665         |
|          BlenderbotForCausalLM          |  4  | 11.2556 |  21.7936  |   nan    |        68.5996         |
|          AllenaiLongformerBase          |  4  |  9.806  |  31.442   |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1387  |         1.195          |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1139  |         1.2307         |
|             OPTForCausalLM              |  2  | 0.9999 |  0.9165   |  1.094   |         1.1346         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0865  |         1.1724         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0818  |         1.1729         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0583  |         1.1368         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0541  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.0539  |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0511  |         1.0491         |
|            YituTechConvBert             | 16  |  1.0   |  0.9143   |  1.0402  |         1.0411         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.1479         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0078  |         1.0518         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9734  |         1.3147         |
|           ElectraForCausalLM            | 32  |  1.0   |  0.8475   |  0.9731  |         0.9739         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         1.0642         |
|     PLBartForConditionalGeneration      |  4  | 1.0001 |  0.9301   |  0.9649  |         1.052          |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9574  |         1.268          |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         1.1152         |
|            MBartForCausalLM             |  4  |  1.0   |  0.8937   |  0.9281  |         0.9912         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8677   |  0.9138  |         0.9886         |
|             BartForCausalLM             |  4  |  1.0   |  0.8936   |  0.9137  |         0.9749         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9089  |         1.0018         |
|           PegasusForCausalLM            | 32  |  1.0   |  0.8822   |  0.893   |         0.9864         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8849  |         0.9624         |
|            TrOCRForCausalLM             | 32  |  1.0   |   0.873   |  0.8836  |         0.9583         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8729  |         0.9803         |
|     PegasusForConditionalGeneration     | 32  |  1.0   |   0.91    |  0.8689  |         1.0689         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.8672  |         1.0307         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         1.0139         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         1.0962         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8137   |  0.8184  |         0.9119         |
|         Speech2Text2ForCausalLM         | 256 |  1.0   |  0.8183   |  0.789   |         0.8779         |
|     M2M100ForConditionalGeneration      | 16  |  1.0   |  0.8084   |  0.7651  |         0.9908         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7473  |         1.016          |
|             XGLMForCausalLM             |  8  |  1.0   |  0.7834   |  0.7117  |         0.9792         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6569  |         0.8392         |
|           DebertaForMaskedLM            |  4  | 0.9316 |  0.9156   |  0.5501  |         0.9978         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.5197  |         0.9665         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9763 |  0.9764   |  0.487   |         0.9802         |
|       DebertaForQuestionAnswering       |  8  | 0.9525 |  1.0537   |  0.4601  |         1.1526         |
|          BlenderbotForCausalLM          |  4  | 0.9978 |  0.9099   |   nan    |         0.999          |
|          AllenaiLongformerBase          |  4  | 0.9508 |  0.8684   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 266.0232 | 300.6829  | 162.4158 |        162.6707        |
|       AlbertForQuestionAnswering        |  4  | 263.8267 | 297.8012  | 160.2736 |        160.6978        |
|            XLNetLMHeadModel             |  8  | 280.3565 | 289.2437  | 154.5535 |        154.1471        |
|      DebertaV2ForQuestionAnswering      |  2  | 157.5091 |  204.927  | 129.2886 |        167.7473        |
|          DebertaV2ForMaskedLM           |  1  | 153.0538 | 199.4217  | 121.5289 |        162.8963        |
|     PegasusForConditionalGeneration     | 32  | 143.9854 |  149.639  | 113.0499 |        108.4366        |
|            TrOCRForCausalLM             | 32  | 139.5866 | 143.7873  | 109.8276 |        107.4047        |
|      MBartForConditionalGeneration      |  2  | 143.4315 | 142.9796  | 97.3784  |        94.9903         |
|      BartForConditionalGeneration       |  2  | 151.3127 | 143.8897  | 96.0498  |        93.4241         |
|    MegatronBertForQuestionAnswering     |  8  | 144.8812 | 147.7114  | 88.3925  |        87.0279         |
|            YituTechConvBert             | 16  | 126.9944 | 130.4924  | 83.0943  |        84.0883         |
| BlenderbotSmallForConditionalGeneration | 64  | 124.4791 | 124.3649  | 81.2084  |        79.5059         |
|     MobileBertForQuestionAnswering      | 128 | 203.0359 | 208.9676  | 80.8469  |        165.5411        |
|                CamemBert                | 16  | 119.8231 | 122.9608  | 76.6063  |        77.1826         |
|     M2M100ForConditionalGeneration      | 16  | 146.8909 | 145.6144  |  76.528  |        98.5775         |
|            MBartForCausalLM             |  4  | 117.2294 | 118.9277  |  75.696  |        73.5726         |
|             BartForCausalLM             |  4  | 116.9502 | 120.6882  | 75.3615  |        74.0283         |
|     PLBartForConditionalGeneration      |  4  | 123.7816 | 123.0127  | 73.2478  |        72.0198         |
|          MobileBertForMaskedLM          | 64  | 188.316  | 215.5029  | 72.8991  |        167.2447        |
|     DistilBertForQuestionAnswering      | 256 | 103.993  | 104.6237  | 71.7417  |        71.6882         |
|       DebertaForQuestionAnswering       |  8  | 95.0973  |  113.126  | 71.5069  |        79.9566         |
|           LayoutLMForMaskedLM           | 16  | 114.1151 | 116.8221  | 71.2667  |        70.1172         |
|            PLBartForCausalLM            |  8  | 115.7589 | 119.5368  | 70.5294  |        69.3109         |
|          DistilBertForMaskedLM          | 128 | 85.3007  |  89.0486  | 70.0974  |        68.5643         |
|             BertForMaskedLM             | 16  | 111.5465 | 114.3174  | 69.0063  |        69.3962         |
|             OPTForCausalLM              |  2  | 170.1735 | 181.4036  | 68.7626  |        67.9898         |
|           RobertaForCausalLM            | 16  | 116.557  | 119.3617  | 68.4655  |        69.0163         |
|           DebertaForMaskedLM            |  4  | 95.1213  |  109.853  | 65.2519  |        79.1553         |
|       T5ForConditionalGeneration        |  4  | 106.6077 | 123.5051  | 64.2717  |        60.5381         |
|                 T5Small                 |  4  | 106.7658 | 123.5689  | 64.2566  |        60.4819         |
|               DistillGPT2               | 16  | 107.1456 | 110.7483  | 63.7138  |        62.1656         |
|         MegatronBertForCausalLM         |  4  | 88.5585  |  95.7059  | 59.2946  |        58.9336         |
|           PegasusForCausalLM            | 32  | 76.6631  |  74.8788  | 58.7832  |        57.2004         |
|             XGLMForCausalLM             |  8  | 94.4593  | 110.4114  | 55.3293  |        85.8465         |
|    LayoutLMForSequenceClassification    | 16  | 99.2745  |  100.564  | 54.5926  |        54.6921         |
|       ElectraForQuestionAnswering       | 64  | 116.1624 | 117.2364  | 54.2303  |        54.8085         |
|        BertForQuestionAnswering         | 16  | 96.7183  |  98.0425  | 53.6445  |        54.1136         |
|       RobertaForQuestionAnswering       | 16  | 97.1611  |  98.5359  | 53.4748  |        54.4432         |
|           ElectraForCausalLM            | 32  | 89.7788  |  93.9164  | 47.7411  |        48.3453         |
|       BlenderbotSmallForCausalLM        | 64  | 67.5919  |  63.4981  | 47.7224  |        45.9287         |
|       MT5ForConditionalGeneration       | 16  | 95.5908  | 112.2048  |  44.271  |         51.304         |
|      GPT2ForSequenceClassification      |  4  | 93.8139  |  96.3416  | 40.5883  |        40.1584         |
|         Speech2Text2ForCausalLM         | 256 | 54.0581  |  57.893   | 34.8659  |        34.4679         |
|          BlenderbotForCausalLM          |  4  | 112.3875 | 130.5071  |   nan    |        88.4962         |
|          AllenaiLongformerBase          |  4  | 180.367  | 271.3525  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
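The absolute latencies above can be turned into a per-model ratio against the `eager` column. This is a minimal sketch: the helper name `latency_speedup` is hypothetical, and the ratio is computed relative to the `eager` backend column of this table, which is an assumption and may not match the baseline the dashboard uses for its own speedup tables. Rows with `nan` (a backend that produced no result) are kept as `nan` rather than dropped.

```python
import math


def latency_speedup(baseline_ms, backend_ms):
    """Return baseline_ms / backend_ms, or nan if either value is missing."""
    if math.isnan(baseline_ms) or math.isnan(backend_ms):
        return float("nan")
    return baseline_ms / backend_ms


# (name, eager ms, inductor ms) copied from the table above
rows = [
    ("AlbertForMaskedLM", 266.0232, 162.4158),
    ("Speech2Text2ForCausalLM", 54.0581, 34.8659),
    ("BlenderbotForCausalLM", 112.3875, float("nan")),
]

for name, eager_ms, inductor_ms in rows:
    ratio = latency_speedup(eager_ms, inductor_ms)
    print(f"{name}: {ratio:.3f}x")
```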

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9978 |  0.9969   |  3.012   |         2.972          |
|      xcit_large_24_p8_224       |  5  | 0.9903 |  0.8653   |  2.0031  |         1.5094         |
|        twins_pcpvt_base         | 64  | 0.9959 |  0.9042   |  1.9565  |         1.6296         |
|         coat_lite_mini          | 128 | 0.9973 |  0.9962   |  1.9422  |         1.9197         |
|          ghostnet_100           | 128 | 0.9921 |  0.7648   |  1.8482  |         1.5518         |
|          gmlp_s16_224           | 128 | 0.9942 |  1.0831   |   1.84   |         1.8306         |
|          gmixer_24_224          | 128 | 0.995  |  0.8889   |  1.755   |         1.7447         |
|            lcnet_050            | 128 | 0.9392 |  0.7362   |  1.6953  |         1.4191         |
|           volo_d1_224           | 64  | 0.9941 |   0.973   |  1.6865  |         1.6656         |
|         crossvit_9_240          | 128 | 0.9911 |  0.7832   |  1.6435  |         1.6151         |
|  swin_base_patch4_window7_224   | 64  | 0.9911 |  0.9414   |  1.6151  |         1.6062         |
|           convit_base           | 64  | 0.9982 |  0.9974   |  1.612   |         1.6119         |
|       gluon_inception_v3        | 128 | 0.9963 |  0.8647   |  1.5322  |         1.5188         |
|          inception_v3           | 128 | 0.9964 |  0.8642   |  1.5302  |         1.5179         |
|        adv_inception_v3         | 128 | 0.9963 |  0.8605   |  1.5296  |         1.5204         |
|             dla102              | 128 | 0.9958 |  0.8149   |  1.5268  |         1.5229         |
|        sebotnet33ts_256         | 64  | 0.9574 |  0.7652   |  1.5079  |         1.5337         |
|          convnext_base          | 64  | 0.9837 |  0.9848   |  1.4893  |         1.4714         |
|            nfnet_l0             | 128 | 0.9897 |  0.8136   |  1.4874  |         1.4334         |
|           dm_nfnet_f0           | 128 | 0.9864 |  0.9849   |  1.4783  |         1.4274         |
|       eca_botnext26ts_256       | 128 | 0.9733 |  0.7188   |   1.44   |         1.4243         |
|            pit_b_224            | 64  | 0.9944 |  0.9926   |  1.4346  |         1.4286         |
|           mnasnet_100           | 128 | 0.9481 |  0.7411   |  1.4311  |         1.4961         |
|           mobilevit_s           | 64  | 0.9618 |  0.7311   |  1.4296  |         1.4431         |
|      mobilenetv3_large_100      | 128 | 0.9482 |  0.7604   |  1.4272  |         1.387          |
|           resnest101e           | 64  | 0.994  |  0.8657   |  1.4204  |         1.3573         |
|           regnety_002           | 128 | 0.9503 |  0.7136   |  1.4118  |         1.1986         |
|           selecsls42b           | 128 | 0.9984 |  0.8121   |  1.4109  |         1.4098         |
|          botnet26t_256          | 128 | 0.9736 |  0.8514   |  1.4071  |         1.4232         |
|         mobilenetv2_100         | 128 | 0.9482 |  0.7373   |  1.3914  |         1.4427         |
|        res2net50_14w_8s         | 128 | 0.9989 |  0.7899   |  1.3819  |         1.3567         |
|           res2next50            | 128 | 0.9989 |  0.8252   |  1.3709  |         1.3621         |
|          jx_nest_base           | 32  | 0.9867 |  0.9839   |  1.367   |         1.3572         |
|          mixer_b16_224          | 128 | 0.9971 |  1.0181   |  1.3635  |         1.3591         |
|            hrnet_w18            | 128 | 0.9922 |  0.6433   |  1.3582  |         1.3454         |
|          spnasnet_100           | 128 | 0.9379 |  0.7386   |  1.3553  |         1.4163         |
|       tf_efficientnet_b0        | 128 | 0.9604 |  0.6813   |  1.3548  |         1.3826         |
|      beit_base_patch16_224      | 64  | 0.9962 |  0.9587   |  1.3518  |         1.3515         |
|           fbnetc_100            | 128 | 0.9487 |  0.7386   |  1.3502  |         1.4018         |
|          cait_m36_384           |  4  | 0.995  |  0.9928   |  1.3483  |         1.3467         |
|        ese_vovnet19b_dw         | 128 | 0.9575 |  0.8328   |  1.3473  |         1.3712         |
|         poolformer_m36          | 64  | 0.9868 |   0.983   |  1.3275  |         1.3185         |
|            fbnetv3_b            | 128 | 0.9489 |  0.7685   |  1.309   |         1.2972         |
|           rexnet_100            | 128 | 0.9517 |  0.7029   |  1.2963  |         1.3335         |
|          resmlp_12_224          | 128 | 0.993  |  0.8889   |  1.2592  |         1.2561         |
| deit_base_distilled_patch16_224 | 64  | 0.9963 |  0.9935   |  1.2553  |         1.2556         |
|      vit_base_patch16_224       | 64  | 0.9963 |  0.9937   |  1.2351  |         1.2354         |
|            tinynet_a            | 128 | 0.9471 |  0.6779   |  1.2314  |         1.2341         |
|          cspdarknet53           | 64  | 0.9333 |  0.7859   |  1.2249  |         1.2578         |
|           tf_mixnet_l           | 128 | 0.9764 |  0.8266   |  1.187   |         1.1908         |
|            mixnet_l             | 128 | 0.9763 |  0.8209   |  1.1767  |         1.1808         |
|         visformer_small         | 128 | 0.9959 |  0.9447   |  1.1741  |         1.166          |
|        res2net101_26w_4s        | 64  | 0.9989 |  0.7961   |  1.1532  |         1.0759         |
|          pnasnet5large          | 16  | 0.985  |  0.9132   |  1.1139  |         1.1263         |
|             dpn107              | 32  | 0.9318 |  0.8067   |  1.0925  |         1.1348         |
|            repvgg_a2            | 128 | 0.9342 |  0.7548   |  1.0829  |         1.1175         |
|        gluon_xception65         | 32  | 0.9922 |  0.8425   |  1.076   |         1.0799         |
|     swsl_resnext101_32x16d      | 32  | 0.9976 |  0.8406   |  1.0588  |         1.0248         |
|            gernet_l             | 128 | 0.9359 |  0.7932   |  1.0365  |         1.0663         |
|        convmixer_768_32         | 32  | 0.9986 |  0.9646   |  1.0015  |         1.0028         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
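One common way to summarise a column of speedups like the one above is the geometric mean, which suits ratios better than the arithmetic mean. The sketch below is illustrative only: `geomean_speedup` is a hypothetical helper, the three values are copied from the `inductor` column, and nothing here claims this is how the dashboard itself aggregates results.

```python
import math
from statistics import geometric_mean


def geomean_speedup(values):
    """Geometric mean of the speedups, skipping nan entries (failed runs)."""
    valid = [v for v in values if not math.isnan(v)]
    return geometric_mean(valid)


# A few inductor speedups copied from the table above
inductor_speedups = {
    "tnt_s_patch16_224": 3.012,
    "vit_base_patch16_224": 1.2351,
    "convmixer_768_32": 1.0015,
}

print(f"geomean: {geomean_speedup(inductor_speedups.values()):.4f}x")
```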

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+
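With this many rows, the failures are easy to miss. Below is a minimal sketch for pulling the non-`pass` entries out of a table in this format. The function names `parse_ascii_table` and `failing_backends` are hypothetical, and the sample is a three-row excerpt of the table above, not the full data.

```python
def parse_ascii_table(text):
    """Parse a '+---+' bordered table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip border and blank lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def failing_backends(records):
    """Map each model to the backends whose status is not 'pass'."""
    failures = {}
    for rec in records:
        bad = [k for k, v in rec.items() if k not in ("name", "bs") and v != "pass"]
        if bad:
            failures[rec["name"]] = bad
    return failures


sample = """
+-----------+----+-------+---------------+----------+------------------------+
|   name    | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+-----------+----+-------+---------------+----------+------------------------+
| lcnet_050 | 8  | pass  | fail_accuracy |   pass   |          pass          |
| nfnet_l0  | 8  | pass  |     pass      |   pass   |          pass          |
| tinynet_a | 8  | pass  | fail_accuracy |   pass   |          pass          |
+-----------+----+-------+---------------+----------+------------------------+
"""

print(failing_backends(parse_ascii_table(sample)))
```

On the full table above, the only non-`pass` entries are `lcnet_050` and `tinynet_a`, both under `aot_eager`.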

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 5.6563  |  11.2559  | 281.387  |        283.5166        |
|            hrnet_w18            | 128 | 9.6694  |  36.1277  | 251.4005 |        248.2693        |
|          ghostnet_100           | 128 | 7.5916  |  14.994   | 236.9784 |        235.9516        |
|            fbnetv3_b            | 128 | 8.5168  |  17.0125  | 171.6077 |        171.8795        |
|           resnest101e           | 64  | 11.1265 |  24.2129  | 169.9194 |        173.0331        |
|           mobilevit_s           | 64  | 5.4384  |  11.3804  | 168.8541 |        167.7505        |
|            tinynet_a            | 128 | 6.0246  |  12.2615  | 167.6446 |        165.2419        |
|          pnasnet5large          | 16  | 8.8085  |  26.2086  | 167.5939 |        166.725         |
|           tf_mixnet_l           | 128 | 9.1727  |  17.0443  | 166.758  |        159.7839        |
|            mixnet_l             | 128 | 8.4355  |  16.3055  | 165.8352 |        159.4478        |
|      mobilenetv3_large_100      | 128 | 4.3077  |  8.3851   | 158.8865 |        158.5923        |
|       tf_efficientnet_b0        | 128 | 5.1775  |  10.4702  | 158.6848 |        158.6533        |
|        adv_inception_v3         | 128 | 5.6983  |  12.5457  | 158.015  |        159.2732        |
|          inception_v3           | 128 | 5.7528  |  12.5655  | 157.8166 |        157.7184        |
|       gluon_inception_v3        | 128 | 5.7276  |  12.5132  | 157.3344 |        153.768         |
|        res2net101_26w_4s        | 64  | 10.7295 |  24.9241  | 153.5111 |        154.8713        |
|        twins_pcpvt_base         | 64  | 10.7455 |  23.5733  | 150.1201 |        151.6307        |
|          spnasnet_100           | 128 |  5.163  |  9.3402   | 139.5335 |        140.3958        |
|           fbnetc_100            | 128 | 4.9454  |  9.4342   | 139.1382 |        135.1571        |
|         mobilenetv2_100         | 128 | 4.0454  |  7.9484   | 138.708  |        136.0327        |
|      xcit_large_24_p8_224       |  5  | 12.788  |  28.1426  | 134.6173 |        134.8888        |
|        res2net50_14w_8s         | 128 | 9.0861  |  22.829   | 126.9463 |        126.6219        |
|           mnasnet_100           | 128 | 4.0434  |  7.6308   | 122.5232 |        125.4858        |
|          cait_m36_384           |  4  | 13.6624 |  30.6072  | 118.0691 |        115.5396        |
|        sebotnet33ts_256         | 64  | 4.2438  |  8.8369   | 111.7452 |        112.0161        |
|  swin_base_patch4_window7_224   | 64  |  8.641  |  19.2816  | 110.3836 |        111.0924        |
|           regnety_002           | 128 | 5.0051  |  8.7601   | 107.909  |        108.6622        |
|         poolformer_m36          | 64  | 7.6996  |  13.8523  | 103.7823 |        103.2956        |
|             dpn107              | 32  | 9.9871  |  19.6695  | 101.6174 |        97.2189         |
|            lcnet_050            | 128 | 2.5444  |  4.9778   | 100.5386 |        98.8013         |
|       eca_botnext26ts_256       | 128 | 3.0803  |  6.8371   | 100.0913 |        94.2482         |
|          cspdarknet53           | 64  | 5.7669  |  10.9138  | 100.0143 |        100.6711        |
|             dla102              | 128 | 6.3407  |  14.0075  | 99.5453  |        93.2567         |
|        gluon_xception65         | 32  | 7.8679  |  16.8295  | 96.8578  |        94.2104         |
|           selecsls42b           | 128 | 2.5553  |  5.3736   | 95.7115  |        94.9539         |
|           res2next50            | 128 |  5.052  |  12.0574  | 92.9892  |        90.5561         |
|          botnet26t_256          | 128 | 2.9369  |  5.9586   | 91.6848  |        89.7859         |
|         coat_lite_mini          | 128 | 3.3325  |  7.9168   | 89.3071  |        88.3046         |
|         crossvit_9_240          | 128 | 5.8383  |  13.428   | 87.6357  |        87.4503         |
|            gernet_l             | 128 | 4.9422  |  8.9356   | 83.9354  |        81.0972         |
|          jx_nest_base           | 32  | 6.6223  |  14.7553  | 83.6075  |        86.7964         |
|            nfnet_l0             | 128 | 5.3489  |  10.933   | 82.1904  |        79.0411         |
|           volo_d1_224           | 64  |  5.074  |  11.8761  | 75.8873  |        76.3569         |
|        ese_vovnet19b_dw         | 128 | 2.5619  |  4.5739   | 75.7279  |         75.941         |
|           dm_nfnet_f0           | 128 |  6.059  |  11.5608  | 71.0743  |        73.5385         |
|        tnt_s_patch16_224        | 128 | 6.6144  |  16.3861  | 70.6514  |        70.4926         |
|         visformer_small         | 128 | 2.6346  |  6.1011   | 69.5043  |        67.9874         |
|     swsl_resnext101_32x16d      | 32  |  6.335  |  13.6108  | 64.3777  |        63.6639         |
|            repvgg_a2            | 128 | 4.8544  |  8.7764   | 62.7066  |        63.6633         |
|          gmlp_s16_224           | 128 |  5.644  |  12.0054  | 61.4428  |        60.6145         |
|          convnext_base          | 64  | 6.8448  |  12.5458  | 58.8293  |        58.8814         |
|          gmixer_24_224          | 128 | 5.7032  |  12.8952  | 52.7832  |        52.7003         |
|           convit_base           | 64  | 3.4538  |   8.673   |  49.084  |        48.4884         |
|            pit_b_224            | 64  | 3.4832  |  8.0233   |  47.32   |        48.1922         |
| deit_base_distilled_patch16_224 | 64  | 3.1226  |  7.1758   | 42.9155  |        40.1081         |
|          resmlp_12_224          | 128 | 2.9021  |  5.2593   | 42.3009  |        43.2619         |
|      vit_base_patch16_224       | 64  | 3.0666  |  7.0203   | 40.8877  |        40.7696         |
|        convmixer_768_32         | 32  | 1.7055  |  6.8669   | 38.7419  |        37.1361         |
|      beit_base_patch16_224      | 64  | 3.9383  |  8.7547   | 36.8814  |        36.5291         |
|          mixer_b16_224          | 128 | 2.7155  |  5.8555   | 34.6202  |        34.1666         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.1858  |         1.2049         |
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.1712  |         1.2836         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1129  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.0948  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.0266  |         1.1179         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.013   |         1.0845         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0099  |         1.0351         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  0.9984  |         1.0721         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  0.9972  |         1.0876         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  0.9872  |         1.0728         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  0.9834  |         1.0506         |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9762  |         0.9854         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  0.9747  |         1.0457         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  0.9729  |         1.0539         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  0.9557  |         1.0057         |
|             dla102              | 128 | 0.9634 |  0.9151   |  0.9536  |         1.0326         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.951   |         0.9948         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9362  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9353  |         0.9528         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  0.9348  |         1.0194         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  0.9346  |         1.0675         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9285  |         0.989          |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9228  |         0.9793         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9215  |         0.9952         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9131  |         0.9727         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  0.9097  |         1.0328         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9047  |         0.9903         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.8918  |          0.99          |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  0.891   |         1.1115         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.8833  |         0.9642         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.8831  |         0.9705         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.8807  |         0.977          |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.8786  |         0.9451         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.877   |         0.9361         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  0.8768  |         1.1871         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  0.8738  |         1.0072         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.8721  |         0.9732         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.8712  |         0.9607         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.871   |         0.9483         |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.8687  |         0.9902         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.8683  |         0.9403         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.866   |         0.9547         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8632  |         0.989          |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.8596  |         0.9535         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  0.8578  |         1.0242         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.8576  |         0.9664         |
|          convnext_base          | 64  | 1.001  |   0.924   |  0.8505  |         1.0338         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.8499  |         0.9706         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.8461  |         0.9786         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  0.8402  |         1.0202         |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.8239  |         0.9779         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.805   |         0.884          |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.7738  |         0.9611         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.7602  |         0.8966         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.7526  |         0.9898         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.7214  |         0.9045         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.6693  |         0.9604         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.3961 | 311.0117  | 300.0054 |        299.2143        |
|            hrnet_w18            | 128 | 280.6209 | 433.9854  | 205.9116 |        207.9796        |
|          pnasnet5large          | 16  | 199.6115 | 214.5989  |  175.66  |        174.2766        |
|           tf_mixnet_l           | 128 | 193.876  | 229.1304  | 159.5483 |        158.9581        |
|            mixnet_l             | 128 | 185.3385 | 220.6517  | 153.7488 |        153.3641        |
|          cait_m36_384           |  4  | 168.081  | 167.6357  | 123.7964 |        123.7673        |
|           resnest101e           | 64  | 165.2644 | 189.4825  | 114.8609 |        120.2893        |
|             dla102              | 128 | 172.5692 | 210.8151  | 112.6357 |        112.8744        |
|     swsl_resnext101_32x16d      | 32  | 118.9157 | 140.9388  | 111.777  |        115.4678        |
|         poolformer_m36          | 64  | 146.8638 | 147.3503  | 109.0748 |        109.7284        |
|        tnt_s_patch16_224        | 128 | 323.9986 | 324.2604  | 107.2676 |        108.6504        |
|          inception_v3           | 128 | 160.5329 | 185.2692  | 104.6618 |        105.5828        |
|        adv_inception_v3         | 128 | 160.7699 | 185.8045  | 104.6212 |        105.2404        |
|       gluon_inception_v3        | 128 | 161.0679 | 185.4522  | 104.5937 |        105.5917        |
|        res2net50_14w_8s         | 128 | 140.934  | 178.3712  | 101.6912 |        103.679         |
|           convit_base           | 64  | 163.0897 | 163.0759  | 100.9654 |        100.9551        |
|             dpn107              | 32  | 113.8603 | 131.5746  | 97.1882  |        93.4096         |
|        gluon_xception65         | 32  | 99.8148  | 117.1317  | 92.0582  |        91.5112         |
|           res2next50            | 128 | 125.9296 | 152.5758  |  91.762  |        92.4406         |
|  swin_base_patch4_window7_224   | 64  | 147.1767 | 155.3804  | 90.2652  |        90.8043         |
|           dm_nfnet_f0           | 128 | 128.6237 |  128.484  | 85.7493  |        88.6721         |
|          mixer_b16_224          | 128 | 116.6125 | 114.3046  | 85.3965  |        85.7992         |
|        res2net101_26w_4s        | 64  | 100.8864 | 125.8627  | 84.9789  |        92.2493         |
|            fbnetv3_b            | 128 | 115.3149 | 142.3716  | 83.5399  |        84.3153         |
|            pit_b_224            | 64  | 118.8206 | 118.9848  | 82.2141  |        82.5548         |
|          convnext_base          | 64  | 124.3508 |  124.089  | 82.1808  |          83.0          |
|         visformer_small         | 128 | 91.2023  |  96.2043  | 77.5002  |        77.8892         |
|            nfnet_l0             | 128 | 112.6311 | 137.4397  | 75.1492  |        77.8073         |
|      beit_base_patch16_224      | 64  | 101.7625 | 105.5401  |  74.948  |         74.825         |
|          gmlp_s16_224           | 128 | 137.6758 |  126.287  | 74.5603  |        74.6193         |
|       eca_botnext26ts_256       | 128 | 108.6592 |  147.275  | 73.5598  |        74.3193         |
|          jx_nest_base           | 32  | 101.5879 | 101.6875  |  73.237  |        73.7057         |
|          cspdarknet53           | 64  | 94.8818  | 112.6283  | 72.3236  |        70.4658         |
|           volo_d1_224           | 64  | 121.1105 |  123.622  | 71.2474  |        72.2751         |
|          botnet26t_256          | 128 | 101.7255 | 116.4502  |   70.5   |        69.6829         |
|            gernet_l             | 128 | 77.5811  |  91.7057  | 70.1594  |        68.1202         |
|      vit_base_patch16_224       | 64  | 86.8936  |  87.0291  | 70.1462  |        70.0605         |
| deit_base_distilled_patch16_224 | 64  |  84.917  |  85.2142  | 67.3665  |        67.3679         |
|          gmixer_24_224          | 128 | 118.0478 | 131.9167  | 67.1058  |        67.4437         |
|            repvgg_a2            | 128 | 77.7549  |  96.2273  |  67.091  |        65.0415         |
|      xcit_large_24_p8_224       |  5  | 129.1961 | 140.0771  | 62.4296  |        82.3074         |
|        twins_pcpvt_base         | 64  | 122.5759 | 131.9761  | 60.1232  |        71.4457         |
|       tf_efficientnet_b0        | 128 |  84.77   | 119.7873  | 60.0387  |        58.8849         |
|           rexnet_100            | 128 | 80.2154  |  108.524  | 58.7232  |         57.091         |
|           fbnetc_100            | 128 | 82.8668  | 106.4154  | 58.2814  |        56.0702         |
|         coat_lite_mini          | 128 | 113.0299 | 113.0276  | 57.9781  |        58.6525         |
|           mobilevit_s           | 64  |  84.581  | 111.3853  | 56.8405  |        56.3089         |
|            tinynet_a            | 128 | 73.7767  |  102.581  | 56.5911  |         56.363         |
|        sebotnet33ts_256         | 64  | 80.4061  | 100.5056  | 51.0787  |        50.2156         |
|         crossvit_9_240          | 128 | 82.2978  | 104.5855  | 49.7723  |        50.6192         |
|          spnasnet_100           | 128 |  70.705  |  89.7248  | 48.9232  |         46.84          |
|          ghostnet_100           | 128 | 90.6726  | 117.4854  | 48.5233  |        57.8737         |
|        ese_vovnet19b_dw         | 128 | 64.6321  |  74.2712  | 46.0158  |        45.1054         |
|         mobilenetv2_100         | 128 | 65.6133  |  84.4272  | 44.7128  |          43.1          |
|           mnasnet_100           | 128 | 64.3836  |  82.1722  | 42.5635  |        40.6986         |
|           selecsls42b           | 128 | 60.0645  |  73.8454  | 42.4287  |        42.5168         |
|          resmlp_12_224          | 128 | 53.5397  |  59.8404  | 42.1327  |        42.2518         |
|      mobilenetv3_large_100      | 128 | 61.4343  |  76.6338  |  40.788  |         41.931         |
|           regnety_002           | 128 | 40.9728  |  52.9696  | 26.4775  |         30.863         |
|            lcnet_050            | 128 | 31.7698  |  40.4659  |  17.565  |        21.0461         |
+---------------------------------+-----+----------+-----------+----------+------------------------+
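
As a rough cross-check, the absolute latencies above can be turned into a relative speedup by dividing one column by another. The sketch below is only illustrative: `latency_ratio` is a hypothetical helper, and it assumes the `eager` column is an acceptable stand-in for the baseline. The dashboard's speedup tables are normalized against native PyTorch, so this ratio need not match them exactly.

```python
def latency_ratio(baseline_ms: float, backend_ms: float) -> float:
    """Return how many times faster the backend is than the baseline."""
    if backend_ms <= 0:
        raise ValueError("backend latency must be positive")
    return baseline_ms / backend_ms


# hrnet_w18 row from the table above: eager vs. inductor latency in ms.
ratio = latency_ratio(280.6209, 205.9116)
print(f"hrnet_w18 inductor vs. eager: {ratio:.2f}x")
```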

Performance graphs

bench_logs/torchbench_amp.png :

bench_logs/huggingface_amp.png :

bench_logs/timm_models_amp.png :

Build Summary

Run name

day_093_03_04_23_performance_amp_684

Commit hashes

pytorch commit: 4431509
pytorch commit date: 2023-04-04 02:26:18+00:00
torchbench commit: 27ee569f62004bfd890c7ac69352daef49f4848f
torchbench commit date: 2023-04-03 10:48:29-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git4431509

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites (torchbench, huggingface, and timm) on A100 GPUs. Each experiment runs one iteration: a forward and backward pass for training, or a forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients against native PyTorch. Speedup is normalized against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. The experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

Models that fail accuracy checks are excluded from the performance, compilation latency, and memory footprint measurements.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 85%, 51/60 | 91%, 41/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 96%, 43/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
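
The cells above pair a percentage with a pass count. A minimal sketch of how such a cell could be produced from per-model statuses follows; `format_passrate` is a hypothetical helper, and it assumes that only the status string `pass` counts as passing (other statuses seen in this report include `fail_to_run`, `fail_accuracy`, and `eager_variation`).

```python
def format_passrate(statuses):
    """Format per-model statuses as 'NN%, passed/total'."""
    total = len(statuses)
    if total == 0:
        return "n/a"
    # Assumption: anything other than "pass" is treated as a failure.
    passed = sum(1 for status in statuses if status == "pass")
    return f"{100 * passed / total:.0f}%, {passed}/{total}"


# 51 passing models out of 60, as in the inductor/torchbench cell.
print(format_passrate(["pass"] * 51 + ["fail_to_run"] * 9))
```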

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.57x    |    1.59x    |    1.41x    |
| inductor_no_cudagraphs |   1.27x    |    1.50x    |    1.39x    |
+------------------------+------------+-------------+-------------+
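
For reference, a geometric mean over per-model speedups can be computed as the exponential of the mean log speedup. The sketch below is not the dashboard's own code; `geomean_speedup` is a hypothetical helper, and dropping non-positive entries is an assumption based on the note that a 0.0 speedup typically signifies a failed run.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of the positive speedups; 0.0 if none are valid."""
    # Assumption: 0.0 marks a failed run and is excluded from the mean.
    valid = [s for s in speedups if s > 0]
    if not valid:
        return 0.0
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# Two successful runs and one failed run (made-up values).
print(f"{geomean_speedup([1.0, 4.0, 0.0]):.2f}x")
```

The geometric mean is the usual choice for averaging ratios because a 2x speedup and a 0.5x slowdown cancel out to 1.0x, which an arithmetic mean would not give.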

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.80    |    7.40     |    5.89     |
|       aot_eager        |    9.29    |    15.69    |    13.16    |
|        inductor        |   62.54    |    60.98    |   110.30    |
| inductor_no_cudagraphs |   62.57    |    58.07    |   109.44    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.90x    |    0.88x    |
|        inductor        |   0.79x    |    0.91x    |    0.91x    |
| inductor_no_cudagraphs |   0.94x    |    1.05x    |    1.01x    |
+------------------------+------------+-------------+-------------+
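
One plausible reading of the compression ratio, assumed here and not confirmed by the report, is the baseline's peak memory divided by the compiled run's peak memory, so that values above 1.0 mean the compiled run uses less memory. `compression_ratio` and the numbers below are hypothetical.

```python
def compression_ratio(baseline_peak_gb: float, compiled_peak_gb: float) -> float:
    """Baseline peak memory over compiled peak memory (higher is better)."""
    if compiled_peak_gb <= 0:
        raise ValueError("compiled peak memory must be positive")
    return baseline_peak_gb / compiled_peak_gb


# Made-up numbers: the compiled run peaks at 12.5 GB against a 10 GB baseline,
# so the ratio falls below 1.0 and would count as a memory regression.
print(f"{compression_ratio(10.0, 12.5):.2f}x")
```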

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics of the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Previous report name: /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 85%, 51/60  | 85%, 51/60  |
|        inductor        | huggingface | 91%, 41/45  | 91%, 41/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60  | 87%, 52/60  |
| inductor_no_cudagraphs | huggingface | 96%, 43/45  | 96%, 43/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.58x    |   1.57x   |
|        inductor        | huggingface |   1.60x    |   1.59x   |
|        inductor        | timm_models |   1.41x    |   1.41x   |
| inductor_no_cudagraphs | torchbench  |   1.25x    |   1.27x   |
| inductor_no_cudagraphs | huggingface |   1.49x    |   1.50x   |
| inductor_no_cudagraphs | timm_models |   1.39x    |   1.39x   |
+------------------------+-------------+------------+-----------+

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
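
The four criteria above can be sketched as a single check per model and compiler. The thresholds come from the list; the `ModelResult` type, the field names, and the flag labels are hypothetical. The example values are the inductor numbers reported for DebertaV2ForMaskedLM in this report's warning tables, and its accuracy status is assumed to be `pass` because it does not appear in the accuracy warnings.

```python
from dataclasses import dataclass

# Thresholds as stated in the list above.
SPEEDUP_MIN = 0.95
COMPILE_LATENCY_MAX_SEC = 120.0
COMPRESSION_RATIO_MIN = 0.9


@dataclass
class ModelResult:
    name: str
    accuracy: str
    speedup: float
    compile_latency_sec: float
    compression_ratio: float


def warnings_for(result):
    """Return the list of warning categories a result falls into."""
    flags = []
    if result.accuracy != "pass":
        flags.append("accuracy")
    if result.speedup < SPEEDUP_MIN:
        flags.append("speedup")
    if result.compile_latency_sec > COMPILE_LATENCY_MAX_SEC:
        flags.append("compilation_latency")
    if result.compression_ratio < COMPRESSION_RATIO_MIN:
        flags.append("compression_ratio")
    return flags


example = ModelResult("DebertaV2ForMaskedLM", "pass", 0.8836, 137.4384, 0.5197)
print(warnings_for(example))
```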

Accuracy warnings

+-------------+-------------------------------+------------------------+-----------------+
|    suite    |             name              | inductor_no_cudagraphs |    inductor     |
+-------------+-------------------------------+------------------------+-----------------+
| torchbench  |         hf_Longformer         |      fail_to_run       |   fail_to_run   |
| torchbench  |             moco              |      fail_to_run       |   fail_to_run   |
| torchbench  |      Background_Matting       |    eager_variation     | eager_variation |
| torchbench  |        vision_maskrcnn        |    eager_variation     | eager_variation |
| torchbench  |           tacotron2           |         0.0000         |     0.0000      |
| torchbench  |              gat              |         0.0000         |     0.0000      |
| torchbench  |              gcn              |         0.0000         |     0.0000      |
| torchbench  |             llama             |         0.0000         |     0.0000      |
| torchbench  |             sage              |         0.0000         |     0.0000      |
| torchbench  |         torchrec_dlrm         |         0.0000         |     0.0000      |
| huggingface | DebertaV2ForQuestionAnswering |          pass          |   fail_to_run   |
| huggingface |  AlbertForQuestionAnswering   |     fail_accuracy      |  fail_accuracy  |
+-------------+-------------------------------+------------------------+-----------------+

Performance speedup warnings

+-------------+-------------------------------+------------------------+----------+
|    suite    |             name              | inductor_no_cudagraphs | inductor |
+-------------+-------------------------------+------------------------+----------+
| torchbench  |             dcgan             |          0.82          |  1.4116  |
| torchbench  |         lennard_jones         |         0.8755         |  1.4011  |
| torchbench  |       soft_actor_critic       |         0.8075         |  1.1977  |
| torchbench  |          tts_angular          |         0.9458         |  0.9491  |
| torchbench  |          timm_vovnet          |         0.9223         |  0.9402  |
| torchbench  |    nvidia_deeprecommender     |         1.0185         |  0.872   |
| torchbench  | timm_vision_transformer_large |         1.0817         |   0.0    |
| torchbench  |         hf_Longformer         |          0.0           |   0.0    |
| torchbench  |             moco              |          0.0           |   0.0    |
| torchbench  |              gat              |          0.0           |   0.0    |
| torchbench  |              gcn              |          0.0           |   0.0    |
| torchbench  |             sage              |          0.0           |   0.0    |
| torchbench  |           tacotron2           |          0.0           |   0.0    |
| torchbench  |         torchrec_dlrm         |          0.0           |   0.0    |
| huggingface |      DebertaForMaskedLM       |         0.8253         |  0.9543  |
| huggingface |     DebertaV2ForMaskedLM      |         0.6537         |  0.8836  |
| huggingface | DebertaV2ForQuestionAnswering |         0.6614         |  0.8318  |
| huggingface |     BlenderbotForCausalLM     |         1.2193         |   0.0    |
| huggingface |     AllenaiLongformerBase     |          0.0           |   0.0    |
+-------------+-------------------------------+------------------------+----------+

Compilation latency (sec) warnings

+-------------+--------------------------------+------------------------+----------+
|    suite    |              name              | inductor_no_cudagraphs | inductor |
+-------------+--------------------------------+------------------------+----------+
| torchbench  |          hf_T5_large           |        174.5048        | 171.4964 |
| torchbench  |        phlippe_densenet        |        165.3112        | 167.4719 |
| torchbench  |           hf_BigBird           |        128.9371        | 147.8084 |
| torchbench  |       timm_efficientnet        |        141.9987        | 144.9385 |
| torchbench  |          densenet121           |        135.5308        | 137.3105 |
| torchbench  |       mobilenet_v3_large       |        136.1599        | 132.844  |
| torchbench  |          mobilenet_v2          |        126.3347        | 132.5027 |
| torchbench  | timm_vision_transformer_large  |        123.0006        |   nan    |
| huggingface |     MobileBertForMaskedLM      |        143.5512        | 143.4842 |
| huggingface | MobileBertForQuestionAnswering |        137.6431        | 139.4939 |
| huggingface |      DebertaV2ForMaskedLM      |        67.5552         | 137.4384 |
| huggingface | DebertaV2ForQuestionAnswering  |        65.7048         | 137.0963 |
| huggingface | M2M100ForConditionalGeneration |        133.7492        | 134.6321 |
| huggingface |  MT5ForConditionalGeneration   |         132.97         | 131.6577 |
| huggingface |        XGLMForCausalLM         |        130.8865        | 130.139  |
| timm_models |           rexnet_100           |        295.4693        | 293.7417 |
| timm_models |           hrnet_w18            |        245.6165        | 255.8613 |
| timm_models |          ghostnet_100          |        237.3688        | 235.6543 |
| timm_models |           fbnetv3_b            |        175.5957        | 174.5159 |
| timm_models |          resnest101e           |        166.8995        | 168.9761 |
| timm_models |         pnasnet5large          |        160.428         | 165.0935 |
| timm_models |     mobilenetv3_large_100      |        163.0381        | 164.9841 |
| timm_models |           tinynet_a            |        158.6807        | 162.414  |
| timm_models |          mobilevit_s           |        161.1337        | 162.3946 |
| timm_models |          tf_mixnet_l           |        155.4438        |  162.16  |
| timm_models |            mixnet_l            |        161.1297        | 160.7966 |
| timm_models |          inception_v3          |        160.2494        | 160.6346 |
| timm_models |        adv_inception_v3        |        157.5116        | 158.6132 |
| timm_models |       tf_efficientnet_b0       |        149.1241        | 158.5465 |
| timm_models |       gluon_inception_v3       |        160.2057        | 156.6087 |
| timm_models |       res2net101_26w_4s        |        150.5969        | 153.7283 |
| timm_models |        twins_pcpvt_base        |        147.6075        | 149.3033 |
| timm_models |           fbnetc_100           |        133.0222        | 140.3762 |
| timm_models |          spnasnet_100          |        137.3888        | 134.495  |
| timm_models |      xcit_large_24_p8_224      |        131.6058        | 131.9644 |
| timm_models |        mobilenetv2_100         |        134.5086        | 130.2853 |
| timm_models |          mnasnet_100           |        120.0858        | 124.9001 |
| timm_models |        res2net50_14w_8s        |        123.7988        | 122.4246 |
+-------------+--------------------------------+------------------------+----------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+------------------------+----------+
|    suite    |                  name                   | inductor_no_cudagraphs | inductor |
+-------------+-----------------------------------------+------------------------+----------+
| torchbench  |              hf_GPT2_large              |         1.128          |  0.8904  |
| torchbench  |                 yolov3                  |         1.0117         |   0.87   |
| torchbench  |           speech_transformer            |         0.869          |  0.8651  |
| torchbench  |              timm_resnest               |         0.9657         |  0.8629  |
| torchbench  |           shufflenet_v2_x1_0            |         0.958          |  0.8621  |
| torchbench  |         timm_vision_transformer         |         0.8835         |  0.8593  |
| torchbench  |                resnet152                |         0.9418         |  0.8504  |
| torchbench  |           Background_Matting            |         1.0406         |  0.8485  |
| torchbench  |               timm_regnet               |         0.9536         |  0.8482  |
| torchbench  |              hf_DistilBert              |         0.9945         |  0.8476  |
| torchbench  |               hf_T5_large               |         1.168          |  0.8201  |
| torchbench  |              pytorch_unet               |         0.9308         |  0.8134  |
| torchbench  |            phlippe_densenet             |         0.8659         |  0.8058  |
| torchbench  |                 hf_Bart                 |         0.9173         |  0.7933  |
| torchbench  |                  dcgan                  |         0.9645         |  0.7821  |
| torchbench  |                resnet50                 |         0.8833         |  0.7817  |
| torchbench  |                 demucs                  |         0.9656         |  0.773   |
| torchbench  |              squeezenet1_1              |         0.9087         |  0.7722  |
| torchbench  |             pytorch_stargan             |         0.8893         |  0.7715  |
| torchbench  |               timm_vovnet               |         0.8869         |  0.7529  |
| torchbench  |             pytorch_struct              |         0.7362         |  0.7277  |
| torchbench  |                  vgg16                  |         0.9808         |  0.7227  |
| torchbench  |               mnasnet1_0                |         0.7781         |  0.7159  |
| torchbench  |               densenet121               |         0.7998         |  0.7097  |
| torchbench  |                 alexnet                 |         0.939          |  0.7091  |
| torchbench  |           mobilenet_v3_large            |         0.7752         |  0.6979  |
| torchbench  |               hf_BigBird                |         1.1191         |  0.6968  |
| torchbench  |             resnext50_32x4d             |         0.7709         |  0.668   |
| torchbench  |         nvidia_deeprecommender          |         0.8931         |  0.6585  |
| torchbench  |                   drq                   |         0.9573         |  0.6379  |
| torchbench  |            soft_actor_critic            |         0.9973         |  0.6066  |
| torchbench  |             LearningToPaint             |         0.7463         |  0.5925  |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |         0.6172         |  0.5904  |
| torchbench  |                resnet18                 |         0.6097         |  0.5395  |
| torchbench  |              lennard_jones              |         0.9997         |  0.5317  |
| torchbench  |               hf_Reformer               |         0.8022         |  0.4538  |
| torchbench  |          functorch_dp_cifar10           |         0.4424         |  0.3991  |
| torchbench  |             phlippe_resnet              |         0.3395         |  0.3169  |
| huggingface |           PegasusForCausalLM            |         0.9864         |  0.893   |
| huggingface |          DistilBertForMaskedLM          |         0.9624         |  0.8849  |
| huggingface |            TrOCRForCausalLM             |         0.9583         |  0.8836  |
| huggingface | BlenderbotSmallForConditionalGeneration |         0.9803         |  0.8729  |
| huggingface |     PegasusForConditionalGeneration     |         1.0689         |  0.8689  |
| huggingface |      MBartForConditionalGeneration      |         1.0307         |  0.8672  |
| huggingface |      BartForConditionalGeneration       |         1.0139         |  0.8456  |
| huggingface |         MegatronBertForCausalLM         |         1.0962         |  0.845   |
| huggingface |       BlenderbotSmallForCausalLM        |         0.9119         |  0.8184  |
| huggingface |         Speech2Text2ForCausalLM         |         0.8779         |  0.789   |
| huggingface |     M2M100ForConditionalGeneration      |         0.9908         |  0.7651  |
| huggingface |          MobileBertForMaskedLM          |         1.016          |  0.7473  |
| huggingface |             XGLMForCausalLM             |         0.9792         |  0.7117  |
| huggingface |     MobileBertForQuestionAnswering      |         0.8392         |  0.6569  |
| huggingface |           DebertaForMaskedLM            |         0.9978         |  0.5501  |
| huggingface |          DebertaV2ForMaskedLM           |         0.9665         |  0.5197  |
| huggingface |      DebertaV2ForQuestionAnswering      |         0.9797         |  0.487   |
| huggingface |       DebertaForQuestionAnswering       |         1.1527         |  0.4601  |
| timm_models |                hrnet_w18                |          0.99          |  0.8918  |
| timm_models |            sebotnet33ts_256             |         1.1115         |  0.891   |
| timm_models |              inception_v3               |         1.0171         |  0.8904  |
| timm_models |           gluon_inception_v3            |         1.0171         |  0.8904  |
| timm_models |            adv_inception_v3             |         1.0171         |  0.8904  |
| timm_models |                 dpn107                  |         0.9642         |  0.8833  |
| timm_models |            gluon_xception65             |         0.9705         |  0.8831  |
| timm_models |              ghostnet_100               |         0.977          |  0.8807  |
| timm_models |              spnasnet_100               |         0.9451         |  0.8786  |
| timm_models |          mobilenetv3_large_100          |         0.9361         |  0.877   |
| timm_models |             poolformer_m36              |         1.1871         |  0.8768  |
| timm_models |           eca_botnext26ts_256           |         1.0072         |  0.8738  |
| timm_models |          xcit_large_24_p8_224           |         0.9732         |  0.8721  |
| timm_models |            res2net50_14w_8s             |         0.9607         |  0.8712  |
| timm_models |            res2net101_26w_4s            |         0.9483         |  0.871   |
| timm_models |                mixnet_l                 |         0.9902         |  0.8687  |
| timm_models |               mnasnet_100               |         0.9403         |  0.8683  |
| timm_models |               res2next50                |         0.9547         |  0.866   |
| timm_models |              cait_m36_384               |         0.989          |  0.8632  |
| timm_models |               fbnetc_100                |         0.9535         |  0.8596  |
| timm_models |                pit_b_224                |         1.0242         |  0.8578  |
| timm_models |               selecsls42b               |         0.9664         |  0.8576  |
| timm_models |              convnext_base              |         1.0338         |  0.8505  |
| timm_models |                gernet_l                 |         0.9706         |  0.8499  |
| timm_models |         swsl_resnext101_32x16d          |         0.9786         |  0.8461  |
| timm_models |             coat_lite_mini              |         1.0202         |  0.8402  |
| timm_models |              botnet26t_256              |         0.9779         |  0.8239  |
| timm_models |                lcnet_050                |         0.884          |  0.805   |
| timm_models |                repvgg_a2                |         0.9611         |  0.7738  |
| timm_models |               regnety_002               |         0.8966         |  0.7602  |
| timm_models |             crossvit_9_240              |         0.9898         |  0.7526  |
| timm_models |      swin_base_patch4_window7_224       |         0.9045         |  0.7214  |
| timm_models |              jx_nest_base               |         0.9604         |  0.6693  |
+-------------+-----------------------------------------+------------------------+----------+

Metrics over time

bench_logs/passrate_over_time.png :

bench_logs/comp_time_over_time.png :

bench_logs/geomean_over_time.png :

bench_logs/memory_over_time.png :

Recent Regressions

For each relevant compiler, we compare the two most recent reports (that actually ran the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

Performance speedup regressions

+------------------------+-------------+-------------+------------+
|        compiler        |    name     | prev_status | cur_status |
+------------------------+-------------+-------------+------------+
| inductor_no_cudagraphs | tts_angular |   0.9507    |   0.9458   |
+------------------------+-------------+-------------+------------+
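
The comparison behind a table like the one above can be sketched as follows: a model counts as a regression only if it was not flagged in the previous report and is flagged in the current one. `new_regressions` and `is_slow` are hypothetical names. The tts_angular values are the ones in the table, while `already_slow_model` is a made-up entry included to show that models flagged in both reports are not reported again.

```python
def new_regressions(prev, cur, is_flagged):
    """Names flagged in the current report but not in the previous one."""
    return sorted(
        name
        for name, value in cur.items()
        if name in prev and is_flagged(value) and not is_flagged(prev[name])
    )


def is_slow(speedup):
    # Speedup threshold from the 'Warnings' section.
    return speedup < 0.95


prev_report = {"tts_angular": 0.9507, "already_slow_model": 0.90}
cur_report = {"tts_angular": 0.9458, "already_slow_model": 0.89}
print(new_regressions(prev_report, cur_report, is_slow))
```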

Regressions for huggingface

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

No regressions found.

Regressions for timm_models

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

No regressions found.

torchbench suite with amp precision


Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9618 |  0.9106   |  3.6439  |         1.3477         |
|           BERT_pytorch            |  16  | 0.9906 |  0.8008   |  3.0121  |         2.079          |
|            densenet121            |  4   | 0.9855 |  0.7135   |  2.696   |         1.0482         |
|            hf_BigBird             |  2   | 0.9529 |  0.7817   |  2.6658  |         1.6791         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9641 |  0.8975   |  2.4297  |         1.7785         |
|             hf_Albert             |  8   | 0.9956 |   0.956   |  2.3728  |         2.3195         |
|            hf_T5_large            |  2   | 0.9729 |  0.8061   |  2.1768  |         1.8587         |
|         phlippe_densenet          | 128  | 0.9829 |  0.7712   |  2.0576  |         1.0099         |
|        mobilenet_v3_large         |  32  | 0.9985 |  0.7831   |  2.0453  |         1.1793         |
|           squeezenet1_1           |  32  | 0.9808 |  0.9047   |  1.9186  |         1.2438         |
|               hf_T5               |  8   | 0.9841 |   0.849   |  1.9145  |         1.9818         |
|               dlrm                | 1024 | 0.935  |  0.8489   |  1.8924  |         1.1912         |
|          phlippe_resnet           | 128  | 0.9826 |  0.7607   |  1.8222  |         0.9765         |
|              hf_Bert              |  4   | 0.9958 |  0.8393   |  1.791   |         1.5809         |
|              hf_GPT2              |  4   | 0.9955 |  0.9623   |  1.7662  |         1.802          |
|          resnext50_32x4d          |  8   | 0.9817 |  0.7153   |  1.7198  |         0.9767         |
|           hf_GPT2_large           |  4   | 0.9826 |  0.9716   |  1.6762  |         1.7359         |
|              hf_Bart              |  4   | 0.9784 |  0.8347   |  1.6728  |         1.5731         |
|            mnasnet1_0             |  32  | 0.9875 |  0.7281   |  1.6519  |         1.0671         |
|        shufflenet_v2_x1_0         | 128  | 0.9923 |  0.7517   |  1.6214  |         1.1979         |
|           hf_Bert_large           |  4   | 0.9954 |  0.8613   |  1.598   |         1.5486         |
|        speech_transformer         |  32  | 0.9772 |  0.8228   |  1.5949  |         1.6484         |
|           timm_resnest            |  32  | 0.9925 |  0.8508   |  1.5727  |         1.5093         |
|             resnet18              |  16  | 0.9848 |  0.7624   |  1.5648  |         0.9685         |
|                drq                |  1   | 0.9614 |  0.7402   |  1.5557  |         0.9824         |
|           fastNLP_Bert            |  6   | 0.9921 |  0.8524   |  1.5407  |         1.498          |
|      timm_vision_transformer      |  32  | 0.9791 |  0.8549   |  1.5375  |         1.3708         |
|            timm_nfnet             | 128  | 0.9854 |  0.9841   |  1.5343  |         1.4726         |
|           mobilenet_v2            |  96  | 0.9967 |  0.7782   |  1.5268  |         1.4864         |
|          pytorch_struct           | 200  | 0.9137 |  0.7691   |  1.5163  |         1.1063         |
| attention_is_all_you_need_pytorch | 256  | 0.9885 |  0.9064   |  1.4959  |         1.4964         |
|           hf_DistilBert           |  8   | 0.9797 |  0.9547   |  1.459   |         1.4756         |
|         timm_efficientnet         |  32  | 0.9388 |  0.6206   |  1.4215  |         1.0839         |
|               dcgan               |  32  | 0.8543 |  0.6801   |  1.4116  |          0.82          |
|           lennard_jones           | 1000 | 0.8291 |  0.7362   |  1.4011  |         0.8755         |
|           pytorch_unet            |  1   | 0.9964 |  0.2048   |  1.3588  |         1.3529         |
|          LearningToPaint          |  96  | 0.986  |  0.7745   |  1.3036  |         1.0526         |
|          pytorch_stargan          |  16  | 0.9943 |  0.7909   |  1.2478  |         1.2652         |
|               vgg16               |  64  | 0.9994 |  0.9986   |  1.2404  |         1.2533         |
|            Super_SloMo            |  6   | 0.997  |  0.1793   |  1.233   |         1.2331         |
|        Background_Matting         |  4   | 0.9992 |  0.1369   |  1.2125  |         1.2082         |
|         soft_actor_critic         | 256  | 0.8622 |  0.6414   |  1.1977  |         0.8075         |
|              yolov3               |  16  | 0.9961 |   0.807   |  1.1971  |         1.1994         |
|             resnet152             |  32  | 0.9957 |  0.7609   |  1.1757  |         1.0073         |
|             resnet50              |  32  | 0.9953 |  0.7724   |  1.1748  |         1.051          |
|            hf_Reformer            |  4   | 0.9861 |  0.9697   |  1.1414  |         1.0644         |
|              alexnet              | 128  | 0.999  |  0.9967   |  1.0886  |         1.1359         |
|              demucs               |  4   | 0.9995 |  0.9979   |  1.0382  |         1.0394         |
|            timm_regnet            |  32  | 0.9169 |  0.7756   |  1.0173  |         0.9683         |
|            tts_angular            |  64  | 0.9045 |  0.8968   |  0.9491  |         0.9458         |
|            timm_vovnet            |  32  | 0.8528 |  0.7108   |  0.9402  |         0.9223         |
|      nvidia_deeprecommender       | 256  | 0.9987 |  0.9989   |  0.872   |         1.0185         |
|   timm_vision_transformer_large   |  32  | 0.9981 |    0.0    |   0.0    |         1.0817         |
|           hf_Longformer           |  2   | 1.0111 |  0.6869   |   0.0    |          0.0           |
|               moco                |  32  | 0.9379 |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 26.742  |  54.8929  | 171.4964 |        174.5048        |
|         phlippe_densenet          | 128  | 3.2261  |  6.9928   | 167.4719 |        165.3112        |
|            hf_BigBird             |  2   | 12.9793 |  37.3002  | 147.8084 |        128.9371        |
|         timm_efficientnet         |  32  | 5.0485  |  10.2446  | 144.9385 |        141.9987        |
|            densenet121            |  4   | 7.7061  |  18.156   | 137.3105 |        135.5308        |
|        mobilenet_v3_large         |  32  | 3.6096  |  7.7513   | 132.844  |        136.1599        |
|           mobilenet_v2            |  96  | 3.1548  |  7.0477   | 132.5027 |        126.3347        |
|              yolov3               |  16  | 4.9578  |  10.5215  | 118.6659 |        115.6353        |
|             resnet152             |  32  | 9.1901  |  20.4112  | 108.2497 |        106.9112        |
|            mnasnet1_0             |  32  | 3.1252  |  6.7622   | 106.837  |        106.4237        |
|           hf_GPT2_large           |  4   | 14.8314 |  30.0322  | 104.7304 |        103.5223        |
|           timm_resnest            |  32  | 1.8192  |  3.8929   | 99.8452  |        99.5756         |
|        shufflenet_v2_x1_0         | 128  | 3.4095  |   7.752   | 81.1621  |        80.9262         |
|        speech_transformer         |  32  | 6.0647  |  13.8358  | 77.7708  |        78.3783         |
| attention_is_all_you_need_pytorch | 256  | 4.5073  |  11.1585  | 75.0369  |         74.094         |
|            timm_regnet            |  32  |  6.729  |  12.5736  |  74.383  |        70.6323         |
|            timm_nfnet             | 128  | 5.8248  |  11.3662  |  72.468  |        71.6252         |
|        Background_Matting         |  4   | 3.1074  |  11.4683  | 70.2641  |        67.8794         |
|           BERT_pytorch            |  16  | 4.9515  |  11.6905  | 68.8479  |        68.8434         |
|             resnet50              |  32  | 3.2442  |  7.0179   | 66.5809  |        63.9128         |
|           hf_Bert_large           |  4   | 10.2373 |  21.258   | 63.6873  |        63.2815         |
|            timm_vovnet            |  32  | 3.6562  |  6.3069   | 63.0078  |        60.6532         |
|           pytorch_unet            |  1   | 1.5341  |  4.3991   | 60.9805  |        58.9818         |
|       functorch_dp_cifar10        |  64  |  1.229  |  2.4035   | 56.6185  |        55.8661         |
|          resnext50_32x4d          |  8   |  3.249  |  7.0494   | 53.5129  |        53.3238         |
|           fastNLP_Bert            |  6   | 5.2612  |  11.254   | 51.3168  |        47.3676         |
|               hf_T5               |  8   | 5.6727  |  12.667   | 50.6967  |        49.8267         |
|      timm_vision_transformer      |  32  | 3.3865  |  7.3453   | 50.5515  |        49.1561         |
|              hf_Bart              |  4   | 6.1921  |  13.8774  |  48.445  |        48.7264         |
|          pytorch_stargan          |  16  | 1.2511  |  3.2884   | 46.4538  |        46.5647         |
|          LearningToPaint          |  96  | 1.4222  |  2.9323   | 45.2275  |        42.7787         |
|            hf_Reformer            |  4   | 4.1301  |  5.9668   |  44.361  |        39.9852         |
|             resnet18              |  16  |  1.361  |  2.8969   | 43.2759  |        43.7299         |
|            Super_SloMo            |  6   |  2.777  |  9.7394   | 42.7967  |        43.4236         |
|              hf_GPT2              |  4   | 4.7547  |  9.9141   | 41.1871  |        42.4074         |
|             hf_Albert             |  8   | 2.4892  |  7.9945   | 40.5828  |        40.5569         |
|              hf_Bert              |  4   | 5.0535  |  10.5746  | 38.7711  |        38.0345         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.253  |  2.9583   |  35.626  |        35.8858         |
|          phlippe_resnet           | 128  | 1.3562  |  2.8244   | 33.0319  |        31.3071         |
|              demucs               |  4   |  1.441  |   2.202   | 31.0454  |        30.5738         |
|           hf_DistilBert           |  8   | 2.3959  |  5.3102   | 30.3094  |        30.8921         |
|           squeezenet1_1           |  32  | 1.0616  |  1.8155   | 22.7019  |        23.7686         |
|          pytorch_struct           | 200  |  0.735  |  1.3262   | 20.9718  |        20.1036         |
|              alexnet              | 128  | 0.4951  |  0.7724   | 15.7936  |        15.1494         |
|               vgg16               |  64  | 0.6218  |  1.1251   |  15.706  |        15.7621         |
|                drq                |  1   | 0.6653  |   1.012   | 10.2421  |        10.7709         |
|      nvidia_deeprecommender       | 256  | 0.4835  |  0.7492   |  9.6468  |         9.3258         |
|               dcgan               |  32  | 0.4345  |  0.7104   |  7.6559  |         7.6018         |
|               dlrm                | 1024 | 0.3805  |  0.7847   |  7.5027  |         7.1818         |
|         soft_actor_critic         | 256  | 0.4269  |  0.6181   |  7.1593  |         6.633          |
|           lennard_jones           | 1000 | 0.3925  |  0.6004   |  6.0736  |         5.8716         |
|            tts_angular            |  64  | 0.4466  |  0.5142   |  5.8264  |         5.7617         |
|   timm_vision_transformer_large   |  32  | 9.4038  |    nan    |   nan    |        123.0006        |
|           hf_Longformer           |  2   | 9.5181  |  30.6289  |   nan    |          nan           |
|               moco                |  32  | 32.8417 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.2082  |         1.208          |
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0863  |         1.2557         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.0496  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.0163  |         1.1719         |
|           mobilenet_v2            |  96  | 0.9862 |  0.7653   |  1.0097  |         1.1013         |
|            tts_angular            |  64  | 0.9983 |  0.9983   |  0.9881  |         0.9983         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9689  |         1.1266         |
|            timm_nfnet             | 128  | 0.9068 |  0.8749   |  0.9679  |         1.0727         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |  0.952   |         1.0009         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         1.1717         |
|              hf_Bert              |  4   | 0.963  |  0.8353   |  0.9422  |         1.026          |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.9402  |         1.0725         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9321  |         1.0713         |
|         timm_efficientnet         |  32  | 0.9842 |  0.7668   |  0.9284  |         1.006          |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  0.8904  |         1.128          |
|              yolov3               |  16  | 0.9877 |   0.846   |   0.87   |         1.0117         |
|        speech_transformer         |  32  | 0.9915 |    0.9    |  0.8651  |         0.869          |
|           timm_resnest            |  32  | 0.9887 |  0.8967   |  0.8629  |         0.9657         |
|        shufflenet_v2_x1_0         | 128  | 0.9539 |  0.8396   |  0.8621  |         0.958          |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8593  |         0.8835         |
|             resnet152             |  32  | 0.9959 |  0.8945   |  0.8504  |         0.9418         |
|        Background_Matting         |  4   | 1.0127 |  0.6489   |  0.8485  |         1.0406         |
|            timm_regnet            |  32  | 0.9903 |  0.8533   |  0.8482  |         0.9536         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8476  |         0.9945         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         1.168          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.8134  |         0.9308         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8058  |         0.8659         |
|              hf_Bart              |  4   | 0.9087 |  0.7521   |  0.7933  |         0.9173         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.7821  |         0.9645         |
|             resnet50              |  32  | 0.9926 |  0.8629   |  0.7817  |         0.8833         |
|              demucs               |  4   | 0.9661 |  0.9659   |  0.773   |         0.9656         |
|           squeezenet1_1           |  32  | 0.9695 |  0.9291   |  0.7722  |         0.9087         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.8893         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7529  |         0.8869         |
|          pytorch_struct           | 200  | 0.9992 |  0.5106   |  0.7277  |         0.7362         |
|               vgg16               |  64  | 0.9923 |  0.7245   |  0.7227  |         0.9808         |
|            mnasnet1_0             |  32  | 0.9783 |  0.8984   |  0.7159  |         0.7781         |
|            densenet121            |  4   | 0.9969 |  0.9783   |  0.7097  |         0.7998         |
|              alexnet              | 128  | 0.9454 |  0.7939   |  0.7091  |         0.939          |
|        mobilenet_v3_large         |  32  | 0.9765 |  0.8752   |  0.6979  |         0.7752         |
|            hf_BigBird             |  2   | 0.9495 |  0.9264   |  0.6968  |         1.1191         |
|          resnext50_32x4d          |  8   | 0.9962 |  0.8441   |  0.668   |         0.7709         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.8931         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.6379  |         0.9573         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  0.6066  |         0.9973         |
|          LearningToPaint          |  96  | 0.9202 |  0.7116   |  0.5925  |         0.7463         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9965 |  0.8796   |  0.5904  |         0.6172         |
|             resnet18              |  16  | 0.9753 |  0.7978   |  0.5395  |         0.6097         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.5317  |         0.9997         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.4538  |         0.8022         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3169  |         0.3395         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.9509 |  0.8932   |   nan    |          nan           |
|               moco                |  32  | 1.004  |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_GPT2_large           |  4   | 212.675  | 214.7876  | 124.4601 |        120.4532        |
|        Background_Matting         |  4   | 125.9891 | 919.2718  | 103.8188 |        103.9817        |
|            hf_T5_large            |  2   | 229.566  | 269.3063  | 101.407  |        121.4925        |
|               hf_T5               |  8   | 181.6735 | 210.9392  |  93.827  |        90.6301         |
|            timm_nfnet             | 128  | 119.703  |  120.299  | 77.0605  |         79.974         |
|            hf_BigBird             |  2   | 206.5392 | 249.2379  | 74.0831  |        117.1258        |
|            hf_Reformer            |  4   | 82.1532  |  83.3816  | 70.9688  |        75.9542         |
|            Super_SloMo            |  6   | 79.6708  | 442.9437  | 64.4288  |        64.4746         |
|              yolov3               |  16  |  68.829  |  84.6703  | 57.2741  |        57.0511         |
|            timm_regnet            |  32  | 60.6697  |  71.6562  | 54.9197  |         57.877         |
|               vgg16               |  64  |  66.206  |  66.2797  | 53.4113  |        52.8014         |
|             resnet152             |  32  | 64.3901  |  83.8955  |  52.536  |        62.5597         |
|           hf_Bert_large           |  4   | 82.6573  |  94.9505  | 51.9287  |        53.1976         |
|              demucs               |  4   | 53.7056  |  53.5539  | 51.6595  |         51.428         |
| attention_is_all_you_need_pytorch | 256  | 58.0781  |  60.0264  | 36.1205  |        36.4376         |
|        speech_transformer         |  32  | 73.8772  |  87.0858  | 35.3238  |        43.9697         |
|              hf_Bart              |  4   | 59.4906  |  69.2746  | 34.8484  |        36.9456         |
|           fastNLP_Bert            |  6   | 52.9584  |  61.2922  | 34.4923  |        34.6372         |
|           mobilenet_v2            |  96  | 47.1429  |  60.2834  |  30.747  |        31.6292         |
|           pytorch_unet            |  1   | 39.9641  | 194.0227  | 29.2567  |         29.397         |
|             hf_Albert             |  8   | 68.6192  |  71.468   | 29.2296  |         29.811         |
|              hf_GPT2              |  4   | 49.1964  |   50.64   | 27.1702  |         27.001         |
|            timm_vovnet            |  32  | 28.8679  |  34.3788  | 26.1764  |        26.6258         |
|              hf_Bert              |  4   | 40.4426  |  48.842   | 22.5099  |        25.9893         |
|         timm_efficientnet         |  32  | 33.8666  |  51.3473  | 22.2231  |        29.3837         |
|             resnet50              |  32  | 26.5343  |  33.9185  | 22.0707  |        24.9118         |
|           hf_DistilBert           |  8   |  32.047  |  32.896   |  21.457  |        21.2996         |
|            densenet121            |  4   | 54.8899  |  75.2997  | 19.7784  |        52.0529         |
|        shufflenet_v2_x1_0         | 128  | 30.3273  |  40.5012  | 18.6295  |        25.6837         |
|      timm_vision_transformer      |  32  | 29.7284  |  33.4414  |  18.264  |        20.2358         |
|           BERT_pytorch            |  16  | 55.2725  |   67.48   | 17.7825  |        25.9151         |
|           timm_resnest            |  32  | 24.3231  |  28.2015  | 15.2773  |         15.893         |
|            mnasnet1_0             |  32  | 22.3902  |  29.9415  | 14.1557  |        20.8142         |
|        mobilenet_v3_large         |  32  | 26.7727  |  34.0162  | 13.0153  |        22.5204         |
|          resnext50_32x4d          |  8   | 20.7659  |  28.172   | 11.9256  |        20.8413         |
|      nvidia_deeprecommender       | 256  | 10.2135  |  10.2251  | 11.6929  |        10.0298         |
|          pytorch_stargan          |  16  |  14.999  |  19.1352  | 11.5788  |        12.0525         |
|         phlippe_densenet          | 128  |  23.06   |  29.4417  |  11.523  |        23.4172         |
|              alexnet              | 128  |  9.8328  |  9.8378   |  9.0115  |         8.6487         |
|          LearningToPaint          |  96  | 11.4575  |  14.5784  |  8.599   |        10.6516         |
|            tts_angular            |  64  |  7.1668  |  6.9528   |   6.57   |         6.6394         |
|             resnet18              |  16  |  9.2957  |  12.1999  |  5.7568  |         9.5426         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 15.7004  |  15.5362  |  5.7412  |         7.8365         |
|           squeezenet1_1           |  32  | 10.6615  |  12.9257  |  5.4103  |         9.6332         |
|          phlippe_resnet           | 128  |  9.0565  |  11.6353  |  4.9113  |        10.2123         |
|          pytorch_struct           | 200  |  5.0123  |  6.0192   |  3.0795  |         4.1699         |
|       functorch_dp_cifar10        |  64  | 12.0374  |  10.9163  |  2.8443  |         7.4369         |
|               dlrm                | 1024 |  4.401   |  4.8338   |  2.1412  |         3.5295         |
|                drq                |  1   |  3.4064  |  4.4969   |  2.1339  |         4.1595         |
|               dcgan               |  32  |  2.4004  |  3.0571   |  1.5282  |         2.6166         |
|         soft_actor_critic         | 256  |  1.9572  |  2.4533   |  1.3428  |         1.8994         |
|           lennard_jones           | 1000 |  1.7869  |  2.1685   |  1.1503  |         1.7693         |
|   timm_vision_transformer_large   |  32  | 465.1917 |    nan    |   nan    |        428.4144        |
|           hf_Longformer           |  2   | 112.9509 | 165.5347  |   nan    |          nan           |
|               moco                |  32  | 53.3308  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with amp precision


Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|             OPTForCausalLM              |  2  | 0.9904 |  0.9292   |  2.4614  |         2.4827         |
|          MobileBertForMaskedLM          | 64  | 0.9443 |  0.7986   |  2.341   |         1.0762         |
|      GPT2ForSequenceClassification      |  4  | 0.9761 |  0.9436   |  2.2547  |         2.2843         |
|     MobileBertForQuestionAnswering      | 128 | 0.9503 |  0.7967   |  2.1547  |         1.0633         |
|       ElectraForQuestionAnswering       | 64  | 0.9883 |  0.9816   |  2.119   |         2.1111         |
|       MT5ForConditionalGeneration       | 16  | 0.9904 |  0.8347   |  2.0619  |         1.9086         |
|             XGLMForCausalLM             |  8  | 0.9872 |  0.8349   |  1.9168  |         1.4415         |
|           ElectraForCausalLM            | 32  | 0.9821 |  0.9418   |  1.8465  |         1.8197         |
|            XLNetLMHeadModel             |  8  | 0.995  |  0.9665   |  1.817   |         1.8148         |
|    LayoutLMForSequenceClassification    | 16  | 0.9843 |  0.9704   |  1.7904  |         1.772          |
|       RobertaForQuestionAnswering       | 16  | 0.984  |  0.9693   |  1.7862  |         1.7683         |
|        BertForQuestionAnswering         | 16  | 0.9842 |  0.9695   |  1.7777  |         1.7623         |
|           RobertaForCausalLM            | 16  | 0.9868 |  0.9632   |  1.6681  |         1.6665         |
|               DistillGPT2               | 16  | 0.9869 |  0.9544   |  1.6578  |         1.7015         |
|            PLBartForCausalLM            |  8  | 0.9814 |  0.9593   |  1.6514  |         1.6702         |
|       AlbertForQuestionAnswering        |  4  | 0.9999 |  0.8855   |  1.6477  |         1.6385         |
|            AlbertForMaskedLM            |  4  | 0.9997 |  0.8848   |  1.6359  |         1.6289         |
|       T5ForConditionalGeneration        |  4  | 0.9783 |  0.8496   |  1.6301  |         1.7193         |
|                 T5Small                 |  4  | 0.9773 |  0.8451   |  1.6281  |         1.7182         |
|     PLBartForConditionalGeneration      |  4  | 0.9826 |  0.9428   |  1.6194  |         1.6534         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9805 |  0.9608   |  1.6053  |         1.6292         |
|             BertForMaskedLM             | 16  | 0.9858 |  0.9613   |  1.5932  |         1.5835         |
|           LayoutLMForMaskedLM           | 16  | 0.9847 |  0.9623   |  1.5639  |         1.5969         |
|     M2M100ForConditionalGeneration      | 16  | 0.986  |  0.8399   |  1.5495  |         1.3958         |
|         Speech2Text2ForCausalLM         | 256 | 0.9741 |   0.922   |  1.547   |         1.5631         |
|                CamemBert                | 16  | 0.9867 |   0.963   |  1.5461  |         1.5444         |
|             BartForCausalLM             |  4  | 0.9777 |  0.9557   |  1.5167  |         1.5454         |
|            MBartForCausalLM             |  4  | 0.9832 |  0.9537   |  1.5121  |         1.5422         |
|            YituTechConvBert             | 16  | 0.9858 |  0.9567   |  1.5085  |         1.4919         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9982 |  0.9127   |  1.4885  |         1.4077         |
|         MegatronBertForCausalLM         |  4  | 0.9876 |  0.9167   |  1.4656  |         1.5007         |
|      BartForConditionalGeneration       |  2  | 0.9975 |  0.9676   |  1.4547  |         1.4762         |
|      MBartForConditionalGeneration      |  2  | 0.995  |  0.9684   |  1.4453  |         1.5162         |
|     DistilBertForQuestionAnswering      | 256 | 0.9944 |  0.9871   |  1.4452  |         1.4456         |
|     PegasusForConditionalGeneration     | 32  | 0.9994 |  0.9393   |  1.3318  |         1.2981         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9597 |  0.9107   |  1.2963  |         1.2591         |
|            TrOCRForCausalLM             | 32  | 0.9873 |  0.9531   |  1.2557  |         1.2847         |
|          DistilBertForMaskedLM          | 128 | 0.9916 |  0.9491   |  1.2157  |         1.2333         |
|           PegasusForCausalLM            | 32  | 0.9756 |  0.9225   |  1.1966  |         1.2717         |
|       DebertaForQuestionAnswering       |  8  | 0.7999 |  0.7012   |  1.0712  |         0.9549         |
|           DebertaForMaskedLM            |  4  | 0.7151 |  0.5581   |  0.9543  |         0.8253         |
|          DebertaV2ForMaskedLM           |  1  | 0.6859 |  0.5219   |  0.8836  |         0.6537         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.7061 |   0.526   |  0.8318  |         0.6614         |
|          BlenderbotForCausalLM          |  4  | 0.9739 |  0.8314   |   0.0    |         1.2193         |
|          AllenaiLongformerBase          |  4  | 1.0062 |  0.6685   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
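
The speedup tables in this comment are plain ASCII grids, so they can be summarized with a few lines of Python. The following is a minimal sketch, not the dashboard's own tooling: the helper names `parse_ascii_table` and `geomean` are hypothetical, the geometric mean is just one reasonable summary, and treating `0.0` or `nan` as a failed run to be skipped is an assumption based on how those rows line up with `nan` entries in the latency tables.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' grid into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean(records, column):
    """Geometric mean of a column, skipping 0.0 and nan (assumed failed runs)."""
    values = [float(r[column]) for r in records]
    values = [v for v in values if not math.isnan(v) and v > 0]
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Three rows copied from the table above, trimmed to two backend columns.
sample = """
+-----------------------+----+--------+----------+
|         name          | bs | eager  | inductor |
+-----------------------+----+--------+----------+
|    OPTForCausalLM     | 2  | 0.9904 |  2.4614  |
|   AlbertForMaskedLM   | 4  | 0.9997 |  1.6359  |
| BlenderbotForCausalLM | 4  | 0.9739 |   0.0    |
+-----------------------+----+--------+----------+
"""

records = parse_ascii_table(sample)
print("inductor geomean speedup:", round(geomean(records, "inductor"), 4))
```

Skipping failed rows makes the mean describe only the models that ran, so it should be read alongside the count of skipped entries.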

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 17.1179 |  40.1385  | 143.4842 |        143.5512        |
|     MobileBertForQuestionAnswering      | 128 | 17.0998 |  39.9419  | 139.4939 |        137.6431        |
|          DebertaV2ForMaskedLM           |  1  | 15.5186 |  26.9328  | 137.4384 |        67.5552         |
|      DebertaV2ForQuestionAnswering      |  2  | 16.1028 |  26.8239  | 137.0963 |        65.7048         |
|     M2M100ForConditionalGeneration      | 16  | 12.429  |  26.0697  | 134.6321 |        133.7492        |
|       MT5ForConditionalGeneration       | 16  | 8.5435  |  18.3328  | 131.6577 |         132.97         |
|             XGLMForCausalLM             |  8  | 9.5006  |  21.0558  | 130.139  |        130.8865        |
|            XLNetLMHeadModel             |  8  | 10.3654 |  27.4498  |  90.951  |        91.0578         |
|       DebertaForQuestionAnswering       |  8  | 7.3998  |  13.6685  |  81.568  |        53.5005         |
|           DebertaForMaskedLM            |  4  | 7.3043  |  13.8673  | 79.9691  |        52.9596         |
|      MBartForConditionalGeneration      |  2  | 12.5952 |  25.8783  | 79.1677  |         77.956         |
|      BartForConditionalGeneration       |  2  | 11.8029 |  25.7499  | 75.4767  |        73.5354         |
|     PegasusForConditionalGeneration     | 32  | 5.4014  |  19.2962  | 66.7336  |        67.1192         |
|            YituTechConvBert             | 16  | 7.5669  |  15.8503  | 66.4734  |        65.6725         |
|         MegatronBertForCausalLM         |  4  | 10.5721 |  21.5537  |  66.087  |        65.7132         |
|    MegatronBertForQuestionAnswering     |  8  | 10.3528 |  21.0724  | 64.8506  |        66.4355         |
| BlenderbotSmallForConditionalGeneration | 64  | 8.2774  |  16.9598  | 53.8227  |        54.5901         |
|           ElectraForCausalLM            | 32  | 5.6009  |  10.9509  | 52.5432  |         51.992         |
|                 T5Small                 |  4  | 5.8537  |  12.8591  | 49.2183  |        48.5994         |
|       T5ForConditionalGeneration        |  4  | 5.9035  |  12.9498  | 49.1765  |        49.1561         |
|     PLBartForConditionalGeneration      |  4  | 6.2937  |  13.3947  | 47.0963  |        46.7406         |
|       ElectraForQuestionAnswering       | 64  | 5.5526  |  10.8278  | 45.3685  |        45.1207         |
|    LayoutLMForSequenceClassification    | 16  | 5.7463  |  11.1899  | 45.1935  |        45.4779         |
|        BertForQuestionAnswering         | 16  | 5.2542  |  10.9251  | 40.3708  |        37.5659         |
|             BertForMaskedLM             | 16  | 5.2524  |  10.7221  | 40.1962  |         37.754         |
|            MBartForCausalLM             |  4  | 5.8231  |  11.0067  | 39.9038  |        40.7269         |
|           LayoutLMForMaskedLM           | 16  | 5.9009  |  11.3304  | 38.9831  |        40.3795         |
|             BartForCausalLM             |  4  | 5.8488  |  10.9345  | 38.3872  |        38.5205         |
|                CamemBert                | 16  | 5.4198  |  10.8268  | 37.9363  |         38.349         |
|            AlbertForMaskedLM            |  4  | 2.3853  |   8.081   | 37.7724  |        37.1212         |
|             OPTForCausalLM              |  2  | 4.8173  |  10.2492  | 37.5027  |        35.9036         |
|           PegasusForCausalLM            | 32  | 5.8014  |  11.0591  | 36.6349  |        36.2359         |
|     DistilBertForQuestionAnswering      | 256 | 2.6598  |   5.31    | 36.5583  |        35.8004         |
|           RobertaForCausalLM            | 16  |  5.496  |  10.8959  | 36.5007  |        36.1119         |
|      GPT2ForSequenceClassification      |  4  | 4.8569  |  9.8898   | 36.2828  |        35.3576         |
|            TrOCRForCausalLM             | 32  | 5.7617  |  10.9735  | 35.9883  |        36.2359         |
|       RobertaForQuestionAnswering       | 16  | 5.3378  |  10.6733  | 35.1586  |        35.1835         |
|          DistilBertForMaskedLM          | 128 | 2.6654  |  5.4913   | 34.6959  |        34.3502         |
|       AlbertForQuestionAnswering        |  4  | 2.5073  |  8.0886   | 33.5946  |        33.5318         |
|       BlenderbotSmallForCausalLM        | 64  | 3.9924  |  7.4379   | 29.4209  |         28.77          |
|               DistillGPT2               | 16  |  2.555  |  5.0591   | 27.7357  |        28.1912         |
|         Speech2Text2ForCausalLM         | 256 | 3.1463  |  5.8724   | 24.9029  |        24.4282         |
|            PLBartForCausalLM            |  8  | 3.1629  |  5.8821   | 24.7288  |        24.9665         |
|          BlenderbotForCausalLM          |  4  | 11.4193 |  21.6538  |   nan    |        67.1742         |
|          AllenaiLongformerBase          |  4  | 10.1492 |  31.051   |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1387  |         1.195          |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1139  |         1.2307         |
|             OPTForCausalLM              |  2  | 0.9999 |  0.9165   |  1.094   |         1.1346         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0865  |         1.1724         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0818  |         1.1729         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0583  |         1.1368         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0541  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.0539  |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0511  |         1.0491         |
|            YituTechConvBert             | 16  | 0.9999 |  0.9143   |  1.0402  |         1.0411         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.1479         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0078  |         1.0518         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9734  |         1.3147         |
|           ElectraForCausalLM            | 32  |  1.0   |  0.8475   |  0.9731  |         0.9739         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         1.0642         |
|     PLBartForConditionalGeneration      |  4  | 1.0001 |  0.9301   |  0.9649  |         1.052          |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9574  |         1.268          |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         1.1152         |
|            MBartForCausalLM             |  4  |  1.0   |  0.8937   |  0.9281  |         0.9912         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8677   |  0.9138  |         0.9886         |
|             BartForCausalLM             |  4  |  1.0   |  0.8936   |  0.9137  |         0.9749         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9089  |         1.0018         |
|           PegasusForCausalLM            | 32  |  1.0   |  0.8822   |  0.893   |         0.9864         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8849  |         0.9624         |
|            TrOCRForCausalLM             | 32  |  1.0   |   0.873   |  0.8836  |         0.9583         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8729  |         0.9803         |
|     PegasusForConditionalGeneration     | 32  |  1.0   |   0.91    |  0.8689  |         1.0689         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.8672  |         1.0307         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         1.0139         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         1.0962         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8137   |  0.8184  |         0.9119         |
|         Speech2Text2ForCausalLM         | 256 |  1.0   |  0.8183   |  0.789   |         0.8779         |
|     M2M100ForConditionalGeneration      | 16  |  1.0   |  0.8084   |  0.7651  |         0.9908         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7473  |         1.016          |
|             XGLMForCausalLM             |  8  |  1.0   |  0.7834   |  0.7117  |         0.9792         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6569  |         0.8392         |
|           DebertaForMaskedLM            |  4  | 0.9326 |  0.9156   |  0.5501  |         0.9978         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.5197  |         0.9665         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9762 |  0.9763   |  0.487   |         0.9797         |
|       DebertaForQuestionAnswering       |  8  | 0.9525 |  1.0537   |  0.4601  |         1.1527         |
|          BlenderbotForCausalLM          |  4  | 0.9978 |  0.9099   |   nan    |         0.999          |
|          AllenaiLongformerBase          |  4  | 0.9508 |  0.8684   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 266.0099 | 300.4984  | 162.8644 |        163.676         |
|       AlbertForQuestionAnswering        |  4  | 263.9152 | 297.8624  | 160.3915 |        161.2058        |
|            XLNetLMHeadModel             |  8  | 281.9189 | 290.2773  | 153.3157 |         154.14         |
|      DebertaV2ForQuestionAnswering      |  2  | 167.9443 | 198.7041  | 129.8103 |        156.665         |
|          DebertaV2ForMaskedLM           |  1  | 153.0852 | 194.0169  | 121.1066 |        152.5133        |
|     PegasusForConditionalGeneration     | 32  | 144.2911 | 147.5148  | 113.4685 |        108.6643        |
|            TrOCRForCausalLM             | 32  |  138.84  | 143.7296  | 109.4922 |        107.3999        |
|      MBartForConditionalGeneration      |  2  | 138.3175 | 142.2038  | 95.2888  |        99.2642         |
|      BartForConditionalGeneration       |  2  | 150.4546 | 141.9339  | 94.5354  |        92.9937         |
|    MegatronBertForQuestionAnswering     |  8  | 144.5947 | 147.5121  | 88.2675  |         87.245         |
|            YituTechConvBert             | 16  | 127.7757 | 130.5974  | 83.0335  |        83.7673         |
| BlenderbotSmallForConditionalGeneration | 64  | 120.0305 | 123.5662  | 82.2311  |        79.5877         |
|     MobileBertForQuestionAnswering      | 128 | 177.6087 | 208.3629  | 80.8864  |        156.6183        |
|                CamemBert                | 16  | 119.9074 | 122.7489  |  76.585  |        77.3274         |
|     M2M100ForConditionalGeneration      | 16  | 153.2763 | 136.2828  | 75.3677  |        78.6775         |
|            MBartForCausalLM             |  4  | 115.2007 | 118.8052  | 75.0198  |        73.5598         |
|             BartForCausalLM             |  4  | 117.4089 | 118.4463  | 74.6105  |         73.734         |
|     PLBartForConditionalGeneration      |  4  | 123.5919 | 124.0964  | 73.2678  |        71.2347         |
|          MobileBertForMaskedLM          | 64  | 180.6636 | 210.3377  | 72.7014  |        157.5985        |
|           LayoutLMForMaskedLM           | 16  | 114.2203 | 117.0041  | 71.8617  |        70.5614         |
|     DistilBertForQuestionAnswering      | 256 | 104.1485 | 104.4306  | 71.7963  |        71.7658         |
|       DebertaForQuestionAnswering       |  8  | 94.6796  | 107.9542  | 70.9154  |        79.5498         |
|            PLBartForCausalLM            |  8  | 115.0147 | 121.1154  | 70.4934  |        68.7715         |
|          DistilBertForMaskedLM          | 128 | 85.4534  |  89.8205  | 70.1755  |        68.5995         |
|           RobertaForCausalLM            | 16  | 116.5778 | 119.2071  | 69.0783  |        68.9418         |
|             BertForMaskedLM             | 16  | 111.6235 | 114.1925  | 68.9901  |         69.384         |
|             OPTForCausalLM              |  2  | 169.979  | 181.2412  |  68.829  |        68.6514         |
|           DebertaForMaskedLM            |  4  | 84.5853  | 108.6643  | 65.9327  |         75.85          |
|       T5ForConditionalGeneration        |  4  | 107.042  | 123.0561  | 64.2634  |        60.4376         |
|                 T5Small                 |  4  | 107.3796 | 123.0267  | 64.1902  |        60.5684         |
|               DistillGPT2               | 16  | 107.0697 | 110.6124  | 63.7245  |        62.1226         |
|         MegatronBertForCausalLM         |  4  | 88.5692  |  95.0476  | 59.3909  |        58.3086         |
|           PegasusForCausalLM            | 32  | 77.7754  |  74.7466  | 58.7554  |        58.2956         |
|             XGLMForCausalLM             |  8  | 119.9961 | 105.6041  | 55.2142  |        80.1664         |
|    LayoutLMForSequenceClassification    | 16  | 99.1541  | 100.5467  | 54.5355  |        55.1529         |
|       ElectraForQuestionAnswering       | 64  | 118.1032 |  118.335  | 54.2477  |        55.1997         |
|        BertForQuestionAnswering         | 16  | 96.7715  |  98.0092  | 53.6764  |        53.9419         |
|       RobertaForQuestionAnswering       | 16  | 96.9795  |  98.3272  | 53.4156  |         54.019         |
|           ElectraForCausalLM            | 32  | 90.0398  |  95.1034  | 47.7577  |        48.3682         |
|       BlenderbotSmallForCausalLM        | 64  | 61.6628  |  63.4346  | 47.4538  |         45.849         |
|       MT5ForConditionalGeneration       | 16  | 104.1155 | 109.6472  | 44.1215  |        54.4244         |
|      GPT2ForSequenceClassification      |  4  | 93.8378  |  98.5599  | 40.6027  |        40.0316         |
|         Speech2Text2ForCausalLM         | 256 | 55.3615  |  57.8877  | 35.2043  |        34.3838         |
|          BlenderbotForCausalLM          |  4  | 122.8939 |  128.181  |   nan    |        87.7364         |
|          AllenaiLongformerBase          |  4  | 181.3453 | 271.2508  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+

timm_models suite with amp precision


Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9983 |  0.9977   |  3.0125  |         2.9715         |
|        twins_pcpvt_base         | 64  | 0.9977 |  0.9089   |  1.9872  |         1.6655         |
|      xcit_large_24_p8_224       |  5  | 0.9924 |  0.8632   |  1.9819  |          1.56          |
|         coat_lite_mini          | 128 | 0.9973 |  0.9949   |  1.9443  |         1.9195         |
|          ghostnet_100           | 128 | 0.992  |  0.7653   |  1.8456  |         1.6042         |
|          gmlp_s16_224           | 128 | 0.9947 |  1.0826   |  1.8429  |         1.8309         |
|          gmixer_24_224          | 128 | 0.9948 |  0.8893   |  1.7591  |         1.7493         |
|           volo_d1_224           | 64  | 0.994  |  0.9729   |  1.6872  |         1.6662         |
|            lcnet_050            | 128 | 0.9391 |  0.7341   |  1.6833  |         1.4493         |
|         crossvit_9_240          | 128 | 0.9906 |  0.7824   |  1.6364  |         1.617          |
|  swin_base_patch4_window7_224   | 64  | 0.9911 |  0.9546   |  1.6177  |         1.6033         |
|           convit_base           | 64  | 0.9982 |  0.9976   |  1.6139  |         1.6119         |
|       gluon_inception_v3        | 128 | 0.9964 |  0.8655   |  1.532   |         1.5192         |
|          inception_v3           | 128 | 0.9967 |   0.864   |  1.5298  |         1.5182         |
|        adv_inception_v3         | 128 | 0.9963 |  0.8608   |  1.5273  |         1.519          |
|             dla102              | 128 | 0.996  |  0.8157   |  1.5247  |         1.5212         |
|        sebotnet33ts_256         | 64  | 0.9572 |   0.765   |  1.5089  |         1.5362         |
|            nfnet_l0             | 128 | 0.9894 |   0.814   |  1.4921  |         1.4353         |
|          convnext_base          | 64  | 0.9834 |  0.9847   |  1.4849  |         1.4717         |
|           dm_nfnet_f0           | 128 | 0.9868 |  0.9854   |  1.4762  |         1.4279         |
|       eca_botnext26ts_256       | 128 | 0.9732 |  0.7193   |  1.4414  |         1.4246         |
|      mobilenetv3_large_100      | 128 | 0.9499 |  0.7593   |  1.4381  |         1.4444         |
|           mnasnet_100           | 128 | 0.948  |  0.7411   |  1.4367  |         1.4953         |
|            pit_b_224            | 64  | 0.9946 |  0.9925   |  1.4348  |         1.4295         |
|           mobilevit_s           | 64  | 0.9618 |  0.7187   |  1.429   |         1.4414         |
|           resnest101e           | 64  | 0.9945 |  0.8649   |  1.4217  |         1.3537         |
|           selecsls42b           | 128 | 0.9987 |  0.8122   |  1.4116  |         1.4098         |
|          botnet26t_256          | 128 | 0.9725 |  0.8504   |  1.4076  |         1.4183         |
|           regnety_002           | 128 | 0.9528 |  0.7177   |  1.4008  |         1.2289         |
|         mobilenetv2_100         | 128 | 0.9493 |  0.7369   |  1.384   |         1.4473         |
|        res2net50_14w_8s         | 128 | 0.999  |  0.7908   |  1.3828  |         1.3569         |
|           res2next50            | 128 | 0.9986 |  0.8254   |  1.3709  |         1.3641         |
|          jx_nest_base           | 32  | 0.9874 |  0.9848   |  1.3666  |         1.3573         |
|          mixer_b16_224          | 128 | 0.9975 |  1.0149   |  1.3621  |         1.3596         |
|            hrnet_w18            | 128 | 0.9922 |  0.6443   |  1.3566  |         1.3488         |
|       tf_efficientnet_b0        | 128 | 0.9594 |   0.681   |  1.3533  |         1.383          |
|        ese_vovnet19b_dw         | 128 | 0.9572 |  0.8322   |  1.3531  |         1.3732         |
|          spnasnet_100           | 128 | 0.941  |  0.7399   |  1.3523  |         1.4187         |
|      beit_base_patch16_224      | 64  | 0.9958 |  0.9669   |  1.3495  |         1.3532         |
|          cait_m36_384           |  4  | 0.9948 |  0.9932   |  1.3487  |         1.347          |
|           fbnetc_100            | 128 | 0.949  |  0.7383   |  1.3469  |         1.4031         |
|         poolformer_m36          | 64  | 0.9864 |  0.9835   |  1.327   |         1.3188         |
|            fbnetv3_b            | 128 | 0.9492 |   0.769   |  1.3072  |         1.3309         |
|           rexnet_100            | 128 | 0.9522 |   0.702   |  1.3009  |         1.3333         |
|          resmlp_12_224          | 128 | 0.9928 |  0.8883   |  1.2605  |         1.2573         |
| deit_base_distilled_patch16_224 | 64  | 0.9966 |  0.9936   |  1.2551  |         1.2563         |
|      vit_base_patch16_224       | 64  | 0.996  |  0.9935   |  1.2339  |         1.2363         |
|            tinynet_a            | 128 | 0.9461 |   0.678   |  1.2292  |          1.24          |
|          cspdarknet53           | 64  | 0.9323 |  0.7859   |  1.2071  |         1.2623         |
|           tf_mixnet_l           | 128 | 0.9766 |  0.8268   |  1.186   |         1.1927         |
|            mixnet_l             | 128 | 0.9764 |  0.8208   |  1.1756  |         1.1815         |
|         visformer_small         | 128 | 0.9959 |  0.9447   |  1.1734  |         1.1656         |
|        res2net101_26w_4s        | 64  | 0.9993 |  0.7951   |  1.1486  |         1.0963         |
|          pnasnet5large          | 16  | 0.986  |  0.9172   |  1.1145  |         1.1294         |
|             dpn107              | 32  | 0.9325 |  0.8069   |  1.0913  |         1.1329         |
|            repvgg_a2            | 128 | 0.9358 |  0.7557   |  1.0832  |         1.1188         |
|        gluon_xception65         | 32  | 0.9922 |   0.842   |  1.0746  |         1.0793         |
|     swsl_resnext101_32x16d      | 32  | 0.9977 |  0.8405   |  1.0591  |         1.0232         |
|            gernet_l             | 128 | 0.9344 |  0.7936   |  1.0349  |         1.0667         |
|        convmixer_768_32         | 32  | 0.9985 |  0.9637   |  1.002   |         1.0021         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 5.6309  |  11.2178  | 293.7417 |        295.4693        |
|            hrnet_w18            | 128 | 9.5813  |  35.9527  | 255.8613 |        245.6165        |
|          ghostnet_100           | 128 | 7.5212  |  14.9287  | 235.6543 |        237.3688        |
|            fbnetv3_b            | 128 |  8.491  |  16.9877  | 174.5159 |        175.5957        |
|           resnest101e           | 64  | 11.1742 |  24.9787  | 168.9761 |        166.8995        |
|          pnasnet5large          | 16  |  8.187  |  26.2457  | 165.0935 |        160.428         |
|      mobilenetv3_large_100      | 128 | 4.2241  |  8.4715   | 164.9841 |        163.0381        |
|            tinynet_a            | 128 | 5.9516  |  12.2135  | 162.414  |        158.6807        |
|           mobilevit_s           | 64  | 5.2973  |  11.8554  | 162.3946 |        161.1337        |
|           tf_mixnet_l           | 128 | 9.0146  |  17.0434  |  162.16  |        155.4438        |
|            mixnet_l             | 128 | 8.4551  |  16.4085  | 160.7966 |        161.1297        |
|          inception_v3           | 128 | 5.6977  |  12.5751  | 160.6346 |        160.2494        |
|        adv_inception_v3         | 128 | 5.6891  |  12.4997  | 158.6132 |        157.5116        |
|       tf_efficientnet_b0        | 128 | 5.0926  |  10.5311  | 158.5465 |        149.1241        |
|       gluon_inception_v3        | 128 | 5.7047  |  12.3605  | 156.6087 |        160.2057        |
|        res2net101_26w_4s        | 64  | 10.5598 |  24.9969  | 153.7283 |        150.5969        |
|        twins_pcpvt_base         | 64  | 10.5133 |  23.7024  | 149.3033 |        147.6075        |
|           fbnetc_100            | 128 | 5.3016  |  9.5445   | 140.3762 |        133.0222        |
|          spnasnet_100           | 128 |  4.917  |  9.2873   | 134.495  |        137.3888        |
|      xcit_large_24_p8_224       |  5  | 12.6165 |  28.424   | 131.9644 |        131.6058        |
|         mobilenetv2_100         | 128 | 3.9834  |  7.9705   | 130.2853 |        134.5086        |
|           mnasnet_100           | 128 | 4.0141  |  7.7126   | 124.9001 |        120.0858        |
|        res2net50_14w_8s         | 128 | 8.9293  |  22.6268  | 122.4246 |        123.7988        |
|          cait_m36_384           |  4  | 13.5353 |  30.5642  | 114.7189 |        113.1373        |
|        sebotnet33ts_256         | 64  | 4.0903  |  8.8217   | 108.7777 |        107.8496        |
|  swin_base_patch4_window7_224   | 64  |  8.565  |  19.2025  | 106.4392 |        110.1255        |
|           regnety_002           | 128 | 4.8807  |  8.9585   | 105.1908 |        107.722         |
|             dpn107              | 32  | 9.7604  |  19.7795  | 100.4431 |        99.7149         |
|         poolformer_m36          | 64  | 7.5923  |  13.865   | 99.9614  |        100.878         |
|       eca_botnext26ts_256       | 128 |  3.328  |  6.7768   | 99.6719  |        97.8099         |
|          cspdarknet53           | 64  | 5.7779  |  10.8904  | 97.8372  |        102.273         |
|            lcnet_050            | 128 | 2.5576  |   5.035   | 97.6989  |        96.1842         |
|             dla102              | 128 |  6.267  |  14.144   | 97.4817  |        97.0889         |
|        gluon_xception65         | 32  | 7.8069  |  16.9435  | 95.8131  |        93.3756         |
|           selecsls42b           | 128 | 2.4274  |  5.3683   | 93.1197  |        90.3491         |
|          botnet26t_256          | 128 | 2.9709  |   5.99    | 91.8323  |         90.559         |
|         coat_lite_mini          | 128 | 3.3161  |  8.0507   | 89.0325  |        88.2171         |
|           res2next50            | 128 | 5.0166  |  12.0216  | 88.4455  |        88.0902         |
|         crossvit_9_240          | 128 | 5.7936  |  13.274   | 87.4267  |        86.4851         |
|          jx_nest_base           | 32  | 6.7096  |  14.9695  | 83.3853  |        83.6396         |
|            gernet_l             | 128 | 4.9898  |  8.9023   | 81.3948  |        79.2313         |
|            nfnet_l0             | 128 | 5.2966  |  10.8862  | 77.8551  |        77.8671         |
|        ese_vovnet19b_dw         | 128 | 2.6967  |  4.5357   | 75.7232  |        76.5903         |
|           volo_d1_224           | 64  | 5.0788  |  11.8237  | 74.5273  |         73.153         |
|           dm_nfnet_f0           | 128 | 6.1523  |  11.6593  | 74.1638  |        74.3692         |
|        tnt_s_patch16_224        | 128 | 6.4283  |  16.0584  | 69.2625  |        69.4188         |
|         visformer_small         | 128 | 2.5708  |  6.0805   |  65.866  |        66.4379         |
|     swsl_resnext101_32x16d      | 32  | 6.2525  |  13.8162  | 62.2939  |        62.0302         |
|            repvgg_a2            | 128 | 4.8351  |  8.7725   | 60.3307  |        60.4433         |
|          gmlp_s16_224           | 128 | 5.6276  |  12.0807  | 60.0239  |        59.1223         |
|          convnext_base          | 64  | 6.6394  |  12.6884  | 58.9399  |        57.8177         |
|          gmixer_24_224          | 128 | 5.7211  |  12.7645  | 51.4235  |        50.8588         |
|           convit_base           | 64  | 3.4897  |  8.6433   | 47.6784  |         47.722         |
|            pit_b_224            | 64  | 3.4264  |  7.9815   | 45.0275  |        44.4521         |
|      vit_base_patch16_224       | 64  | 3.0278  |  6.8909   | 41.9796  |        40.5128         |
|          resmlp_12_224          | 128 | 2.8405  |  5.4627   | 39.5544  |        39.6609         |
| deit_base_distilled_patch16_224 | 64  | 3.1044  |  7.0213   | 39.5515  |        41.8251         |
|        convmixer_768_32         | 32  | 1.6867  |  6.9187   | 37.1271  |        36.4053         |
|      beit_base_patch16_224      | 64  | 3.8917  |   8.574   | 36.6269  |        35.7859         |
|          mixer_b16_224          | 128 |  2.834  |  5.8959   | 33.1515  |        32.4604         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.1858  |         1.2049         |
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.1712  |         1.2836         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1129  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.0948  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.0266  |         1.1179         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.013   |         1.0845         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0099  |         1.0351         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  0.9984  |         1.0721         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  0.9972  |         1.0876         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  0.9872  |         1.0728         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  0.9834  |         1.0506         |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9762  |         0.9854         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  0.9747  |         1.0457         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  0.9729  |         1.0539         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  0.9557  |         1.0057         |
|             dla102              | 128 | 0.9634 |  0.9155   |  0.9536  |         1.0325         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.951   |         0.9948         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9362  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9353  |         0.9528         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  0.9348  |         1.0194         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  0.9346  |         1.0675         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9285  |         0.989          |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9228  |         0.9793         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9215  |         0.9952         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9131  |         0.9727         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  0.9097  |         1.0328         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9047  |         0.9903         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.8918  |          0.99          |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  0.891   |         1.1115         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.8833  |         0.9642         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.8831  |         0.9705         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.8807  |         0.977          |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.8786  |         0.9451         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.877   |         0.9361         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  0.8768  |         1.1871         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  0.8738  |         1.0072         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.8721  |         0.9732         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.8712  |         0.9607         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.871   |         0.9483         |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.8687  |         0.9902         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.8683  |         0.9403         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.866   |         0.9547         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8632  |         0.989          |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.8596  |         0.9535         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  0.8578  |         1.0242         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.8576  |         0.9664         |
|          convnext_base          | 64  | 1.001  |   0.924   |  0.8505  |         1.0338         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.8499  |         0.9706         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.8461  |         0.9786         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  0.8402  |         1.0202         |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.8239  |         0.9779         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.805   |         0.884          |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.7738  |         0.9611         |
|           regnety_002           | 128 | 0.9523 |  0.8277   |  0.7602  |         0.8966         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.7526  |         0.9898         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.7214  |         0.9045         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.6693  |         0.9604         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.7414 | 311.4911  | 299.3598 |        299.7481        |
|            hrnet_w18            | 128 | 280.7873 | 432.2822  | 205.4256 |        206.8531        |
|          pnasnet5large          | 16  | 198.5186 |  213.718  | 175.8984 |        173.9258        |
|           tf_mixnet_l           | 128 | 193.6661 | 229.0964  | 159.4516 |        158.5842        |
|            mixnet_l             | 128 | 185.2262 | 220.7729  | 153.9323 |        153.137         |
|          cait_m36_384           |  4  | 167.945  | 168.0763  | 123.466  |        124.067         |
|           resnest101e           | 64  | 164.4221 | 189.8496  | 114.7017 |        121.2228        |
|             dla102              | 128 | 172.2938 | 210.4008  | 112.6188 |        112.9276        |
|     swsl_resnext101_32x16d      | 32  | 118.6484 |  140.979  | 111.7566 |        115.622         |
|         poolformer_m36          | 64  | 146.6379 | 147.1524  | 108.9972 |        109.6766        |
|        tnt_s_patch16_224        | 128 | 323.1999 | 323.4823  | 107.1626 |        108.5177        |
|        adv_inception_v3         | 128 | 160.7155 | 185.7874  | 104.8363 |        105.4107        |
|          inception_v3           | 128 | 160.6788 | 185.1951  | 104.6562 |        105.4844        |
|       gluon_inception_v3        | 128 | 160.6775 | 185.0746  | 104.6079 |        105.5247        |
|        res2net50_14w_8s         | 128 | 140.7981 | 177.8606  | 101.5972 |        103.7326        |
|           convit_base           | 64  | 163.0442 |  163.02   | 100.8891 |        100.9842        |
|             dpn107              | 32  | 113.6753 | 131.2356  | 97.1043  |        93.5754         |
|        gluon_xception65         | 32  | 99.5416  | 117.2678  | 92.0877  |        91.5765         |
|           res2next50            | 128 | 126.0728 | 152.5131  | 91.8365  |        92.1529         |
|  swin_base_patch4_window7_224   | 64  | 147.3951 | 152.9507  | 90.3459  |        91.3262         |
|          mixer_b16_224          | 128 | 116.4902 | 114.7495  | 85.8138  |        86.0962         |
|           dm_nfnet_f0           | 128 | 128.1815 | 128.7521  |  85.778  |        88.6139         |
|        res2net101_26w_4s        | 64  | 99.5006  | 125.1797  | 85.0556  |        90.0775         |
|            fbnetv3_b            | 128 | 115.4503 | 142.0554  | 83.7157  |         82.247         |
|            pit_b_224            | 64  | 118.8066 | 118.8698  | 82.2115  |         82.506         |
|          convnext_base          | 64  | 124.4859 |  124.002  | 82.1996  |        82.9729         |
|         visformer_small         | 128 |  91.219  |  96.3033  | 77.4617  |        78.0442         |
|            nfnet_l0             | 128 | 112.7689 | 137.5489  | 75.1779  |        77.8266         |
|      beit_base_patch16_224      | 64  | 101.4484 | 104.4132  | 75.0981  |        74.6584         |
|          gmlp_s16_224           | 128 | 137.4107 | 126.4035  | 74.2918  |        74.6414         |
|       eca_botnext26ts_256       | 128 | 108.8526 |  147.347  | 73.5667  |        74.2155         |
|          cspdarknet53           | 64  | 94.9301  | 112.6775  | 73.3824  |        70.1594         |
|          jx_nest_base           | 32  | 101.8135 | 101.6351  | 73.3602  |        73.8755         |
|           volo_d1_224           | 64  | 120.8435 | 123.4047  | 71.2614  |        72.0556         |
|          botnet26t_256          | 128 | 101.8629 | 116.6689  | 70.4766  |        69.9641         |
|      vit_base_patch16_224       | 64  | 86.8154  |  87.0802  |  70.237  |        69.9325         |
|            gernet_l             | 128 | 77.7222  |  91.5422  |  70.208  |         68.132         |
| deit_base_distilled_patch16_224 | 64  | 84.8024  |  85.0355  | 67.3442  |        67.2703         |
|            repvgg_a2            | 128 | 77.6165  |  96.0032  | 67.0624  |        64.9239         |
|          gmixer_24_224          | 128 | 117.8255 | 131.8889  | 66.8207  |        67.1912         |
|      xcit_large_24_p8_224       |  5  | 128.4518 | 140.5964  | 62.3074  |        80.7784         |
|        twins_pcpvt_base         | 64  | 118.3652 | 128.9692  | 60.0834  |        69.0127         |
|       tf_efficientnet_b0        | 128 | 84.8819  | 119.8132  | 60.0816  |        58.8016         |
|           rexnet_100            | 128 | 80.0861  | 108.3608  | 58.4793  |        57.0988         |
|           fbnetc_100            | 128 | 82.9623  | 106.7055  | 58.3523  |        56.0743         |
|         coat_lite_mini          | 128 | 113.0425 | 113.3778  | 57.9762  |        58.6401         |
|           mobilevit_s           | 64  | 84.6666  | 113.5124  | 56.9084  |        56.3632         |
|            tinynet_a            | 128 | 73.4465  | 102.6024  | 56.6624  |        56.0991         |
|        sebotnet33ts_256         | 64  |  80.462  | 100.5128  | 51.0731  |        50.0884         |
|         crossvit_9_240          | 128 | 82.5206  | 104.1347  | 49.9386  |        50.4351         |
|          spnasnet_100           | 128 | 70.4451  |  89.5367  | 49.0228  |        46.6909         |
|          ghostnet_100           | 128 | 90.6571  | 117.1867  | 48.6666  |        55.9871         |
|        ese_vovnet19b_dw         | 128 | 64.6189  |  74.4427  | 45.7417  |        45.0158         |
|         mobilenetv2_100         | 128 | 65.5317  |  84.4233  | 44.9116  |        42.9536         |
|           selecsls42b           | 128 | 60.0268  |  73.7286  | 42.4133  |        42.5388         |
|           mnasnet_100           | 128 | 64.2526  |  82.1058  | 42.4017  |        40.7661         |
|          resmlp_12_224          | 128 | 53.6101  |  59.8836  | 42.1034  |        42.2311         |
|      mobilenetv3_large_100      | 128 | 61.1815  |  76.7225  | 40.4238  |        40.3125         |
|           regnety_002           | 128 | 40.0778  |  53.6278  | 26.5186  |        29.5308         |
|            lcnet_050            | 128 | 31.7811  |  40.6824  | 17.7076  |        20.5934         |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Performance graphs

bench_logs/torchbench_amp.png :

bench_logs/timm_models_amp.png :

bench_logs/huggingface_amp.png :

Build Summary

Run name

day_094_04_04_23_performance_amp_473

Commit hashes

pytorch commit: 6887333
pytorch commit date: 2023-04-05 01:46:20+00:00
torchbench commit: 75193ef25a5b998e0a3daa70e2d3ce9dc7a0c000
torchbench commit date: 2023-04-04 18:50:11-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git6887333

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report the mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

When measuring performance, compilation latency, and memory footprint reduction, we exclude the models that fail accuracy checks.
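As a rough illustration of how a geometric mean speedup could be computed once failing models are excluded, here is a minimal Python sketch. The function name `geomean_speedup`, the set of statuses treated as passing, and the choice to skip non-positive speedups are assumptions made for this example; they are not taken from the dashboard's own scripts.

```python
import math

# Assumed for this sketch: which accuracy statuses count as passing.
PASSING_STATUSES = ("pass", "pass_due_to_skip")

def geomean_speedup(speedups, accuracy):
    """Geometric mean of per-model speedups over native PyTorch.

    speedups: dict mapping model name -> speedup (1.0 means same as eager)
    accuracy: dict mapping model name -> accuracy status string
    Models that fail accuracy are excluded. Non-positive speedups are
    also skipped (an assumption), because their logarithm is undefined.
    """
    kept = [
        s for name, s in speedups.items()
        if accuracy.get(name) in PASSING_STATUSES and s > 0
    ]
    if not kept:
        return float("nan")
    return math.exp(sum(math.log(s) for s in kept) / len(kept))

# A few inductor rows from the torchbench tables in this report.
speedups = {"resnet18": 1.6166, "resnet50": 1.1898, "moco": 0.0}
accuracy = {"resnet18": "pass", "resnet50": "pass", "moco": "fail_to_run"}
print(f"{geomean_speedup(speedups, accuracy):.2f}x")
```

The geometric mean is used rather than the arithmetic mean because speedups are ratios, so a 2x gain on one model and a 0.5x slowdown on another average out to 1x.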

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 85%, 51/60 | 91%, 41/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 96%, 43/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
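
The "percent, passed/total" entries in the table above can be reproduced with a small helper. This is only a sketch: the name `passrate` and the rule that any status starting with "pass" counts as a pass are assumptions for illustration.

```python
def passrate(statuses):
    """Format a pass rate as '<percent>%, <passed>/<total>'.

    statuses: list of accuracy status strings, one per model.
    Assumed rule: any status starting with 'pass' counts as a pass.
    """
    total = len(statuses)
    if total == 0:
        return "n/a"
    passed = sum(1 for s in statuses if s.startswith("pass"))
    return f"{100 * passed / total:.0f}%, {passed}/{total}"

# 51 passing models out of 60, as in the inductor/torchbench cell.
print(passrate(["pass"] * 51 + ["fail_to_run"] * 9))  # prints "85%, 51/60"
```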

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.60x    |    1.58x    |    1.41x    |
| inductor_no_cudagraphs |   1.27x    |    1.50x    |    1.39x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.85    |    7.24     |    5.96     |
|       aot_eager        |    9.33    |    15.84    |    13.27    |
|        inductor        |   62.51    |    61.09    |   109.97    |
| inductor_no_cudagraphs |   62.79    |    58.37    |   109.00    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.90x    |    0.88x    |
|        inductor        |   0.80x    |    0.91x    |    0.91x    |
| inductor_no_cudagraphs |   0.94x    |    1.05x    |    1.01x    |
+------------------------+------------+-------------+-------------+

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics of the two most recent reports that actually run the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Previous report name: /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 85%, 51/60  | 85%, 51/60  |
|        inductor        | huggingface | 91%, 41/45  | 91%, 41/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60  | 87%, 52/60  |
| inductor_no_cudagraphs | huggingface | 96%, 43/45  | 96%, 43/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.57x    |   1.60x   |
|        inductor        | huggingface |   1.59x    |   1.58x   |
|        inductor        | timm_models |   1.41x    |   1.41x   |
| inductor_no_cudagraphs | torchbench  |   1.27x    |   1.27x   |
| inductor_no_cudagraphs | huggingface |   1.50x    |   1.50x   |
| inductor_no_cudagraphs | timm_models |   1.39x    |   1.39x   |
+------------------------+-------------+------------+-----------+

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
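
The four criteria above can be expressed as a small check. The thresholds come from the list; the function name `flag_model`, the flag labels, and the rule that any status starting with "pass" counts as passing are assumptions for this sketch.

```python
SPEEDUP_MIN = 0.95            # flag when speedup < 0.95x
COMPILE_LATENCY_MAX = 120.0   # flag when compilation latency > 120 sec
COMPRESSION_MIN = 0.9         # flag when compression ratio < 0.9

def flag_model(accuracy, speedup, compile_latency, compression_ratio):
    """Return the list of warning categories that apply to one model."""
    flags = []
    if not accuracy.startswith("pass"):  # assumed passing rule
        flags.append("accuracy")
    if speedup < SPEEDUP_MIN:
        flags.append("speedup")
    if compile_latency > COMPILE_LATENCY_MAX:
        flags.append("compilation_latency")
    if compression_ratio < COMPRESSION_MIN:
        flags.append("compression_ratio")
    return flags

# yolov3 with inductor, using the values reported in this comment
# (accuracy assumed to be "pass" since it has no accuracy warning).
print(flag_model("pass", 1.1967, 120.1524, 0.8743))
# prints ['compilation_latency', 'compression_ratio']
```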

Accuracy warnings

+-------------+-------------------------------+-----------------+------------------------+
|    suite    |             name              |    inductor     | inductor_no_cudagraphs |
+-------------+-------------------------------+-----------------+------------------------+
| torchbench  |         hf_Longformer         |   fail_to_run   |      fail_to_run       |
| torchbench  |             moco              |   fail_to_run   |      fail_to_run       |
| torchbench  |      Background_Matting       | eager_variation |    eager_variation     |
| torchbench  |        vision_maskrcnn        | eager_variation |    eager_variation     |
| torchbench  |           tacotron2           |     0.0000      |         0.0000         |
| torchbench  |              gat              |     0.0000      |         0.0000         |
| torchbench  |              gcn              |     0.0000      |         0.0000         |
| torchbench  |             llama             |     0.0000      |         0.0000         |
| torchbench  |             sage              |     0.0000      |         0.0000         |
| torchbench  |         torchrec_dlrm         |     0.0000      |         0.0000         |
| huggingface | DebertaV2ForQuestionAnswering |   fail_to_run   |          pass          |
| huggingface |  AlbertForQuestionAnswering   |  fail_accuracy  |     fail_accuracy      |
+-------------+-------------------------------+-----------------+------------------------+

Performance speedup warnings

+-------------+-------------------------------+----------+------------------------+
|    suite    |             name              | inductor | inductor_no_cudagraphs |
+-------------+-------------------------------+----------+------------------------+
| torchbench  |             dcgan             |  1.5279  |         0.8229         |
| torchbench  |         lennard_jones         |  1.3915  |         0.8625         |
| torchbench  |       soft_actor_critic       |  1.1453  |         0.7887         |
| torchbench  |          tts_angular          |  0.945   |         0.9462         |
| torchbench  |          timm_vovnet          |   0.94   |         0.9227         |
| torchbench  |    nvidia_deeprecommender     |  0.8718  |         1.0183         |
| torchbench  | timm_vision_transformer_large |   0.0    |         1.0815         |
| torchbench  |         hf_Longformer         |   0.0    |          0.0           |
| torchbench  |             moco              |   0.0    |          0.0           |
| torchbench  |              gat              |   0.0    |          0.0           |
| torchbench  |              gcn              |   0.0    |          0.0           |
| torchbench  |             sage              |   0.0    |          0.0           |
| torchbench  |           tacotron2           |   0.0    |          0.0           |
| torchbench  |         torchrec_dlrm         |   0.0    |          0.0           |
| huggingface |      DebertaForMaskedLM       |  0.945   |         0.8095         |
| huggingface |     DebertaV2ForMaskedLM      |  0.8658  |         0.656          |
| huggingface | DebertaV2ForQuestionAnswering |  0.8179  |         0.6877         |
| huggingface |     BlenderbotForCausalLM     |   0.0    |         1.2692         |
| huggingface |     AllenaiLongformerBase     |   0.0    |          0.0           |
+-------------+-------------------------------+----------+------------------------+

Compilation latency (sec) warnings

+-------------+--------------------------------+----------+------------------------+
|    suite    |              name              | inductor | inductor_no_cudagraphs |
+-------------+--------------------------------+----------+------------------------+
| torchbench  |          hf_T5_large           | 174.5453 |        171.8366        |
| torchbench  |        phlippe_densenet        | 169.9413 |        164.0834        |
| torchbench  |           hf_BigBird           | 148.7813 |        127.5042        |
| torchbench  |       timm_efficientnet        | 137.7045 |        144.4091        |
| torchbench  |       mobilenet_v3_large       | 136.8974 |        135.1869        |
| torchbench  |          densenet121           | 134.2996 |        136.3448        |
| torchbench  |          mobilenet_v2          | 129.0517 |        128.6863        |
| torchbench  |             yolov3             | 120.1524 |        114.1959        |
| torchbench  | timm_vision_transformer_large  |   nan    |        126.1707        |
| huggingface |     MobileBertForMaskedLM      | 145.0116 |        142.3565        |
| huggingface | MobileBertForQuestionAnswering | 137.7101 |        136.5337        |
| huggingface |      DebertaV2ForMaskedLM      | 135.7999 |        71.2067         |
| huggingface | DebertaV2ForQuestionAnswering  | 135.4218 |        69.8818         |
| huggingface | M2M100ForConditionalGeneration | 134.3798 |        134.984         |
| huggingface |  MT5ForConditionalGeneration   | 132.6963 |        131.4027        |
| huggingface |        XGLMForCausalLM         | 131.2673 |        131.4624        |
| timm_models |           rexnet_100           | 288.633  |        286.6164        |
| timm_models |           hrnet_w18            | 251.7722 |        251.342         |
| timm_models |          ghostnet_100          | 237.364  |        240.7118        |
| timm_models |           fbnetv3_b            | 171.7907 |        177.0667        |
| timm_models |     mobilenetv3_large_100      | 165.1874 |        159.0224        |
| timm_models |          mobilevit_s           | 163.7446 |        152.6373        |
| timm_models |          resnest101e           | 162.3824 |        167.4093        |
| timm_models |           tinynet_a            | 162.1294 |        160.1799        |
| timm_models |         pnasnet5large          | 160.6701 |        162.1834        |
| timm_models |        adv_inception_v3        | 160.2985 |        162.1355        |
| timm_models |          tf_mixnet_l           | 159.8192 |        153.3491        |
| timm_models |       gluon_inception_v3       | 158.312  |        160.1563        |
| timm_models |            mixnet_l            | 156.5639 |        158.1503        |
| timm_models |          inception_v3          | 156.4453 |        156.8552        |
| timm_models |       res2net101_26w_4s        | 152.0619 |        152.5181        |
| timm_models |       tf_efficientnet_b0       | 149.6682 |         153.73         |
| timm_models |        twins_pcpvt_base        | 148.6353 |        148.1482        |
| timm_models |          spnasnet_100          | 139.1762 |        138.2714        |
| timm_models |           fbnetc_100           | 138.0168 |        132.3115        |
| timm_models |      xcit_large_24_p8_224      | 133.6499 |        132.0053        |
| timm_models |        mobilenetv2_100         | 128.3707 |        131.9352        |
| timm_models |        res2net50_14w_8s        | 125.2948 |        125.9409        |
| timm_models |          mnasnet_100           | 120.5102 |        123.9508        |
+-------------+--------------------------------+----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+----------+------------------------+
|    suite    |                  name                   | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------------+----------+------------------------+
| torchbench  |              hf_GPT2_large              |  0.8904  |         1.128          |
| torchbench  |                 yolov3                  |  0.8743  |         1.0159         |
| torchbench  |            timm_efficientnet            |  0.8703  |         1.006          |
| torchbench  |           speech_transformer            |  0.8651  |         0.8682         |
| torchbench  |           shufflenet_v2_x1_0            |  0.8628  |         0.9658         |
| torchbench  |              timm_resnest               |  0.8621  |         0.9661         |
| torchbench  |         timm_vision_transformer         |  0.8593  |         0.8835         |
| torchbench  |               timm_regnet               |  0.8512  |         0.9536         |
| torchbench  |                resnet152                |  0.8489  |         0.9403         |
| torchbench  |           Background_Matting            |  0.8485  |         1.0406         |
| torchbench  |              hf_DistilBert              |  0.8476  |         0.9945         |
| torchbench  |               hf_T5_large               |  0.8201  |         1.168          |
| torchbench  |              pytorch_unet               |  0.8134  |         0.9308         |
| torchbench  |            phlippe_densenet             |  0.8058  |         0.8659         |
| torchbench  |                 hf_Bart                 |  0.7933  |         0.9173         |
| torchbench  |           mobilenet_v3_large            |  0.7856  |         0.872          |
| torchbench  |                  dcgan                  |  0.7821  |         0.9645         |
| torchbench  |                resnet50                 |  0.782   |         0.8844         |
| torchbench  |                 demucs                  |  0.7733  |         0.9656         |
| torchbench  |              squeezenet1_1              |  0.773   |         0.9087         |
| torchbench  |             pytorch_stargan             |  0.7715  |         0.8893         |
| torchbench  |               timm_vovnet               |  0.7529  |         0.8869         |
| torchbench  |               mnasnet1_0                |  0.7448  |         0.8074         |
| torchbench  |             pytorch_struct              |  0.7277  |         0.7362         |
| torchbench  |                  vgg16                  |  0.7227  |         0.9808         |
| torchbench  |                 alexnet                 |  0.7091  |         0.939          |
| torchbench  |               densenet121               |  0.7071  |         0.7927         |
| torchbench  |               hf_BigBird                |  0.6968  |         1.1134         |
| torchbench  |             resnext50_32x4d             |  0.6659  |         0.772          |
| torchbench  |         nvidia_deeprecommender          |  0.6585  |         0.8931         |
| torchbench  |                   drq                   |  0.6379  |         0.9573         |
| torchbench  |            soft_actor_critic            |  0.6066  |         0.9973         |
| torchbench  |             LearningToPaint             |  0.5925  |         0.7463         |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |  0.5904  |         0.6008         |
| torchbench  |                resnet18                 |  0.5395  |         0.6097         |
| torchbench  |              lennard_jones              |  0.5317  |         0.9997         |
| torchbench  |               hf_Reformer               |  0.4538  |         0.8022         |
| torchbench  |          functorch_dp_cifar10           |  0.3991  |         0.4424         |
| torchbench  |             phlippe_resnet              |  0.3169  |         0.3395         |
| huggingface |           PegasusForCausalLM            |  0.893   |         0.9864         |
| huggingface |          DistilBertForMaskedLM          |  0.8849  |         0.9624         |
| huggingface |            TrOCRForCausalLM             |  0.8836  |         0.9583         |
| huggingface | BlenderbotSmallForConditionalGeneration |  0.8729  |         0.9803         |
| huggingface |     PegasusForConditionalGeneration     |  0.8689  |         1.0689         |
| huggingface |      MBartForConditionalGeneration      |  0.8672  |         1.0307         |
| huggingface |      BartForConditionalGeneration       |  0.8456  |         1.0139         |
| huggingface |         MegatronBertForCausalLM         |  0.845   |         1.0962         |
| huggingface |       BlenderbotSmallForCausalLM        |  0.8184  |         0.9119         |
| huggingface |         Speech2Text2ForCausalLM         |  0.789   |         0.8779         |
| huggingface |     M2M100ForConditionalGeneration      |  0.7651  |         0.9908         |
| huggingface |          MobileBertForMaskedLM          |  0.7473  |         1.016          |
| huggingface |             XGLMForCausalLM             |  0.7117  |         0.9792         |
| huggingface |     MobileBertForQuestionAnswering      |  0.6569  |         0.8392         |
| huggingface |           DebertaForMaskedLM            |  0.5501  |         0.9978         |
| huggingface |          DebertaV2ForMaskedLM           |  0.5197  |         0.9665         |
| huggingface |      DebertaV2ForQuestionAnswering      |  0.487   |         0.9802         |
| huggingface |       DebertaForQuestionAnswering       |  0.4601  |         1.1526         |
| timm_models |                hrnet_w18                |  0.8918  |          0.99          |
| timm_models |            sebotnet33ts_256             |  0.891   |         1.1115         |
| timm_models |              inception_v3               |  0.8904  |         1.0171         |
| timm_models |           gluon_inception_v3            |  0.8904  |         1.0171         |
| timm_models |            adv_inception_v3             |  0.8904  |         1.0171         |
| timm_models |                 dpn107                  |  0.8833  |         0.9642         |
| timm_models |            gluon_xception65             |  0.8831  |         0.9705         |
| timm_models |              ghostnet_100               |  0.8807  |         0.977          |
| timm_models |              spnasnet_100               |  0.8786  |         0.9451         |
| timm_models |          mobilenetv3_large_100          |  0.877   |         0.9361         |
| timm_models |             poolformer_m36              |  0.8768  |         1.1871         |
| timm_models |           eca_botnext26ts_256           |  0.8738  |         1.0072         |
| timm_models |          xcit_large_24_p8_224           |  0.8721  |         0.9732         |
| timm_models |            res2net50_14w_8s             |  0.8712  |         0.9607         |
| timm_models |            res2net101_26w_4s            |  0.871   |         0.9483         |
| timm_models |                mixnet_l                 |  0.8687  |         0.9902         |
| timm_models |               mnasnet_100               |  0.8683  |         0.9403         |
| timm_models |               res2next50                |  0.866   |         0.9547         |
| timm_models |              cait_m36_384               |  0.8632  |         0.989          |
| timm_models |               fbnetc_100                |  0.8596  |         0.9535         |
| timm_models |                pit_b_224                |  0.8578  |         1.0242         |
| timm_models |               selecsls42b               |  0.8576  |         0.9664         |
| timm_models |              convnext_base              |  0.8505  |         1.0338         |
| timm_models |                gernet_l                 |  0.8499  |         0.9706         |
| timm_models |         swsl_resnext101_32x16d          |  0.8461  |         0.9786         |
| timm_models |             coat_lite_mini              |  0.8402  |         1.0202         |
| timm_models |              botnet26t_256              |  0.8239  |         0.9779         |
| timm_models |                lcnet_050                |  0.805   |         0.884          |
| timm_models |                repvgg_a2                |  0.7738  |         0.9611         |
| timm_models |               regnety_002               |  0.7602  |         0.8966         |
| timm_models |             crossvit_9_240              |  0.7526  |         0.9898         |
| timm_models |      swin_base_patch4_window7_224       |  0.7214  |         0.9045         |
| timm_models |              jx_nest_base               |  0.6693  |         0.9604         |
+-------------+-----------------------------------------+----------+------------------------+

Metrics over time

bench_logs/passrate_over_time.png :

bench_logs/comp_time_over_time.png :

bench_logs/geomean_over_time.png :

bench_logs/memory_over_time.png :

Recent Regressions

For each relevant compiler, we compare the two most recent reports that actually run the compiler to find models that were previously unflagged and are now flagged as problematic (according to the 'Warnings' section).
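
A minimal sketch of that comparison is shown here. The helper name `find_regressions` and the choice to ignore models missing from the previous report are assumptions; the example values are the yolov3 compilation latencies and the timm_efficientnet compression ratios from the regression tables in this comment.

```python
def find_regressions(prev, cur, is_flagged):
    """Return {model: (prev_value, cur_value)} for newly flagged models.

    prev, cur: dicts mapping model name -> metric value for one compiler
    is_flagged: predicate that returns True when a value breaches a threshold
    Models absent from the previous report are ignored (an assumption).
    """
    regressions = {}
    for name, cur_value in cur.items():
        if name not in prev:
            continue
        if is_flagged(cur_value) and not is_flagged(prev[name]):
            regressions[name] = (prev[name], cur_value)
    return regressions

# Compilation latency (sec): flagged when above 120.
print(find_regressions({"yolov3": 118.6659}, {"yolov3": 120.1524},
                       lambda v: v > 120.0))
# prints {'yolov3': (118.6659, 120.1524)}

# Peak memory compression ratio: flagged when below 0.9.
print(find_regressions({"timm_efficientnet": 0.9284},
                       {"timm_efficientnet": 0.8703},
                       lambda v: v < 0.9))
# prints {'timm_efficientnet': (0.9284, 0.8703)}
```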

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Compilation latency (sec) regressions

+----------+--------+-------------+------------+
| compiler |  name  | prev_status | cur_status |
+----------+--------+-------------+------------+
| inductor | yolov3 |  118.6659   |  120.1524  |
+----------+--------+-------------+------------+

Peak Memory Compression Ratio regressions

+----------+-------------------+-------------+------------+
| compiler |       name        | prev_status | cur_status |
+----------+-------------------+-------------+------------+
| inductor | timm_efficientnet |   0.9284    |   0.8703   |
+----------+-------------------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Performance speedup regressions

+----------+--------------------+-------------+------------+
| compiler |        name        | prev_status | cur_status |
+----------+--------------------+-------------+------------+
| inductor | DebertaForMaskedLM |   0.9543    |   0.945    |
+----------+--------------------+-------------+------------+

Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

No regressions found.

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9644 |  0.9131   |  3.6976  |          1.35          |
|           BERT_pytorch            |  16  | 0.9902 |  0.7982   |  3.0015  |         2.0807         |
|            hf_BigBird             |  2   | 0.9484 |  0.7752   |  2.9745  |         1.6455         |
|            densenet121            |  4   | 0.985  |  0.7124   |  2.513   |         1.0061         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9663 |   0.896   |  2.4518  |         1.7888         |
|             hf_Albert             |  8   | 0.9937 |  0.9601   |  2.3765  |         2.2921         |
|            hf_T5_large            |  2   | 0.9755 |  0.8073   |  2.2438  |         1.8204         |
|              hf_Bart              |  4   | 0.9731 |  0.8412   |  2.1804  |         1.5733         |
|           squeezenet1_1           |  32  | 0.9797 |  0.9358   |  2.1684  |         1.3118         |
|        mobilenet_v3_large         |  32  | 0.9967 |  0.7807   |  2.0767  |         1.177          |
|         phlippe_densenet          | 128  | 0.9845 |  0.7694   |  2.0716  |         1.0051         |
|              hf_Bert              |  4   | 0.995  |  0.8439   |  2.0066  |         1.5881         |
|               dlrm                | 1024 | 0.9417 |  0.8152   |  1.9701  |         1.1689         |
|               hf_T5               |  8   | 0.9839 |  0.8479   |  1.9116  |         1.981          |
|          phlippe_resnet           | 128  | 0.9762 |  0.7576   |  1.8271  |         1.012          |
|              hf_GPT2              |  4   | 0.9959 |  0.9544   |  1.796   |         1.7891         |
|          resnext50_32x4d          |  8   | 0.9813 |   0.712   |  1.7111  |         0.9793         |
|            mnasnet1_0             |  32  | 0.9843 |  0.7329   |  1.6955  |         1.0657         |
|           hf_GPT2_large           |  4   | 0.9828 |  0.9714   |  1.674   |         1.7468         |
|        shufflenet_v2_x1_0         | 128  | 0.9919 |  0.7556   |  1.637   |          1.19          |
|        speech_transformer         |  32  | 0.9808 |  0.7931   |  1.6231  |         1.6446         |
|             resnet18              |  16  | 0.9848 |  0.7493   |  1.6166  |         0.9663         |
|           hf_Bert_large           |  4   | 0.9963 |  0.8602   |  1.6027  |         1.5483         |
|           fastNLP_Bert            |  6   | 0.9961 |  0.8477   |  1.5809  |         1.4918         |
|           timm_resnest            |  32  | 0.9924 |  0.8505   |  1.5665  |         1.4918         |
|      timm_vision_transformer      |  32  | 0.9832 |  0.8512   |  1.5592  |         1.4034         |
|         timm_efficientnet         |  32  | 0.9368 |  0.6205   |  1.5473  |         1.0684         |
|            timm_nfnet             | 128  | 0.9862 |  0.9853   |  1.5404  |         1.4755         |
|                drq                |  1   | 0.958  |  0.7142   |  1.5359  |         1.057          |
|               dcgan               |  32  | 0.868  |  0.6909   |  1.5279  |         0.8229         |
|           mobilenet_v2            |  96  | 0.997  |  0.7776   |  1.5237  |         1.5009         |
| attention_is_all_you_need_pytorch | 256  | 0.9877 |  0.8394   |  1.5004  |         1.4902         |
|           hf_DistilBert           |  8   | 0.9771 |  0.9444   |  1.4624  |         1.4751         |
|          pytorch_struct           | 200  | 0.9094 |  0.7806   |  1.4235  |         1.1013         |
|           lennard_jones           | 1000 | 0.8199 |  0.7356   |  1.3915  |         0.8625         |
|          LearningToPaint          |  96  | 0.9868 |  0.7826   |  1.3809  |         1.0578         |
|           pytorch_unet            |  1   | 0.9966 |   0.205   |  1.3572  |         1.3506         |
|          pytorch_stargan          |  16  | 0.9935 |  0.7997   |  1.2996  |         1.2424         |
|               vgg16               |  64  | 0.9994 |  0.9985   |   1.24   |         1.2543         |
|            Super_SloMo            |  6   | 0.9969 |  0.1791   |  1.2327  |         1.2316         |
|        Background_Matting         |  4   | 0.9986 |  0.1371   |  1.2127  |         1.2079         |
|             resnet152             |  32  | 0.9922 |  0.7665   |  1.2007  |         1.004          |
|              yolov3               |  16  | 0.9961 |   0.805   |  1.1967  |         1.1984         |
|             resnet50              |  32  | 0.9957 |  0.7615   |  1.1898  |         1.0568         |
|         soft_actor_critic         | 256  | 0.8717 |  0.6311   |  1.1453  |         0.7887         |
|            hf_Reformer            |  4   | 0.9853 |  0.9616   |  1.1391  |         1.0651         |
|              alexnet              | 128  | 0.9989 |  0.9979   |  1.0874  |         1.1356         |
|              demucs               |  4   | 0.9992 |  1.0019   |  1.0385  |         1.0394         |
|            timm_regnet            |  32  | 0.9177 |  0.7711   |  1.0041  |         1.0078         |
|            tts_angular            |  64  | 0.9074 |  0.8885   |  0.945   |         0.9462         |
|            timm_vovnet            |  32  | 0.8536 |  0.7093   |   0.94   |         0.9227         |
|      nvidia_deeprecommender       | 256  | 0.9987 |  0.9986   |  0.8718  |         1.0183         |
|   timm_vision_transformer_large   |  32  | 0.9981 |    0.0    |   0.0    |         1.0815         |
|           hf_Longformer           |  2   | 1.0114 |  0.6893   |   0.0    |          0.0           |
|               moco                |  32  | 0.9374 |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 27.3687 |  55.9419  | 174.5453 |        171.8366        |
|         phlippe_densenet          | 128  | 3.2342  |  7.0234   | 169.9413 |        164.0834        |
|            hf_BigBird             |  2   | 13.1862 |  37.4846  | 148.7813 |        127.5042        |
|         timm_efficientnet         |  32  | 5.0486  |  10.1227  | 137.7045 |        144.4091        |
|        mobilenet_v3_large         |  32  | 3.4932  |  7.6451   | 136.8974 |        135.1869        |
|            densenet121            |  4   | 7.6899  |  18.2176  | 134.2996 |        136.3448        |
|           mobilenet_v2            |  96  | 3.2048  |  7.0112   | 129.0517 |        128.6863        |
|              yolov3               |  16  | 5.1919  |  10.6893  | 120.1524 |        114.1959        |
|            mnasnet1_0             |  32  | 3.2002  |  6.7461   | 109.1944 |        106.7632        |
|           hf_GPT2_large           |  4   | 14.9348 |  30.1928  | 106.8115 |        105.3042        |
|             resnet152             |  32  | 9.1821  |  20.1665  | 106.4053 |        105.2675        |
|           timm_resnest            |  32  | 1.8288  |  3.9185   | 96.2524  |        100.1662        |
|        shufflenet_v2_x1_0         | 128  | 3.4411  |   7.674   | 82.2708  |        81.3401         |
|        speech_transformer         |  32  | 6.0851  |  13.6741  | 79.3296  |        79.7958         |
| attention_is_all_you_need_pytorch | 256  | 4.4434  |  11.0815  | 74.4601  |        74.9979         |
|            timm_regnet            |  32  | 6.7366  |  12.2784  | 73.8364  |        71.8154         |
|            timm_nfnet             | 128  | 5.9243  |  10.9885  | 72.6075  |        71.4288         |
|        Background_Matting         |  4   | 3.3407  |  11.4437  | 69.3236  |        66.9375         |
|           BERT_pytorch            |  16  | 4.9098  |  11.5895  | 68.8892  |        68.9895         |
|             resnet50              |  32  | 3.2153  |  7.0204   |  66.742  |        65.8697         |
|           hf_Bert_large           |  4   | 10.2547 |  21.3732  | 64.4932  |        62.2446         |
|            timm_vovnet            |  32  | 3.6865  |  6.3668   |  61.401  |        63.3485         |
|           pytorch_unet            |  1   | 1.5418  |  4.6503   | 59.9207  |        59.2972         |
|       functorch_dp_cifar10        |  64  | 1.2133  |  2.4366   |  57.062  |        56.5682         |
|          resnext50_32x4d          |  8   | 3.2553  |  7.0603   | 54.0953  |        52.4136         |
|      timm_vision_transformer      |  32  | 3.3213  |  7.1264   | 51.8332  |        49.8191         |
|               hf_T5               |  8   | 5.8439  |  13.5352  | 50.5615  |        48.2348         |
|           fastNLP_Bert            |  6   | 5.2816  |  11.2484  | 48.8441  |        49.9494         |
|              hf_Bart              |  4   | 6.3762  |  13.8843  | 48.6923  |        48.7841         |
|          pytorch_stargan          |  16  |  1.217  |  3.2071   | 45.9155  |        45.9561         |
|            hf_Reformer            |  4   | 4.2338  |  6.0213   | 45.0174  |        40.2227         |
|             resnet18              |  16  | 1.3477  |  3.0756   | 44.2922  |        44.2944         |
|          LearningToPaint          |  96  | 1.4346  |  2.8978   | 44.0936  |        44.0929         |
|              hf_GPT2              |  4   | 4.8133  |  9.7281   | 42.4394  |         42.003         |
|            Super_SloMo            |  6   | 2.7857  |  9.7995   | 42.2591  |         43.815         |
|             hf_Albert             |  8   | 2.5359  |  8.5624   | 40.6912  |        37.8672         |
|              hf_Bert              |  4   | 5.0785  |  10.547   | 38.9389  |        37.5658         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2317  |  2.9503   | 37.1642  |        36.5859         |
|          phlippe_resnet           | 128  | 1.3451  |  2.8629   | 32.1886  |        32.3211         |
|              demucs               |  4   | 1.4416  |  2.2044   | 29.7416  |        28.9902         |
|           hf_DistilBert           |  8   | 2.4108  |   5.59    | 29.6811  |         29.106         |
|           squeezenet1_1           |  32  | 1.0511  |  1.7406   | 23.9775  |        24.6289         |
|          pytorch_struct           | 200  | 0.7444  |   1.336   | 20.3578  |         20.577         |
|              alexnet              | 128  |  0.495  |  0.7831   | 15.6811  |        15.4736         |
|               vgg16               |  64  | 0.6421  |  1.1159   | 15.2671  |         15.638         |
|      nvidia_deeprecommender       | 256  | 0.4925  |  0.7625   | 10.1057  |        10.9822         |
|                drq                |  1   | 0.6707  |  1.0228   |  9.8043  |         8.885          |
|               dcgan               |  32  | 0.4343  |  0.7503   |  8.7321  |         7.8521         |
|               dlrm                | 1024 | 0.3754  |  0.8033   |  7.6908  |         7.1819         |
|         soft_actor_critic         | 256  | 0.4338  |  0.6029   |  7.1451  |         8.1699         |
|           lennard_jones           | 1000 |  0.397  |  0.6029   |  6.0754  |         5.9605         |
|            tts_angular            |  64  | 0.4531  |   0.508   |  5.8968  |         5.9277         |
|   timm_vision_transformer_large   |  32  | 9.3928  |    nan    |   nan    |        126.1707        |
|           hf_Longformer           |  2   | 9.5495  |  30.7766  |   nan    |          nan           |
|               moco                |  32  | 33.7336 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.208   |         1.2078         |
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0863  |         1.2557         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.0496  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.0163  |         1.1719         |
|           mobilenet_v2            |  96  | 0.9863 |  0.7649   |  1.0104  |         1.1018         |
|            tts_angular            |  64  | 0.9983 |  0.9983   |  0.9895  |         0.9983         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9689  |         1.1266         |
|            timm_nfnet             | 128  | 0.9068 |  0.8751   |  0.9685  |         1.0711         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |  0.952   |         1.0009         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         1.1717         |
|              hf_Bert              |  4   | 0.963  |  0.8353   |  0.9422  |         1.0258         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.9402  |         1.0725         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9321  |         1.0713         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  0.8904  |         1.128          |
|              yolov3               |  16  | 0.9837 |  0.8252   |  0.8743  |         1.0159         |
|         timm_efficientnet         |  32  | 0.9863 |  0.8179   |  0.8703  |         1.006          |
|        speech_transformer         |  32  | 0.9915 |   0.901   |  0.8651  |         0.8682         |
|        shufflenet_v2_x1_0         | 128  | 0.9549 |  0.8398   |  0.8628  |         0.9658         |
|           timm_resnest            |  32  | 0.9875 |  0.8966   |  0.8621  |         0.9661         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8593  |         0.8835         |
|            timm_regnet            |  32  | 0.9913 |  0.8504   |  0.8512  |         0.9536         |
|             resnet152             |  32  | 0.9939 |  0.8937   |  0.8489  |         0.9403         |
|        Background_Matting         |  4   | 1.0125 |  0.6489   |  0.8485  |         1.0406         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8476  |         0.9945         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         1.168          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.8134  |         0.9308         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8058  |         0.8659         |
|              hf_Bart              |  4   | 0.9087 |  0.7521   |  0.7933  |         0.9173         |
|        mobilenet_v3_large         |  32  | 0.9832 |  0.8395   |  0.7856  |         0.872          |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.7821  |         0.9645         |
|             resnet50              |  32  | 0.9907 |  0.8582   |  0.782   |         0.8844         |
|              demucs               |  4   | 0.9661 |  0.9659   |  0.7733  |         0.9656         |
|           squeezenet1_1           |  32  | 0.9666 |  0.9291   |  0.773   |         0.9087         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.8893         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7529  |         0.8869         |
|            mnasnet1_0             |  32  | 0.9792 |  0.8641   |  0.7448  |         0.8074         |
|          pytorch_struct           | 200  | 0.9992 |  0.5106   |  0.7277  |         0.7362         |
|               vgg16               |  64  | 0.9923 |  0.7245   |  0.7227  |         0.9808         |
|              alexnet              | 128  | 0.9454 |  0.7939   |  0.7091  |         0.939          |
|            densenet121            |  4   | 0.9939 |  0.9823   |  0.7071  |         0.7927         |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |  0.6968  |         1.1134         |
|          resnext50_32x4d          |  8   | 0.9942 |  0.8409   |  0.6659  |         0.772          |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.8931         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.6379  |         0.9573         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  0.6066  |         0.9973         |
|          LearningToPaint          |  96  | 0.9202 |   0.711   |  0.5925  |         0.7463         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9966 |  0.8568   |  0.5904  |         0.6008         |
|             resnet18              |  16  | 0.9753 |  0.7978   |  0.5395  |         0.6097         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.5317  |         0.9997         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.4538  |         0.8022         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3169  |         0.3395         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.9511 |  0.8932   |   nan    |          nan           |
|               moco                |  32  | 1.0048 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_GPT2_large           |  4   | 212.8206 | 214.9218  | 124.9038 |        120.6207        |
|        Background_Matting         |  4   | 126.0953 | 918.6534  | 103.6879 |        103.9779        |
|            hf_T5_large            |  2   | 231.5107 | 272.2374  | 101.5806 |        123.6663        |
|               hf_T5               |  8   | 182.0095 | 211.6958  | 93.7918  |         90.638         |
|            timm_nfnet             | 128  | 119.7015 | 119.9386  | 76.9304  |        79.9181         |
|            hf_BigBird             |  2   | 205.3691 | 281.1943  | 75.8695  |        113.9897        |
|            hf_Reformer            |  4   | 82.2433  |  84.1848  | 71.1313  |        75.8979         |
|            Super_SloMo            |  6   | 79.7404  | 443.0668  | 64.3345  |        64.5137         |
|              yolov3               |  16  | 68.7255  |  84.8966  | 57.1748  |        57.2392         |
|            timm_regnet            |  32  | 60.6964  |  72.9624  | 56.0096  |        58.4795         |
|               vgg16               |  64  | 66.2391  |  66.2685  | 53.4498  |        52.8167         |
|             resnet152             |  32  | 64.5474  |  81.9574  | 52.7476  |        63.3333         |
|           hf_Bert_large           |  4   | 82.8119  |  96.0568  | 51.8003  |        52.5741         |
|              demucs               |  4   | 53.5884  |  53.8564  | 51.7241  |         51.792         |
|        speech_transformer         |  32  | 65.5252  |  73.2215  | 36.6898  |        38.5332         |
| attention_is_all_you_need_pytorch | 256  | 55.5907  |  68.0979  | 36.1414  |        36.3313         |
|           fastNLP_Bert            |  6   | 53.1654  |  61.5542  |  35.859  |         35.363         |
|              hf_Bart              |  4   | 59.6665  |  81.0395  | 35.8088  |        36.1689         |
|           mobilenet_v2            |  96  | 47.0792  |  60.2975  | 30.7803  |        31.2921         |
|           pytorch_unet            |  1   | 39.9288  | 194.1312  | 29.3114  |         29.488         |
|             hf_Albert             |  8   | 69.8278  |  72.378   | 29.2278  |        29.7113         |
|              hf_GPT2              |  4   |  49.232  |  50.9812  | 27.2288  |        27.0929         |
|            timm_vovnet            |  32  | 28.9954  |  34.9952  | 26.3411  |        26.9366         |
|            densenet121            |  4   | 55.0512  |  74.5262  | 23.9177  |        57.1692         |
|              hf_Bert              |  4   | 40.6506  |  48.4143  | 22.5053  |        25.5911         |
|         timm_efficientnet         |  32  | 34.5713  |  51.3828  | 22.3305  |         30.384         |
|             resnet50              |  32  | 26.6047  |  34.9749  | 22.0451  |        25.1475         |
|           hf_DistilBert           |  8   | 32.7652  |  35.4534  | 21.4212  |        21.2306         |
|        shufflenet_v2_x1_0         | 128  | 30.4701  |  39.4513  | 18.7237  |        25.5027         |
|      timm_vision_transformer      |  32  | 29.3148  |  32.3824  | 18.2586  |        23.4219         |
|           BERT_pytorch            |  16  | 53.7935  |  67.1628  | 17.6547  |        25.6661         |
|           timm_resnest            |  32  | 24.2529  |  28.4145  | 15.3762  |        16.1882         |
|            mnasnet1_0             |  32  | 23.8128  |  31.5855  | 13.1525  |         21.236         |
|        mobilenet_v3_large         |  32  | 27.2825  |  36.5772  |  13.068  |        23.3505         |
|          resnext50_32x4d          |  8   | 20.7203  |  30.6961  | 11.7575  |        20.9339         |
|          pytorch_stargan          |  16  | 14.9839  |  18.2978  | 11.7266  |        11.8889         |
|      nvidia_deeprecommender       | 256  | 10.2185  |  10.2245  | 11.7053  |        10.0229         |
|         phlippe_densenet          | 128  | 23.4374  |  30.1065  | 11.2281  |        23.2831         |
|              alexnet              | 128  |  9.8282  |  9.8362   |  9.021   |         8.6521         |
|          LearningToPaint          |  96  | 11.4781  |  14.2001  |  8.6598  |        10.7808         |
|            tts_angular            |  64  |  6.8905  |  7.0344   |  6.7668  |         7.3663         |
|             resnet18              |  16  |  9.3353  |  12.973   |  5.7912  |         9.5653         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 14.5445  |   15.02   |  5.7642  |         7.5777         |
|           squeezenet1_1           |  32  | 11.1972  |  11.6703  |  5.4952  |         7.7982         |
|          phlippe_resnet           | 128  |  9.0354  |  11.9748  |  4.8884  |         8.9623         |
|          pytorch_struct           | 200  |  5.0272  |  6.1212   |  3.2515  |         4.3305         |
|       functorch_dp_cifar10        |  64  | 10.4244  |  11.3129  |  2.8426  |         7.5476         |
|               dlrm                | 1024 |  4.3704  |  5.6928   |  2.1375  |         3.593          |
|                drq                |  1   |  3.4314  |  4.9385   |  2.135   |         3.0329         |
|               dcgan               |  32  |  2.3586  |  3.3891   |  1.5541  |         2.6269         |
|         soft_actor_critic         | 256  |  1.8086  |  2.7743   |  1.343   |         3.2739         |
|           lennard_jones           | 1000 |  1.8358  |  2.1894   |   1.12   |         1.7359         |
|   timm_vision_transformer_large   |  32  | 464.9078 |    nan    |   nan    |        428.4052        |
|           hf_Longformer           |  2   | 113.002  | 163.5781  |   nan    |          nan           |
|               moco                |  32  | 53.8066  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|             OPTForCausalLM              |  2  | 0.9858 |  0.8954   |  2.4676  |         2.4889         |
|          MobileBertForMaskedLM          | 64  | 0.9577 |  0.8124   |  2.3351  |         1.0788         |
|      GPT2ForSequenceClassification      |  4  | 0.9751 |  0.9506   |  2.2569  |         2.2846         |
|       ElectraForQuestionAnswering       | 64  | 0.9865 |  0.9766   |  2.1181  |          2.09          |
|     MobileBertForQuestionAnswering      | 128 | 0.9456 |   0.837   |  2.1059  |         1.0694         |
|       MT5ForConditionalGeneration       | 16  | 0.9844 |  0.8389   |  2.0826  |         1.8456         |
|             XGLMForCausalLM             |  8  |  0.99  |  0.8576   |  1.8828  |         1.4612         |
|           ElectraForCausalLM            | 32  | 0.9817 |  0.9365   |  1.8427  |         1.8204         |
|            XLNetLMHeadModel             |  8  | 0.9953 |   0.967   |  1.8121  |         1.8259         |
|    LayoutLMForSequenceClassification    | 16  | 0.9843 |  0.9709   |  1.8007  |         1.7738         |
|       RobertaForQuestionAnswering       | 16  | 0.9839 |  0.9698   |  1.7859  |         1.7672         |
|        BertForQuestionAnswering         | 16  | 0.9839 |  0.9696   |  1.7767  |         1.7631         |
|           RobertaForCausalLM            | 16  | 0.9869 |  0.9625   |  1.6777  |         1.6646         |
|     M2M100ForConditionalGeneration      | 16  | 0.9769 |  0.8317   |  1.673   |         1.4594         |
|               DistillGPT2               | 16  | 0.9853 |  0.9552   |  1.6575  |         1.7011         |
|            PLBartForCausalLM            |  8  | 0.9909 |  0.9554   |  1.6515  |         1.6821         |
|       AlbertForQuestionAnswering        |  4  | 0.9998 |  0.8854   |  1.6456  |         1.6413         |
|            AlbertForMaskedLM            |  4  | 0.9998 |  0.8846   |  1.638   |         1.6352         |
|                 T5Small                 |  4  | 0.9789 |  0.8456   |  1.6334  |         1.7244         |
|       T5ForConditionalGeneration        |  4  | 0.9761 |  0.8462   |  1.6282  |         1.727          |
|     PLBartForConditionalGeneration      |  4  | 0.9824 |  0.9461   |  1.6171  |         1.639          |
|    MegatronBertForQuestionAnswering     |  8  | 0.9799 |   0.96    |  1.6029  |         1.6292         |
|             BertForMaskedLM             | 16  | 0.9859 |  0.9605   |  1.5939  |         1.583          |
|           LayoutLMForMaskedLM           | 16  | 0.9854 |  0.9613   |  1.5813  |         1.5934         |
|                CamemBert                | 16  | 0.987  |   0.963   |  1.5463  |         1.5325         |
|         Speech2Text2ForCausalLM         | 256 | 0.9764 |  0.9221   |  1.5281  |         1.5587         |
|             BartForCausalLM             |  4  | 0.9782 |  0.9552   |  1.5257  |         1.5544         |
|            MBartForCausalLM             |  4  | 0.9829 |  0.9409   |  1.5136  |         1.545          |
|            YituTechConvBert             | 16  | 0.9837 |  0.9578   |  1.5106  |         1.4923         |
|         MegatronBertForCausalLM         |  4  | 0.9919 |  0.9062   |  1.4759  |         1.4901         |
|      BartForConditionalGeneration       |  2  | 0.9934 |  0.9245   |  1.4574  |         1.478          |
|     DistilBertForQuestionAnswering      | 256 | 0.9935 |  0.9875   |  1.446   |         1.4518         |
|      MBartForConditionalGeneration      |  2  | 1.0009 |  0.9642   |  1.4447  |         1.4667         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9972 |  0.9104   |  1.3574  |         1.4246         |
|     PegasusForConditionalGeneration     | 32  | 0.9941 |  0.9465   |  1.343   |         1.2902         |
|            TrOCRForCausalLM             | 32  | 0.9873 |  0.9425   |  1.2576  |         1.2855         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9816 |  0.9107   |  1.2323  |         1.2657         |
|          DistilBertForMaskedLM          | 128 | 0.9918 |  0.9509   |  1.2105  |         1.2344         |
|           PegasusForCausalLM            | 32  | 0.9551 |  0.9245   |  1.2019  |         1.2147         |
|       DebertaForQuestionAnswering       |  8  | 0.7994 |  0.7078   |  1.0816  |          0.96          |
|           DebertaForMaskedLM            |  4  | 0.7093 |   0.58    |  0.945   |         0.8095         |
|          DebertaV2ForMaskedLM           |  1  | 0.6813 |  0.5211   |  0.8658  |         0.656          |
|      DebertaV2ForQuestionAnswering      |  2  | 0.6865 |   0.526   |  0.8179  |         0.6877         |
|          BlenderbotForCausalLM          |  4  | 0.9741 |  0.8385   |   0.0    |         1.2692         |
|          AllenaiLongformerBase          |  4  | 0.9986 |  0.6701   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
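For anyone who wants to tally these statuses per backend rather than scan the table by eye, here is a minimal sketch. It assumes the table is available as plain text in the same `+---+` / `| a | b |` layout shown above. `parse_ascii_table` is a hypothetical helper written for this illustration, not part of the benchmark tooling, and the excerpt contains only four rows copied from the table.

```python
from collections import Counter

def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts (hypothetical helper)."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the '+-----+' border lines
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]

# Four rows copied from the accuracy table above (whitespace condensed).
excerpt = """
+-------------------------------+----+------------------+------------------+------------------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-------------------------------+----+------------------+------------------+------------------+------------------------+
| BlenderbotForCausalLM | 1 | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
| T5Small | 1 | pass | pass | pass | pass |
| DebertaV2ForQuestionAnswering | 1 | pass | pass | fail_to_run | pass |
| AlbertForQuestionAnswering | 1 | pass | pass | fail_accuracy | fail_accuracy |
+-------------------------------+----+------------------+------------------+------------------+------------------------+
"""

records = parse_ascii_table(excerpt)
inductor_counts = Counter(r["inductor"] for r in records)
failing = [r["name"] for r in records if r["inductor"].startswith("fail")]
print(inductor_counts)
print(failing)
```

Pasting the full table in place of the excerpt gives the per-backend pass/fail counts for the whole suite.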

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 17.5618 |  40.6989  | 145.0116 |        142.3565        |
|     MobileBertForQuestionAnswering      | 128 | 17.0267 |  39.7284  | 137.7101 |        136.5337        |
|          DebertaV2ForMaskedLM           |  1  | 15.5789 |  27.4405  | 135.7999 |        71.2067         |
|      DebertaV2ForQuestionAnswering      |  2  | 15.3038 |  26.8454  | 135.4218 |        69.8818         |
|     M2M100ForConditionalGeneration      | 16  | 12.0142 |  26.8296  | 134.3798 |        134.984         |
|       MT5ForConditionalGeneration       | 16  | 8.0397  |  18.7851  | 132.6963 |        131.4027        |
|             XGLMForCausalLM             |  8  | 9.6113  |  20.9621  | 131.2673 |        131.4624        |
|            XLNetLMHeadModel             |  8  | 10.3498 |  27.5305  | 92.0228  |        95.2708         |
|           DebertaForMaskedLM            |  4  | 7.5944  |  14.1396  | 82.2466  |         51.425         |
|      MBartForConditionalGeneration      |  2  | 11.669  |  26.538   | 79.4786  |        78.1786         |
|       DebertaForQuestionAnswering       |  8  | 7.2142  |  13.4631  | 79.2893  |        52.6326         |
|      BartForConditionalGeneration       |  2  | 11.686  |  26.1067  | 73.6849  |         74.088         |
|     PegasusForConditionalGeneration     | 32  | 5.4102  |  20.1044  | 68.1564  |        65.8442         |
|            YituTechConvBert             | 16  | 7.5318  |  15.7279  |  67.716  |        67.5576         |
|    MegatronBertForQuestionAnswering     |  8  | 10.3514 |  21.7807  | 67.6987  |        64.3446         |
|         MegatronBertForCausalLM         |  4  | 10.4175 |  21.7045  | 66.2412  |        65.1618         |
| BlenderbotSmallForConditionalGeneration | 64  |  7.798  |  17.0186  | 54.4526  |        53.9374         |
|           ElectraForCausalLM            | 32  | 5.2502  |  10.9356  | 52.0434  |        52.3979         |
|                 T5Small                 |  4  | 5.6395  |  12.7272  |  49.962  |        49.5774         |
|       T5ForConditionalGeneration        |  4  | 5.7164  |  12.7674  | 49.3321  |        50.5206         |
|     PLBartForConditionalGeneration      |  4  | 6.2473  |   13.36   | 48.1929  |        47.6638         |
|    LayoutLMForSequenceClassification    | 16  | 5.4861  |  11.1427  | 46.3367  |        46.4363         |
|       ElectraForQuestionAnswering       | 64  | 5.1949  |  10.7594  | 43.3284  |         44.114         |
|           LayoutLMForMaskedLM           | 16  | 5.5635  |  11.1884  | 41.1159  |        41.8355         |
|            MBartForCausalLM             |  4  | 5.6931  |  11.4472  |  39.965  |        40.1192         |
|            TrOCRForCausalLM             | 32  | 5.6458  |  11.2824  | 38.9154  |        37.1937         |
|             BartForCausalLM             |  4  | 5.7241  |  11.0937  |  38.509  |         38.889         |
|             BertForMaskedLM             | 16  | 5.2628  |  10.8438  | 38.3394  |        37.7635         |
|        BertForQuestionAnswering         | 16  | 5.2224  |  10.8334  | 37.8807  |         37.655         |
|           PegasusForCausalLM            | 32  | 5.6517  |  10.9817  |  37.807  |        36.3805         |
|            AlbertForMaskedLM            |  4  | 2.2558  |  8.2852   | 37.1591  |        37.1675         |
|             OPTForCausalLM              |  2  | 4.7636  |  10.319   | 37.1283  |        36.9696         |
|           RobertaForCausalLM            | 16  | 5.2503  |  10.9568  | 37.0668  |        36.5084         |
|      GPT2ForSequenceClassification      |  4  | 4.8346  |   9.906   | 36.1973  |        35.3577         |
|                CamemBert                | 16  | 5.2891  |  10.7311  | 36.0948  |        36.5625         |
|       RobertaForQuestionAnswering       | 16  | 5.1816  |  10.8963  | 35.6933  |        37.2163         |
|     DistilBertForQuestionAnswering      | 256 | 2.4703  |  5.2624   | 34.8898  |        35.8274         |
|       AlbertForQuestionAnswering        |  4  | 2.3576  |  8.2318   | 33.4146  |        32.7479         |
|          DistilBertForMaskedLM          | 128 | 2.5153  |  5.4134   | 33.3895  |        34.4678         |
|       BlenderbotSmallForCausalLM        | 64  | 3.8514  |  7.5331   | 29.3783  |        28.2434         |
|               DistillGPT2               | 16  | 2.5955  |  5.0978   | 26.9683  |        27.1722         |
|            PLBartForCausalLM            |  8  | 3.0389  |  5.9101   | 25.6586  |        24.6748         |
|         Speech2Text2ForCausalLM         | 256 | 3.0127  |  5.7971   | 25.5576  |        24.9279         |
|          BlenderbotForCausalLM          |  4  | 11.345  |  22.0531  |   nan    |        67.8937         |
|          AllenaiLongformerBase          |  4  | 9.6113  |  31.694   |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1387  |         1.195          |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1139  |         1.2307         |
|             OPTForCausalLM              |  2  | 0.9999 |  0.9165   |  1.094   |         1.1346         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0865  |         1.1724         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0818  |         1.1729         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0583  |         1.1368         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0541  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.0539  |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0511  |         1.0491         |
|            YituTechConvBert             | 16  |  1.0   |  0.9143   |  1.0402  |         1.0411         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.1479         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0078  |         1.0518         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9734  |         1.3147         |
|           ElectraForCausalLM            | 32  |  1.0   |  0.8475   |  0.9731  |         0.9739         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         1.0642         |
|     PLBartForConditionalGeneration      |  4  | 1.0001 |  0.9301   |  0.9649  |         1.052          |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9574  |         1.268          |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         1.1152         |
|            MBartForCausalLM             |  4  |  1.0   |  0.8937   |  0.9281  |         0.9912         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8677   |  0.9138  |         0.9886         |
|             BartForCausalLM             |  4  |  1.0   |  0.8936   |  0.9137  |         0.9749         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9089  |         1.0018         |
|           PegasusForCausalLM            | 32  |  1.0   |  0.8822   |  0.893   |         0.9864         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8849  |         0.9624         |
|            TrOCRForCausalLM             | 32  |  1.0   |   0.873   |  0.8836  |         0.9583         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8729  |         0.9803         |
|     PegasusForConditionalGeneration     | 32  |  1.0   |   0.91    |  0.8689  |         1.0689         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.8672  |         1.0307         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         1.0139         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         1.0962         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8137   |  0.8184  |         0.9119         |
|         Speech2Text2ForCausalLM         | 256 |  1.0   |  0.8183   |  0.789   |         0.8779         |
|     M2M100ForConditionalGeneration      | 16  |  1.0   |  0.8084   |  0.7651  |         0.9908         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7473  |         1.016          |
|             XGLMForCausalLM             |  8  |  1.0   |  0.7834   |  0.7117  |         0.9792         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6569  |         0.8392         |
|           DebertaForMaskedLM            |  4  | 0.9326 |  0.9156   |  0.5501  |         0.9978         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.5197  |         0.9665         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9762 |  0.9764   |  0.487   |         0.9802         |
|       DebertaForQuestionAnswering       |  8  | 0.9525 |  1.0537   |  0.4601  |         1.1526         |
|          BlenderbotForCausalLM          |  4  | 0.9978 |  0.9099   |   nan    |         0.999          |
|          AllenaiLongformerBase          |  4  | 0.9508 |  0.8684   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 265.9884 | 300.5892  | 162.5266 |        162.8013        |
|       AlbertForQuestionAnswering        |  4  | 263.8644 | 297.8897  | 160.5654 |        160.9789        |
|            XLNetLMHeadModel             |  8  | 283.5916 | 290.3209  | 153.713  |        151.8861        |
|      DebertaV2ForQuestionAnswering      |  2  | 152.4612 | 197.6528  | 128.9421 |        171.7861        |
|          DebertaV2ForMaskedLM           |  1  | 150.9027 | 197.5357  | 119.8561 |        159.3058        |
|     PegasusForConditionalGeneration     | 32  | 144.9739 | 147.3569  | 112.8832 |        108.3408        |
|            TrOCRForCausalLM             | 32  | 138.9897 | 146.1892  | 109.7359 |        107.2294        |
|      MBartForConditionalGeneration      |  2  | 139.5903 | 143.4637  | 95.2527  |        93.5933         |
|      BartForConditionalGeneration       |  2  | 138.7527 | 156.9635  | 94.3671  |        92.9605         |
|    MegatronBertForQuestionAnswering     |  8  | 144.8361 | 147.3469  | 88.4866  |        87.1584         |
|            YituTechConvBert             | 16  | 127.7213 | 130.4535  | 83.0217  |        83.8775         |
| BlenderbotSmallForConditionalGeneration | 64  | 112.4154 | 124.2463  | 81.0271  |        79.2172         |
|     MobileBertForQuestionAnswering      | 128 | 174.6986 | 219.9309  | 80.7233  |        154.5459        |
|                CamemBert                | 16  | 119.8832 | 122.8157  |  76.445  |        77.3545         |
|     M2M100ForConditionalGeneration      | 16  | 116.7558 | 133.1115  | 75.2948  |        100.4273        |
|            MBartForCausalLM             |  4  | 115.3254 | 121.9269  | 75.0555  |        73.4998         |
|             BartForCausalLM             |  4  | 116.9021 | 118.4369  | 74.8685  |        73.3571         |
|          MobileBertForMaskedLM          | 64  | 179.8391 |  218.374  | 73.0177  |        158.4618        |
|     PLBartForConditionalGeneration      |  4  | 119.9035 | 122.9863  |  72.652  |        71.0537         |
|     DistilBertForQuestionAnswering      | 256 | 103.8742 | 104.7648  | 71.7628  |        71.8157         |
|           LayoutLMForMaskedLM           | 16  |  113.96  | 116.9469  | 71.2312  |        71.3026         |
|       DebertaForQuestionAnswering       |  8  | 94.4888  | 106.7412  | 70.2212  |        79.0087         |
|            PLBartForCausalLM            |  8  | 116.1958 | 117.7688  | 70.0386  |        68.5826         |
|          DistilBertForMaskedLM          | 128 | 85.1856  |  89.0075  | 69.9988  |        68.5631         |
|             BertForMaskedLM             | 16  | 111.403  | 114.3234  | 68.8544  |         69.431         |
|           RobertaForCausalLM            | 16  | 116.5106 | 119.3705  | 68.6727  |        69.0453         |
|             OPTForCausalLM              |  2  | 173.1333 | 181.6894  | 68.6352  |        67.9626         |
|           DebertaForMaskedLM            |  4  | 86.0204  | 121.5295  | 65.2148  |         74.864         |
|       T5ForConditionalGeneration        |  4  | 106.8918 | 123.9577  | 64.3373  |        60.4289         |
|                 T5Small                 |  4  | 106.613  | 123.4918  | 64.2525  |        60.5228         |
|               DistillGPT2               | 16  | 107.7348 | 110.5719  | 63.6754  |        62.2092         |
|         MegatronBertForCausalLM         |  4  | 88.4818  |  95.9715  | 59.3322  |        58.3022         |
|           PegasusForCausalLM            | 32  | 73.1404  |  74.5906  | 58.7537  |        57.1241         |
|             XGLMForCausalLM             |  8  | 90.9746  | 110.0795  |  54.498  |        81.4778         |
|    LayoutLMForSequenceClassification    | 16  | 99.1679  | 100.4163  |  54.234  |        55.1218         |
|       ElectraForQuestionAnswering       | 64  | 116.0566 | 117.1006  | 54.0163  |        54.8688         |
|       RobertaForQuestionAnswering       | 16  | 96.9382  |  98.4722  | 53.4698  |         54.677         |
|        BertForQuestionAnswering         | 16  | 96.7679  |  98.009   | 53.4539  |        53.9187         |
|           ElectraForCausalLM            | 32  | 89.6151  |  93.8857  | 47.7334  |        48.3739         |
|       BlenderbotSmallForCausalLM        | 64  | 59.3393  |  63.8659  | 46.9384  |         45.807         |
|       MT5ForConditionalGeneration       | 16  | 93.7996  | 112.5161  | 44.1017  |         50.55          |
|      GPT2ForSequenceClassification      |  4  | 93.7213  |  96.2339  |  40.53   |        40.0347         |
|         Speech2Text2ForCausalLM         | 256 | 54.2486  |  57.1391  | 35.1634  |        34.4838         |
|          BlenderbotForCausalLM          |  4  | 121.6022 | 129.3434  |   nan    |        91.5828         |
|          AllenaiLongformerBase          |  4  | 180.0597 |  270.644  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
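As a rough cross-check between this table and the performance speedup table, a backend's speedup can be approximated as eager latency divided by backend latency. The sketch below makes two assumptions: that the `eager` column is an acceptable stand-in for the baseline, and that a simple ratio is close to what the dashboard computes. The dashboard's own calculation is not shown in this thread and evidently differs slightly, since the reported speedups (for example 1.638 for AlbertForMaskedLM with inductor) are close to these ratios but not identical. `approx_speedup` is a hypothetical helper.

```python
import math

def approx_speedup(eager_ms, backend_ms):
    """Return eager_ms / backend_ms, or nan when the backend has no measurement."""
    if math.isnan(backend_ms) or backend_ms == 0:
        return math.nan
    return eager_ms / backend_ms

# (eager, inductor) latencies in ms, copied from the table above.
latencies_ms = {
    "AlbertForMaskedLM": (265.9884, 162.5266),
    "T5Small": (106.613, 64.2525),
    "BlenderbotForCausalLM": (121.6022, math.nan),  # inductor run failed
}

for name, (eager, inductor) in latencies_ms.items():
    print(f"{name}: {approx_speedup(eager, inductor):.4f}")
```

Note that a failed run appears as `nan` in this table but as `0.0` in the speedup table (see BlenderbotForCausalLM and AllenaiLongformerBase), so a `0.0` speedup should be read as "did not run" rather than as a measured slowdown.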

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9985 |  0.9974   |  3.0097  |         2.9721         |
|      xcit_large_24_p8_224       |  5  | 0.9876 |  0.8676   |  2.0524  |         1.5541         |
|        twins_pcpvt_base         | 64  | 0.9966 |  0.9077   |  1.9567  |         1.6639         |
|         coat_lite_mini          | 128 | 0.9969 |  0.9956   |  1.9419  |         1.9192         |
|          ghostnet_100           | 128 | 0.9925 |  0.7463   |  1.8489  |         1.6122         |
|          gmlp_s16_224           | 128 | 0.9945 |  1.0825   |  1.8387  |         1.8303         |
|          gmixer_24_224          | 128 | 0.9952 |   0.889   |  1.7558  |         1.7482         |
|            lcnet_050            | 128 | 0.941  |  0.7356   |  1.6958  |         1.3932         |
|           volo_d1_224           | 64  | 0.9937 |  0.9731   |  1.6886  |         1.6624         |
|         crossvit_9_240          | 128 | 0.9903 |  0.7828   |  1.6448  |         1.6151         |
|           convit_base           | 64  | 0.9979 |  0.9976   |  1.6128  |         1.6114         |
|  swin_base_patch4_window7_224   | 64  | 0.9908 |  0.9425   |  1.6094  |         1.6035         |
|       gluon_inception_v3        | 128 | 0.9962 |  0.8647   |  1.5316  |         1.5203         |
|          inception_v3           | 128 | 0.9962 |  0.8649   |  1.5302  |         1.5195         |
|        adv_inception_v3         | 128 | 0.9963 |   0.86    |  1.5299  |         1.518          |
|             dla102              | 128 | 0.9956 |  0.8154   |  1.5242  |         1.5222         |
|        sebotnet33ts_256         | 64  | 0.9576 |  0.7649   |  1.5051  |         1.535          |
|          convnext_base          | 64  | 0.9836 |  0.9843   |  1.4896  |          1.47          |
|            nfnet_l0             | 128 | 0.9897 |   0.815   |  1.4862  |         1.431          |
|           dm_nfnet_f0           | 128 | 0.9866 |   0.985   |  1.4786  |         1.4284         |
|       eca_botnext26ts_256       | 128 | 0.9733 |  0.7194   |  1.4445  |         1.4228         |
|           mnasnet_100           | 128 | 0.9476 |  0.7412   |  1.4372  |         1.4958         |
|      mobilenetv3_large_100      | 128 | 0.9494 |  0.7603   |  1.4341  |         1.4282         |
|            pit_b_224            | 64  | 0.9946 |  0.9925   |  1.434   |         1.4288         |
|           resnest101e           | 64  | 0.994  |  0.8657   |  1.4335  |         1.3566         |
|           regnety_002           | 128 | 0.9484 |  0.7086   |  1.4325  |         1.2227         |
|           mobilevit_s           | 64  | 0.962  |  0.7309   |  1.4261  |         1.4397         |
|           selecsls42b           | 128 | 0.9981 |  0.8121   |  1.4103  |         1.4127         |
|          botnet26t_256          | 128 | 0.9727 |  0.8516   |  1.4076  |         1.4238         |
|          cait_m36_384           |  4  | 0.9956 |  0.9422   |  1.3938  |         1.3502         |
|        res2net50_14w_8s         | 128 | 0.9988 |  0.7906   |  1.3804  |         1.3562         |
|           res2next50            | 128 | 0.9988 |  0.8257   |  1.3718  |         1.3642         |
|          jx_nest_base           | 32  | 0.9872 |  0.9856   |  1.365   |         1.3564         |
|         mobilenetv2_100         | 128 | 0.9488 |  0.7366   |  1.3622  |         1.4452         |
|          mixer_b16_224          | 128 | 1.0008 |  1.0183   |  1.3603  |         1.3596         |
|          spnasnet_100           | 128 | 0.941  |  0.7387   |  1.3562  |         1.4161         |
|            hrnet_w18            | 128 | 0.9924 |   0.645   |  1.3547  |         1.3403         |
|        ese_vovnet19b_dw         | 128 | 0.9589 |   0.833   |  1.3521  |         1.3694         |
|       tf_efficientnet_b0        | 128 | 0.9585 |  0.6812   |  1.3519  |         1.3845         |
|           fbnetc_100            | 128 | 0.9492 |  0.7386   |  1.3516  |         1.4055         |
|      beit_base_patch16_224      | 64  | 0.9961 |  0.9661   |  1.3505  |         1.3522         |
|         poolformer_m36          | 64  | 0.9863 |  0.9837   |  1.3255  |         1.3187         |
|            fbnetv3_b            | 128 | 0.9487 |  0.7678   |  1.3088  |         1.3301         |
|           rexnet_100            | 128 | 0.9509 |  0.7031   |  1.2984  |         1.3338         |
|          resmlp_12_224          | 128 | 0.9929 |  0.8897   |  1.2588  |         1.2573         |
| deit_base_distilled_patch16_224 | 64  | 0.9963 |  0.9937   |  1.256   |         1.255          |
|      vit_base_patch16_224       | 64  | 0.9962 |   0.994   |  1.2358  |         1.2354         |
|            tinynet_a            | 128 | 0.9452 |  0.6785   |  1.2284  |         1.2509         |
|          cspdarknet53           | 64  | 0.9321 |  0.7858   |  1.2263  |         1.2585         |
|           tf_mixnet_l           | 128 | 0.9761 |   0.827   |  1.1861  |         1.191          |
|            mixnet_l             | 128 | 0.9762 |  0.8208   |  1.1745  |         1.1811         |
|         visformer_small         | 128 | 0.996  |  0.9449   |  1.1728  |         1.166          |
|        res2net101_26w_4s        | 64  | 1.0015 |  0.7776   |  1.157   |         1.0849         |
|          pnasnet5large          | 16  | 0.9857 |   0.91    |  1.1108  |         1.1285         |
|             dpn107              | 32  | 0.9313 |  0.8069   |  1.0907  |         1.1347         |
|            repvgg_a2            | 128 | 0.935  |  0.7555   |  1.0876  |         1.1185         |
|        gluon_xception65         | 32  | 0.9923 |  0.8426   |  1.0752  |         1.0775         |
|     swsl_resnext101_32x16d      | 32  | 0.9979 |  0.8394   |  1.0565  |          1.02          |
|            gernet_l             | 128 | 0.9351 |  0.7929   |  1.0346  |         1.0683         |
|        convmixer_768_32         | 32  | 0.9984 |  0.9648   |  1.0021  |         1.0027         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 5.7643  |  11.7023  | 288.633  |        286.6164        |
|            hrnet_w18            | 128 | 9.4761  |  36.0464  | 251.7722 |        251.342         |
|          ghostnet_100           | 128 | 7.5221  |  16.0605  | 237.364  |        240.7118        |
|            fbnetv3_b            | 128 | 8.4826  |  17.074   | 171.7907 |        177.0667        |
|      mobilenetv3_large_100      | 128 | 4.2414  |  8.3691   | 165.1874 |        159.0224        |
|           mobilevit_s           | 64  |  5.346  |  11.2948  | 163.7446 |        152.6373        |
|           resnest101e           | 64  | 11.2886 |  24.4433  | 162.3824 |        167.4093        |
|            tinynet_a            | 128 | 6.1187  |  12.2229  | 162.1294 |        160.1799        |
|          pnasnet5large          | 16  | 8.2532  |  26.3089  | 160.6701 |        162.1834        |
|        adv_inception_v3         | 128 | 5.9797  |  12.5376  | 160.2985 |        162.1355        |
|           tf_mixnet_l           | 128 | 9.1957  |  16.8964  | 159.8192 |        153.3491        |
|       gluon_inception_v3        | 128 | 5.7028  |  12.6915  | 158.312  |        160.1563        |
|            mixnet_l             | 128 | 8.4203  |  16.2476  | 156.5639 |        158.1503        |
|          inception_v3           | 128 |  5.751  |  12.4894  | 156.4453 |        156.8552        |
|        res2net101_26w_4s        | 64  | 10.8203 |  26.1102  | 152.0619 |        152.5181        |
|       tf_efficientnet_b0        | 128 | 5.1814  |  10.4034  | 149.6682 |         153.73         |
|        twins_pcpvt_base         | 64  | 10.7588 |   23.39   | 148.6353 |        148.1482        |
|          spnasnet_100           | 128 | 5.0788  |  9.7006   | 139.1762 |        138.2714        |
|           fbnetc_100            | 128 | 5.0964  |  9.4981   | 138.0168 |        132.3115        |
|      xcit_large_24_p8_224       |  5  | 12.6212 |  28.3868  | 133.6499 |        132.0053        |
|         mobilenetv2_100         | 128 | 4.1938  |  7.8157   | 128.3707 |        131.9352        |
|        res2net50_14w_8s         | 128 | 9.1219  |  22.2312  | 125.2948 |        125.9409        |
|           mnasnet_100           | 128 | 3.9838  |  7.5132   | 120.5102 |        123.9508        |
|          cait_m36_384           |  4  | 14.5439 |  32.8377  | 116.0344 |        114.3019        |
|  swin_base_patch4_window7_224   | 64  | 8.5483  |  19.2274  | 111.5706 |        108.0257        |
|        sebotnet33ts_256         | 64  | 4.2305  |  9.3104   | 108.2703 |        103.3969        |
|           regnety_002           | 128 | 4.9506  |  8.8409   | 105.8303 |        106.607         |
|          cspdarknet53           | 64  | 5.7935  |  10.957   | 103.6594 |        99.8818         |
|             dpn107              | 32  | 9.8023  |  19.5631  | 101.0744 |        99.0996         |
|         poolformer_m36          | 64  | 7.6861  |  13.5996  | 100.7646 |        99.7361         |
|             dla102              | 128 | 6.3089  |  14.1703  | 98.8004  |        96.5514         |
|            lcnet_050            | 128 | 2.5173  |  4.9848   | 98.6416  |        93.2353         |
|       eca_botnext26ts_256       | 128 | 3.1218  |  6.8414   | 97.7052  |        94.0315         |
|        gluon_xception65         | 32  | 7.8072  |  17.0619  | 96.6837  |        93.9875         |
|           selecsls42b           | 128 | 2.5136  |  5.3315   | 93.7408  |        87.8628         |
|           res2next50            | 128 | 5.2278  |  11.9874  | 89.8957  |        88.9255         |
|          botnet26t_256          | 128 | 2.9365  |  6.3904   | 89.6243  |        88.9011         |
|         coat_lite_mini          | 128 | 3.3423  |  7.9426   | 88.3423  |        87.7323         |
|         crossvit_9_240          | 128 | 5.8338  |  13.534   | 86.6095  |        86.3052         |
|          jx_nest_base           | 32  | 6.6987  |  14.6873  | 85.9095  |        82.9234         |
|            gernet_l             | 128 |  4.988  |  8.9342   | 82.2531  |        79.8032         |
|            nfnet_l0             | 128 | 5.3039  |  10.9182  |  79.201  |         74.891         |
|        ese_vovnet19b_dw         | 128 | 2.5667  |  4.8747   |  76.178  |         71.927         |
|           volo_d1_224           | 64  | 5.1451  |  11.8373  | 74.2033  |        75.7054         |
|           dm_nfnet_f0           | 128 | 6.0305  |  11.5069  | 72.6827  |        73.8355         |
|        tnt_s_patch16_224        | 128 | 6.6063  |  16.9826  | 68.3895  |         68.488         |
|         visformer_small         | 128 |  2.688  |  6.0832   | 67.0472  |        67.2827         |
|     swsl_resnext101_32x16d      | 32  | 6.1225  |  13.6836  | 63.1046  |        61.8692         |
|          gmlp_s16_224           | 128 | 5.6457  |  12.0425  | 61.4004  |        58.3879         |
|            repvgg_a2            | 128 | 4.9492  |  8.6577   | 60.6182  |        61.5246         |
|          convnext_base          | 64  |  6.959  |  13.1942  | 58.0114  |        57.8545         |
|          gmixer_24_224          | 128 | 5.7282  |  13.5969  | 52.4712  |        51.6175         |
|           convit_base           | 64  | 3.4831  |  8.6052   | 47.5164  |        47.6582         |
|            pit_b_224            | 64  | 3.5509  |  7.9832   |  44.544  |        45.1733         |
| deit_base_distilled_patch16_224 | 64  | 3.1374  |  7.5462   | 41.8427  |        43.4183         |
|          resmlp_12_224          | 128 | 2.8637  |  5.3289   | 40.4208  |        40.1937         |
|      vit_base_patch16_224       | 64  | 3.0864  |  6.9799   |  39.479  |        38.5487         |
|        convmixer_768_32         | 32  | 1.7119  |   6.851   | 37.5296  |         36.227         |
|      beit_base_patch16_224      | 64  |  4.126  |  8.6917   | 35.0864  |         34.592         |
|          mixer_b16_224          | 128 | 2.7023  |  5.8962   | 32.4718  |        32.5967         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.1858  |         1.2049         |
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.1712  |         1.2836         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1129  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.0948  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.0266  |         1.1179         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.013   |         1.0845         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0099  |         1.0351         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  0.9984  |         1.0721         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  0.9972  |         1.0876         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  0.9872  |         1.0728         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  0.9834  |         1.0506         |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9762  |         0.9854         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  0.9747  |         1.0457         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  0.9729  |         1.0539         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  0.9557  |         1.0057         |
|             dla102              | 128 | 0.9635 |  0.9151   |  0.9536  |         1.0326         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.951   |         0.9948         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9362  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9353  |         0.9528         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  0.9348  |         1.0194         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  0.9346  |         1.0675         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9285  |         0.989          |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9228  |         0.9793         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9215  |         0.9952         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9131  |         0.9727         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  0.9097  |         1.0328         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9047  |         0.9903         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.8918  |          0.99          |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  0.891   |         1.1115         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.8833  |         0.9642         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.8831  |         0.9705         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.8807  |         0.977          |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.8786  |         0.9451         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.877   |         0.9361         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  0.8768  |         1.1871         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  0.8738  |         1.0072         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.8721  |         0.9732         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.8712  |         0.9607         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.871   |         0.9483         |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.8687  |         0.9902         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.8683  |         0.9403         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.866   |         0.9547         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8632  |         0.989          |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.8596  |         0.9535         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  0.8578  |         1.0242         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.8576  |         0.9664         |
|          convnext_base          | 64  | 1.001  |   0.924   |  0.8505  |         1.0338         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.8499  |         0.9706         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.8461  |         0.9786         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  0.8402  |         1.0202         |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.8239  |         0.9779         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.805   |         0.884          |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.7738  |         0.9611         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.7602  |         0.8966         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.7526  |         0.9898         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.7214  |         0.9045         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.6693  |         0.9604         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.5596 | 310.7898  | 299.3644 |        299.4553        |
|            hrnet_w18            | 128 | 281.221  | 432.4748  | 206.0135 |        209.1943        |
|          pnasnet5large          | 16  | 198.6299 | 214.9299  | 176.6726 |        173.9015        |
|           tf_mixnet_l           | 128 | 193.7459 | 228.8343  | 159.5664 |        158.9551        |
|            mixnet_l             | 128 | 185.1568 | 220.4184  | 154.1387 |        153.1013        |
|          cait_m36_384           |  4  | 173.4835 | 182.1899  | 123.5817 |        127.0192        |
|           resnest101e           | 64  | 165.3075 | 189.6523  | 114.0146 |        120.6602        |
|             dla102              | 128 | 172.5178 | 210.4063  | 112.8003 |        112.8677        |
|     swsl_resnext101_32x16d      | 32  | 118.5748 | 141.1761  | 111.8295 |        116.2978        |
|         poolformer_m36          | 64  | 146.6532 |  146.944  | 109.1818 |        109.688         |
|        tnt_s_patch16_224        | 128 | 323.2736 | 323.5526  | 107.2193 |        108.5363        |
|       gluon_inception_v3        | 128 | 160.8039 | 185.4442  | 104.6556 |        105.3221        |
|        adv_inception_v3         | 128 | 160.6504 | 186.1259  | 104.6062 |        105.4954        |
|          inception_v3           | 128 | 160.7133 | 185.0467  | 104.4863 |        105.3633        |
|        res2net50_14w_8s         | 128 | 140.5428 |  177.466  | 101.7143 |        103.8121        |
|           convit_base           | 64  | 163.3796 |  163.03   | 100.9092 |        101.0225        |
|             dpn107              | 32  | 113.7271 | 131.4008  | 97.2359  |        93.4141         |
|        gluon_xception65         | 32  | 99.8164  | 117.1799  | 91.8746  |        91.7907         |
|           res2next50            | 128 | 126.2961 | 152.5727  | 91.6463  |         92.226         |
|  swin_base_patch4_window7_224   | 64  | 147.2225 | 154.6787  | 90.8429  |        91.0533         |
|          mixer_b16_224          | 128 | 116.7561 | 114.1869  | 86.2743  |         86.241         |
|           dm_nfnet_f0           | 128 | 128.3192 | 128.7619  | 85.6977  |        88.4896         |
|        res2net101_26w_4s        | 64  | 101.0855 | 137.0918  | 85.4463  |        91.2496         |
|            fbnetv3_b            | 128 | 115.3835 | 142.4188  | 83.5615  |        82.2132         |
|            pit_b_224            | 64  | 118.833  | 118.9758  | 82.2759  |        82.6254         |
|          convnext_base          | 64  | 124.552  | 124.2523  | 82.1516  |         83.227         |
|         visformer_small         | 128 | 91.1805  |  96.2136  | 77.4656  |        77.8921         |
|            nfnet_l0             | 128 | 113.3808 | 136.9253  | 75.1805  |        77.9171         |
|      beit_base_patch16_224      | 64  | 101.5371 | 104.5782  | 74.8777  |         74.77          |
|          gmlp_s16_224           | 128 | 137.6432 | 126.2926  | 74.6746  |        74.7167         |
|          jx_nest_base           | 32  | 101.3634 |  101.628  | 73.5614  |        74.0136         |
|       eca_botnext26ts_256       | 128 | 108.7938 | 147.2502  | 73.3863  |        74.3108         |
|          cspdarknet53           | 64  | 94.9261  |  112.74   | 72.3117  |        70.3537         |
|           volo_d1_224           | 64  | 121.156  | 123.5994  | 71.2313  |        72.4948         |
|          botnet26t_256          | 128 | 101.8866 | 116.4606  |  70.466  |        69.6159         |
|            gernet_l             | 128 | 77.6386  |  91.605   |  70.26   |        68.0685         |
|      vit_base_patch16_224       | 64  | 86.9233  |  86.9849  | 70.1146  |        70.0455         |
| deit_base_distilled_patch16_224 | 64  | 84.9212  |  84.982   | 67.3492  |        67.5175         |
|          gmixer_24_224          | 128 | 117.8891 | 132.1409  | 67.1475  |        67.0927         |
|            repvgg_a2            | 128 | 77.6683  |  96.1528  | 66.7638  |         64.978         |
|      xcit_large_24_p8_224       |  5  | 128.0894 | 142.8349  | 62.7904  |        89.2589         |
|       tf_efficientnet_b0        | 128 | 84.8497  | 119.5224  |  60.17   |        58.8096         |
|        twins_pcpvt_base         | 64  | 122.4369 | 127.4257  | 60.1164  |        70.2629         |
|           rexnet_100            | 128 | 80.2141  | 108.3227  | 58.6043  |         57.135         |
|           fbnetc_100            | 128 | 82.8529  |  106.399  | 58.1178  |        55.9245         |
|         coat_lite_mini          | 128 | 112.9385 | 113.0431  | 57.9612  |        58.7341         |
|           mobilevit_s           | 64  | 84.5897  | 111.3457  | 57.0444  |        56.4571         |
|            tinynet_a            | 128 | 73.7274  | 102.5306  | 56.6454  |        55.7262         |
|        sebotnet33ts_256         | 64  | 80.3399  | 100.6634  | 51.1488  |        50.1294         |
|         crossvit_9_240          | 128 | 82.4253  | 104.4192  | 49.7259  |        50.6051         |
|          spnasnet_100           | 128 | 70.3785  |  89.7551  |  48.851  |        46.7945         |
|          ghostnet_100           | 128 | 90.5204  | 120.5848  | 48.5707  |        55.7506         |
|        ese_vovnet19b_dw         | 128 | 64.5183  |  74.3674  | 45.8018  |        45.1994         |
|         mobilenetv2_100         | 128 |  65.525  |  84.385   | 45.6569  |        43.0007         |
|           selecsls42b           | 128 | 60.0913  |  73.7328  |  42.477  |         42.408         |
|           mnasnet_100           | 128 | 64.2604  |  82.121   | 42.3783  |        40.7075         |
|          resmlp_12_224          | 128 | 53.5746  |  59.7147  |  42.166  |         42.251         |
|      mobilenetv3_large_100      | 128 | 61.2801  |  76.4334  | 40.5698  |        40.7644         |
|           regnety_002           | 128 | 39.7586  |  55.0668  |  27.295  |        30.0672         |
|            lcnet_050            | 128 |  31.665  |  40.5361  | 17.5534  |         21.422         |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Performance graphs

bench_logs/torchbench_amp.png :

bench_logs/timm_models_amp.png :

bench_logs/huggingface_amp.png :

Build Summary

Run name

day_095_05_04_23_performance_amp_373

Commit hashes

pytorch commit: 1189015
pytorch commit date: 2023-04-06 01:51:10+00:00
torchbench commit: 735f1927996c8d9ab81f0b0c05dd1ebdb26a6250
torchbench commit date: 2023-04-05 09:43:21-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git1189015

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward passes for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
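
As a rough illustration of that filtering step, the sketch below drops models that did not pass the accuracy check and then takes the geometric mean of the remaining speedups. The `results` layout, the model names, and the choice to skip non-positive speedups are assumptions made for this example, not the dashboard's actual schema or code.

```python
import math

def geomean_speedup(results):
    """Geometric mean of speedups over models that passed the accuracy check.

    `results` maps model name -> (accuracy_status, speedup). This layout is
    hypothetical and only serves the illustration.
    """
    # Keep passing models only; skip non-positive speedups, which cannot be
    # log-transformed (assumption: a 0.0 speedup marks a failed perf run).
    passing = [s for status, s in results.values() if status == "pass" and s > 0]
    if not passing:
        return float("nan")
    return math.exp(sum(math.log(s) for s in passing) / len(passing))

# Hypothetical per-model results (not taken from the report).
results = {
    "model_a": ("pass", 2.0),
    "model_b": ("pass", 0.5),
    "model_c": ("fail_accuracy", 3.0),  # excluded from the mean
}
print(f"{geomean_speedup(results):.2f}x")
```

A geometric mean is the usual choice for ratios such as speedups, because a 2x gain and a 0.5x slowdown cancel out instead of averaging to 1.25x.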

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 85%, 51/60 | 91%, 41/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 96%, 43/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
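
The cells above pair a rounded percentage with the raw pass count. A minimal sketch of how such a cell could be produced is shown below; `format_passrate` and the `statuses` list are hypothetical names for this example, and the rounding rule is inferred from the table values, not taken from the dashboard code.

```python
def format_passrate(passed, total):
    """Render a pass rate in the style of the table cells, e.g. '88%, 53/60'."""
    return f"{100 * passed / total:.0f}%, {passed}/{total}"

# Hypothetical status list: 58 passing models and 2 accuracy failures.
statuses = ["pass"] * 58 + ["fail_accuracy"] * 2
passed = sum(status == "pass" for status in statuses)
print(format_passrate(passed, len(statuses)))  # prints 97%, 58/60
```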

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.58x    |    1.57x    |    1.41x    |
| inductor_no_cudagraphs |   1.28x    |    1.49x    |    1.39x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.82    |    7.38     |    5.94     |
|       aot_eager        |    9.37    |    15.78    |    13.25    |
|        inductor        |   61.01    |    56.56    |   110.21    |
| inductor_no_cudagraphs |   61.46    |    53.81    |   109.69    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.90x    |    0.88x    |
|        inductor        |   0.78x    |    0.88x    |    0.91x    |
| inductor_no_cudagraphs |   0.92x    |    1.01x    |    1.01x    |
+------------------------+------------+-------------+-------------+
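
One way to read these ratios, assuming the ratio is the baseline (native PyTorch) peak memory divided by the compiled run's peak memory: a value below 1.00x means the compiled run used more memory than the baseline. That definition is an assumption consistent with "higher is better"; the function name and the byte counts below are illustrative only.

```python
def compression_ratio(baseline_peak_bytes, compiled_peak_bytes):
    """Baseline peak memory divided by compiled peak memory (assumed definition)."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return baseline_peak_bytes / compiled_peak_bytes

# Hypothetical peaks: the compiled run needs 12.5 GB where the baseline needs 10 GB.
GB = 1024 ** 3
ratio = compression_ratio(10 * GB, 12.5 * GB)
print(f"{ratio:.2f}x")
```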

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Previous report name: /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 85%, 51/60  | 85%, 51/60  |
|        inductor        | huggingface | 91%, 41/45  | 91%, 41/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60  | 87%, 52/60  |
| inductor_no_cudagraphs | huggingface | 96%, 43/45  | 96%, 43/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.60x    |   1.58x   |
|        inductor        | huggingface |   1.58x    |   1.57x   |
|        inductor        | timm_models |   1.41x    |   1.41x   |
| inductor_no_cudagraphs | torchbench  |   1.27x    |   1.28x   |
| inductor_no_cudagraphs | huggingface |   1.50x    |   1.49x   |
| inductor_no_cudagraphs | timm_models |   1.39x    |   1.39x   |
+------------------------+-------------+------------+-----------+
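
The previous/current comparison in these diff tables can be sketched as a join on (compiler, suite) keys. The dictionary layout and the `summary_diff` helper are assumptions for illustration; the sample values are the torchbench speedups from the table above.

```python
def summary_diff(prev, cur):
    """Pair previous and current values for every key present in both reports."""
    rows = []
    for key in sorted(prev.keys() & cur.keys()):
        # Round the delta so float noise does not leak into the report.
        delta = round(cur[key] - prev[key], 2)
        rows.append((*key, prev[key], cur[key], delta))
    return rows

prev = {
    ("inductor", "torchbench"): 1.60,
    ("inductor_no_cudagraphs", "torchbench"): 1.27,
}
cur = {
    ("inductor", "torchbench"): 1.58,
    ("inductor_no_cudagraphs", "torchbench"): 1.28,
}
for compiler, suite, prev_value, cur_value, delta in summary_diff(prev, cur):
    print(f"{compiler:24s} {suite:12s} {prev_value:.2f}x -> {cur_value:.2f}x ({delta:+.2f})")
```

Joining on keys present in both reports means a compiler that did not run in one of them is left out of the diff.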

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
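
The four criteria above can be expressed as a small check per model. This is a minimal sketch: the `ModelResult` record and its field names are hypothetical, and only the thresholds (0.95x, 120 sec, 0.9) come from the list.

```python
from dataclasses import dataclass

@dataclass
class ModelResult:
    # Hypothetical record layout; not the dashboard's actual schema.
    name: str
    accuracy: str        # e.g. "pass", "fail_accuracy", "fail_to_run"
    speedup: float       # relative to native PyTorch
    compile_sec: float   # compilation latency in seconds
    compression: float   # peak memory compression ratio

def warnings_for(result):
    """Return the list of warning categories a model falls into."""
    flags = []
    if result.accuracy != "pass":
        flags.append("accuracy")
    if result.speedup < 0.95:
        flags.append("speedup")
    if result.compile_sec > 120:
        flags.append("compilation latency")
    if result.compression < 0.9:
        flags.append("compression ratio")
    return flags

# Hypothetical models, not rows from the report.
slow = ModelResult("model_a", "pass", 0.83, 45.0, 0.95)
heavy = ModelResult("model_b", "pass", 1.40, 163.0, 0.78)
print(warnings_for(slow))
print(warnings_for(heavy))
```

A model can land in several warning tables at once, which is why the function returns a list instead of a single label.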

Accuracy warnings

+-------------+-------------------------------+-----------------+------------------------+
|    suite    |             name              |    inductor     | inductor_no_cudagraphs |
+-------------+-------------------------------+-----------------+------------------------+
| torchbench  |         hf_Longformer         |   fail_to_run   |      fail_to_run       |
| torchbench  |             moco              |   fail_to_run   |      fail_to_run       |
| torchbench  |      Background_Matting       | eager_variation |    eager_variation     |
| torchbench  |        vision_maskrcnn        | eager_variation |    eager_variation     |
| torchbench  |           tacotron2           |     0.0000      |         0.0000         |
| torchbench  |              gat              |     0.0000      |         0.0000         |
| torchbench  |              gcn              |     0.0000      |         0.0000         |
| torchbench  |             llama             |     0.0000      |         0.0000         |
| torchbench  |             sage              |     0.0000      |         0.0000         |
| torchbench  |         torchrec_dlrm         |     0.0000      |         0.0000         |
| huggingface | DebertaV2ForQuestionAnswering |   fail_to_run   |          pass          |
| huggingface |  AlbertForQuestionAnswering   |  fail_accuracy  |     fail_accuracy      |
+-------------+-------------------------------+-----------------+------------------------+

Performance speedup warnings

+-------------+-------------------------------+----------+------------------------+
|    suite    |             name              | inductor | inductor_no_cudagraphs |
+-------------+-------------------------------+----------+------------------------+
| torchbench  |             dcgan             |  1.4309  |         0.829          |
| torchbench  |         lennard_jones         |  1.3753  |         0.8821         |
| torchbench  |       soft_actor_critic       |  1.1768  |         0.8271         |
| torchbench  |          timm_vovnet          |  0.9389  |         0.9234         |
| torchbench  |    nvidia_deeprecommender     |  0.8725  |         1.0192         |
| torchbench  | timm_vision_transformer_large |   0.0    |         1.0816         |
| torchbench  |         hf_Longformer         |   0.0    |          0.0           |
| torchbench  |             moco              |   0.0    |          0.0           |
| torchbench  |              gat              |   0.0    |          0.0           |
| torchbench  |              gcn              |   0.0    |          0.0           |
| torchbench  |             sage              |   0.0    |          0.0           |
| torchbench  |           tacotron2           |   0.0    |          0.0           |
| torchbench  |         torchrec_dlrm         |   0.0    |          0.0           |
| huggingface |      DebertaForMaskedLM       |  0.9692  |         0.8205         |
| huggingface |     DebertaV2ForMaskedLM      |  0.8717  |         0.6553         |
| huggingface | DebertaV2ForQuestionAnswering |  0.8283  |         0.6574         |
| huggingface |     BlenderbotForCausalLM     |   0.0    |         1.3192         |
| huggingface |     AllenaiLongformerBase     |   0.0    |          0.0           |
+-------------+-------------------------------+----------+------------------------+

Compilation latency (sec) warnings

+-------------+--------------------------------+----------+------------------------+
|    suite    |              name              | inductor | inductor_no_cudagraphs |
+-------------+--------------------------------+----------+------------------------+
| torchbench  |        phlippe_densenet        | 163.6058 |        165.0784        |
| torchbench  |          hf_T5_large           | 163.2746 |        164.4683        |
| torchbench  |           hf_BigBird           | 147.0968 |        130.6639        |
| torchbench  |       timm_efficientnet        | 144.5686 |        145.1151        |
| torchbench  |       mobilenet_v3_large       | 138.771  |        136.9629        |
| torchbench  |          densenet121           | 137.5498 |        139.797         |
| torchbench  |          mobilenet_v2          | 126.6155 |        128.7796        |
| torchbench  | timm_vision_transformer_large  |   nan    |        124.0618        |
| huggingface |      DebertaV2ForMaskedLM      | 134.7439 |        68.6078         |
| huggingface | DebertaV2ForQuestionAnswering  | 134.4108 |        66.5827         |
| huggingface | MobileBertForQuestionAnswering | 125.8665 |        122.5536        |
| huggingface |     MobileBertForMaskedLM      | 124.1372 |        123.4329        |
| timm_models |           rexnet_100           | 293.9501 |        294.0178        |
| timm_models |           hrnet_w18            | 253.1033 |        253.8215        |
| timm_models |          ghostnet_100          | 237.4655 |        243.9983        |
| timm_models |           fbnetv3_b            | 173.4248 |        175.7158        |
| timm_models |          resnest101e           | 167.7915 |        168.973         |
| timm_models |         pnasnet5large          | 165.913  |        161.7606        |
| timm_models |          mobilevit_s           | 164.2386 |        163.6593        |
| timm_models |           tinynet_a            | 163.4317 |        162.6076        |
| timm_models |          tf_mixnet_l           | 162.7647 |        162.6282        |
| timm_models |          inception_v3          | 161.6484 |        161.2129        |
| timm_models |            mixnet_l            | 159.4858 |        163.3393        |
| timm_models |        adv_inception_v3        | 158.4899 |        158.859         |
| timm_models |       gluon_inception_v3       | 157.3471 |        151.0435        |
| timm_models |     mobilenetv3_large_100      | 157.1535 |        161.9134        |
| timm_models |       tf_efficientnet_b0       | 154.1758 |        157.2405        |
| timm_models |       res2net101_26w_4s        | 153.0158 |        153.9145        |
| timm_models |        twins_pcpvt_base        | 149.3543 |        147.8185        |
| timm_models |           fbnetc_100           | 140.1532 |        139.7912        |
| timm_models |          spnasnet_100          | 135.3095 |        135.479         |
| timm_models |      xcit_large_24_p8_224      | 134.0027 |        133.4242        |
| timm_models |        mobilenetv2_100         | 128.4441 |        127.5431        |
| timm_models |        res2net50_14w_8s        | 126.9403 |        126.5715        |
| timm_models |          mnasnet_100           | 125.4933 |        124.4676        |
+-------------+--------------------------------+----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+----------+------------------------+
|    suite    |                  name                   | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------------+----------+------------------------+
| torchbench  |                 hf_GPT2                 |  0.8974  |         1.0239         |
| torchbench  |                 hf_Bert                 |  0.893   |         0.9695         |
| torchbench  |                 yolov3                  |  0.8919  |         1.0367         |
| torchbench  |              hf_Bert_large              |  0.8872  |         1.0041         |
| torchbench  |              BERT_pytorch               |  0.8849  |         1.0964         |
| torchbench  |            timm_efficientnet            |  0.8689  |         1.006          |
| torchbench  |              timm_resnest               |  0.8628  |         0.9658         |
| torchbench  |           shufflenet_v2_x1_0            |  0.8613  |         0.9649         |
| torchbench  |         timm_vision_transformer         |  0.8593  |         0.8835         |
| torchbench  |              hf_GPT2_large              |  0.8583  |         1.077          |
| torchbench  |               timm_regnet               |  0.8487  |         0.9496         |
| torchbench  |           Background_Matting            |  0.8484  |         1.0406         |
| torchbench  |                resnet152                |  0.8473  |         0.9404         |
| torchbench  |           speech_transformer            |  0.8386  |         0.8406         |
| torchbench  |              pytorch_unet               |  0.8134  |         0.9308         |
| torchbench  |              hf_DistilBert              |  0.8095  |         0.9434         |
| torchbench  |            phlippe_densenet             |  0.8058  |         0.8659         |
| torchbench  |               hf_T5_large               |  0.7824  |         1.0929         |
| torchbench  |                  dcgan                  |  0.7821  |         0.9645         |
| torchbench  |                resnet50                 |  0.7818  |         0.8841         |
| torchbench  |                 demucs                  |  0.773   |         0.9655         |
| torchbench  |             pytorch_stargan             |  0.7715  |         0.8893         |
| torchbench  |              squeezenet1_1              |  0.7701  |         0.9121         |
| torchbench  |               timm_vovnet               |  0.7529  |         0.8869         |
| torchbench  |                 hf_Bart                 |  0.7481  |         0.8605         |
| torchbench  |               mnasnet1_0                |  0.7436  |         0.8061         |
| torchbench  |           mobilenet_v3_large            |  0.7279  |         0.7757         |
| torchbench  |             pytorch_struct              |  0.7277  |         0.7362         |
| torchbench  |                  vgg16                  |  0.7227  |         0.9808         |
| torchbench  |               densenet121               |  0.7096  |         0.8034         |
| torchbench  |                 alexnet                 |  0.7086  |         0.9386         |
| torchbench  |               hf_BigBird                |  0.6932  |         1.1043         |
| torchbench  |             resnext50_32x4d             |  0.6674  |         0.7709         |
| torchbench  |         nvidia_deeprecommender          |  0.6585  |         0.8931         |
| torchbench  |                   drq                   |  0.6379  |         0.9573         |
| torchbench  |            soft_actor_critic            |  0.6066  |         0.9973         |
| torchbench  |             LearningToPaint             |  0.5925  |         0.7463         |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |  0.5904  |         0.6004         |
| torchbench  |                resnet18                 |  0.5395  |         0.6097         |
| torchbench  |              lennard_jones              |  0.5317  |         0.9997         |
| torchbench  |               hf_Reformer               |  0.452   |         0.8007         |
| torchbench  |          functorch_dp_cifar10           |  0.3991  |         0.4424         |
| torchbench  |             phlippe_resnet              |  0.3169  |         0.3395         |
| huggingface |     PLBartForConditionalGeneration      |  0.8969  |         0.9729         |
| huggingface |    MegatronBertForQuestionAnswering     |  0.889   |         1.0285         |
| huggingface |           PegasusForCausalLM            |  0.8822  |         0.9733         |
| huggingface |            TrOCRForCausalLM             |  0.8721  |         0.9448         |
| huggingface |     PegasusForConditionalGeneration     |   0.87   |         1.0487         |
| huggingface |          DistilBertForMaskedLM          |  0.8683  |         0.9428         |
| huggingface |            PLBartForCausalLM            |  0.8672  |         0.9347         |
| huggingface | BlenderbotSmallForConditionalGeneration |  0.863   |         0.9678         |
| huggingface |      MBartForConditionalGeneration      |  0.861   |         1.0219         |
| huggingface |      BartForConditionalGeneration       |  0.8397  |         1.0054         |
| huggingface |       BlenderbotSmallForCausalLM        |  0.8123  |         0.9043         |
| huggingface |         MegatronBertForCausalLM         |  0.8068  |         1.0329         |
| huggingface |         Speech2Text2ForCausalLM         |  0.7792  |         0.8658         |
| huggingface |     M2M100ForConditionalGeneration      |  0.7509  |         0.9669         |
| huggingface |          MobileBertForMaskedLM          |  0.7395  |         1.0016         |
| huggingface |             XGLMForCausalLM             |  0.7068  |          0.97          |
| huggingface |     MobileBertForQuestionAnswering      |  0.6534  |         0.8571         |
| huggingface |           DebertaForMaskedLM            |  0.5501  |         0.9978         |
| huggingface |          DebertaV2ForMaskedLM           |  0.5197  |         0.9665         |
| huggingface |      DebertaV2ForQuestionAnswering      |  0.487   |         0.9802         |
| huggingface |       DebertaForQuestionAnswering       |  0.4601  |         1.1526         |
| timm_models |                hrnet_w18                |  0.8918  |          0.99          |
| timm_models |            sebotnet33ts_256             |  0.891   |         1.1115         |
| timm_models |              inception_v3               |  0.8904  |         1.0171         |
| timm_models |           gluon_inception_v3            |  0.8904  |         1.0171         |
| timm_models |            adv_inception_v3             |  0.8904  |         1.0171         |
| timm_models |                 dpn107                  |  0.8833  |         0.9642         |
| timm_models |            gluon_xception65             |  0.8831  |         0.9705         |
| timm_models |              ghostnet_100               |  0.8807  |         0.977          |
| timm_models |              spnasnet_100               |  0.8786  |         0.9451         |
| timm_models |          mobilenetv3_large_100          |  0.877   |         0.9361         |
| timm_models |             poolformer_m36              |  0.8768  |         1.1871         |
| timm_models |           eca_botnext26ts_256           |  0.8738  |         1.0072         |
| timm_models |          xcit_large_24_p8_224           |  0.8721  |         0.9732         |
| timm_models |            res2net50_14w_8s             |  0.8712  |         0.9607         |
| timm_models |            res2net101_26w_4s            |  0.871   |         0.9483         |
| timm_models |                mixnet_l                 |  0.8687  |         0.9902         |
| timm_models |               mnasnet_100               |  0.8683  |         0.9403         |
| timm_models |               res2next50                |  0.866   |         0.9547         |
| timm_models |              cait_m36_384               |  0.8632  |         0.989          |
| timm_models |               fbnetc_100                |  0.8596  |         0.9535         |
| timm_models |                pit_b_224                |  0.8578  |         1.0242         |
| timm_models |               selecsls42b               |  0.8576  |         0.9664         |
| timm_models |              convnext_base              |  0.8505  |         1.0338         |
| timm_models |                gernet_l                 |  0.8499  |         0.9706         |
| timm_models |         swsl_resnext101_32x16d          |  0.8461  |         0.9786         |
| timm_models |             coat_lite_mini              |  0.8402  |         1.0202         |
| timm_models |              botnet26t_256              |  0.8239  |         0.9779         |
| timm_models |                lcnet_050                |  0.805   |         0.884          |
| timm_models |                repvgg_a2                |  0.7738  |         0.9611         |
| timm_models |               regnety_002               |  0.7602  |         0.8966         |
| timm_models |             crossvit_9_240              |  0.7526  |         0.9898         |
| timm_models |      swin_base_patch4_window7_224       |  0.7214  |         0.9045         |
| timm_models |              jx_nest_base               |  0.6693  |         0.9604         |
+-------------+-----------------------------------------+----------+------------------------+

Metrics over time

bench_logs/comp_time_over_time.png :

bench_logs/memory_over_time.png :

bench_logs/geomean_over_time.png :

bench_logs/passrate_over_time.png :

Recent Regressions

For each relevant compiler, we compare the two most recent reports in which that compiler actually ran, and list the models that were not flagged in the previous report but are now flagged as problematic (according to the 'Warnings' section).
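The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual script: the function name `find_regressions`, the dict-based report layout, and the 0.9 cutoff for the peak memory compression ratio are all assumptions (the cutoff is only inferred from the values in the tables of this report). The sample values for `hf_GPT2` and `hf_Bert` are taken from the torchbench regression table; `already_flagged_model` is a made-up entry.

```python
from typing import Callable, Dict, List, Tuple

# Assumed cutoff: a ratio below this value counts as "flagged".
# Inferred from the tables in this report, not taken from the dashboard code.
ASSUMED_MEMORY_RATIO_THRESHOLD = 0.9


def find_regressions(
    prev: Dict[str, float],
    cur: Dict[str, float],
    is_flagged: Callable[[float], bool],
) -> List[Tuple[str, float, float]]:
    """Return (name, prev_value, cur_value) for models newly flagged in `cur`."""
    regressions = []
    for name, cur_value in cur.items():
        if name not in prev:
            # Without a previous measurement we cannot call it a regression.
            continue
        prev_value = prev[name]
        if is_flagged(cur_value) and not is_flagged(prev_value):
            regressions.append((name, prev_value, cur_value))
    return regressions


def memory_flagged(ratio: float) -> bool:
    return ratio < ASSUMED_MEMORY_RATIO_THRESHOLD


# Two consecutive reports for one compiler, keyed by model name.
prev_report = {"hf_GPT2": 0.9321, "hf_Bert": 0.9422, "already_flagged_model": 0.55}
cur_report = {"hf_GPT2": 0.8974, "hf_Bert": 0.893, "already_flagged_model": 0.54}

for name, prev_value, cur_value in find_regressions(
    prev_report, cur_report, memory_flagged
):
    print(f"{name}: {prev_value} -> {cur_value}")
```

A model that was already below the cutoff in the previous report is not reported again, which is why only newly flagged models show up in the regression tables.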

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Peak Memory Compression Ratio regressions

+------------------------+---------------+-------------+------------+
|        compiler        |     name      | prev_status | cur_status |
+------------------------+---------------+-------------+------------+
|        inductor        |    hf_GPT2    |   0.9321    |   0.8974   |
|        inductor        |    hf_Bert    |   0.9422    |   0.893    |
|        inductor        | hf_Bert_large |   0.9402    |   0.8872   |
|        inductor        | BERT_pytorch  |   0.9428    |   0.8849   |
| inductor_no_cudagraphs |    hf_Bart    |   0.9173    |   0.8605   |
+------------------------+---------------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Peak Memory Compression Ratio regressions

+----------+----------------------------------+-------------+------------+
| compiler |               name               | prev_status | cur_status |
+----------+----------------------------------+-------------+------------+
| inductor |  PLBartForConditionalGeneration  |   0.9649    |   0.8969   |
| inductor | MegatronBertForQuestionAnswering |    0.953    |   0.889    |
| inductor |        PLBartForCausalLM         |   0.9138    |   0.8672   |
+----------+----------------------------------+-------------+------------+

Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

No regressions found.

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.961  |  0.9093   |  3.5988  |         1.3504         |
|           BERT_pytorch            |  16  | 0.9895 |   0.804   |  3.2717  |         2.0545         |
|            hf_BigBird             |  2   | 0.9508 |  0.7763   |  2.9743  |         1.797          |
|            densenet121            |  4   | 0.9858 |  0.7154   |  2.7746  |         1.0541         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9643 |  0.9012   |  2.4847  |         1.7565         |
|             hf_Albert             |  8   | 0.9943 |  0.9549   |  2.3737  |         2.2959         |
|            hf_T5_large            |  2   | 0.9747 |  0.8063   |  2.1512  |         1.8939         |
|         phlippe_densenet          | 128  | 0.9812 |  0.7731   |  2.0888  |         1.014          |
|        mobilenet_v3_large         |  32  | 0.9945 |  0.7801   |  2.0774  |         1.1901         |
|               dlrm                | 1024 | 0.932  |  0.8402   |  1.9477  |         1.182          |
|           squeezenet1_1           |  32  |  0.98  |  0.9177   |  1.9461  |         1.3262         |
|          phlippe_resnet           | 128  | 0.9829 |  0.7596   |  1.8329  |         1.0064         |
|               hf_T5               |  8   | 0.9844 |  0.8496   |  1.8206  |         1.8836         |
|              hf_Bart              |  4   | 0.9734 |  0.8298   |  1.7967  |         1.5619         |
|              hf_Bert              |  4   | 0.9959 |  0.8361   |  1.7902  |         1.5674         |
|              hf_GPT2              |  4   | 0.9924 |  0.9564   |  1.7716  |         1.7162         |
|          resnext50_32x4d          |  8   | 0.9817 |  0.7261   |  1.7537  |         0.9887         |
|           hf_GPT2_large           |  4   | 0.9827 |  0.9714   |  1.6649  |         1.7242         |
|            mnasnet1_0             |  32  | 0.9862 |  0.7343   |  1.6586  |         1.0827         |
|        shufflenet_v2_x1_0         | 128  | 0.9954 |  0.7682   |  1.6356  |         1.2222         |
|             resnet18              |  16  | 0.9894 |  0.7662   |  1.5905  |         0.9798         |
|        speech_transformer         |  32  | 0.977  |  0.8289   |  1.5841  |          1.57          |
|           timm_resnest            |  32  | 0.9923 |  0.8476   |  1.573   |         1.5273         |
|           hf_Bert_large           |  4   |  1.0   |   0.862   |  1.5718  |         1.5504         |
|          pytorch_struct           | 200  | 0.904  |  0.7589   |  1.5366  |         1.0954         |
|           mobilenet_v2            |  96  | 0.9969 |  0.7778   |  1.5309  |         1.5077         |
|           fastNLP_Bert            |  6   | 0.9911 |  0.8465   |   1.53   |         1.4913         |
|            timm_nfnet             | 128  | 0.9855 |  0.9844   |  1.5224  |         1.4712         |
|                drq                |  1   | 0.9678 |  0.7426   |  1.5197  |         1.0318         |
|      timm_vision_transformer      |  32  | 0.9868 |  0.8543   |  1.5108  |         1.3712         |
| attention_is_all_you_need_pytorch | 256  | 0.9892 |  0.8313   |  1.4711  |         1.5857         |
|           hf_DistilBert           |  8   | 0.9804 |  0.9533   |  1.4501  |         1.4619         |
|               dcgan               |  32  | 0.8607 |  0.6985   |  1.4309  |         0.829          |
|         timm_efficientnet         |  32  | 0.9389 |  0.6246   |  1.422   |         1.0829         |
|           lennard_jones           | 1000 | 0.8305 |  0.7181   |  1.3753  |         0.8821         |
|          pytorch_stargan          |  16  | 0.982  |  0.8034   |  1.3592  |         1.307          |
|           pytorch_unet            |  1   | 0.9965 |  0.2041   |  1.3574  |         1.3527         |
|          LearningToPaint          |  96  | 0.9865 |  0.7723   |  1.3167  |         1.0512         |
|               vgg16               |  64  | 0.9993 |  0.9985   |  1.2427  |         1.2573         |
|             resnet152             |  32  | 0.9947 |  0.7702   |  1.2325  |         1.0242         |
|            Super_SloMo            |  6   | 0.9971 |  0.1781   |  1.2309  |         1.231          |
|        Background_Matting         |  4   | 0.9985 |  0.1357   |  1.2133  |         1.2082         |
|              yolov3               |  16  | 0.9962 |  0.8069   |  1.2003  |         1.2015         |
|             resnet50              |  32  | 0.9951 |  0.7743   |  1.1919  |         1.0728         |
|         soft_actor_critic         | 256  | 0.8414 |  0.6445   |  1.1768  |         0.8271         |
|            hf_Reformer            |  4   | 0.9862 |   0.967   |  1.1431  |         1.0623         |
|              alexnet              | 128  | 0.9994 |  0.9965   |  1.0904  |         1.1391         |
|              demucs               |  4   | 0.9998 |  1.0011   |  1.0362  |         1.0364         |
|            timm_regnet            |  32  | 0.9174 |  0.7683   |  0.989   |         0.9635         |
|            tts_angular            |  64  | 0.9208 |  0.8839   |  0.954   |         0.9572         |
|            timm_vovnet            |  32  | 0.8443 |  0.7092   |  0.9389  |         0.9234         |
|      nvidia_deeprecommender       | 256  | 0.9983 |  0.9986   |  0.8725  |         1.0192         |
|   timm_vision_transformer_large   |  32  | 0.998  |    0.0    |   0.0    |         1.0816         |
|           hf_Longformer           |  2   |  1.01  |  0.6889   |   0.0    |          0.0           |
|               moco                |  32  | 0.9337 |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|         phlippe_densenet          | 128  | 3.2411  |  7.1162   | 163.6058 |        165.0784        |
|            hf_T5_large            |  2   | 27.0389 |  55.9912  | 163.2746 |        164.4683        |
|            hf_BigBird             |  2   | 13.0623 |  37.9385  | 147.0968 |        130.6639        |
|         timm_efficientnet         |  32  | 5.0245  |  10.2333  | 144.5686 |        145.1151        |
|        mobilenet_v3_large         |  32  | 3.5046  |  7.7959   | 138.771  |        136.9629        |
|            densenet121            |  4   | 7.7115  |  18.3543  | 137.5498 |        139.797         |
|           mobilenet_v2            |  96  | 3.1694  |  7.0881   | 126.6155 |        128.7796        |
|              yolov3               |  16  | 5.0679  |  10.9293  | 115.1905 |        118.3601        |
|             resnet152             |  32  | 9.1493  |  20.2415  | 108.0459 |        105.7939        |
|            mnasnet1_0             |  32  |  3.187  |  6.8698   | 107.8779 |        106.2335        |
|           hf_GPT2_large           |  4   | 14.7756 |  30.325   | 102.7742 |        100.7274        |
|           timm_resnest            |  32  | 1.8519  |  3.9538   | 98.6536  |        100.1285        |
|        shufflenet_v2_x1_0         | 128  | 3.4743  |  7.8098   | 83.0666  |        83.9127         |
|            timm_regnet            |  32  |  6.745  |  12.2634  | 72.5859  |        69.7102         |
|            timm_nfnet             | 128  | 5.9598  |  11.0609  | 72.2596  |        71.3364         |
|        Background_Matting         |  4   | 3.0572  |  11.4151  | 69.4843  |        69.2736         |
|             resnet50              |  32  | 3.2851  |  7.1151   | 65.4268  |        65.4554         |
|        speech_transformer         |  32  | 6.1324  |  13.8949  | 63.8897  |        64.2159         |
|            timm_vovnet            |  32  | 3.6393  |  6.4252   | 62.5701  |        62.7922         |
|           hf_Bert_large           |  4   | 10.361  |  21.4895  | 62.0074  |        62.2313         |
|           pytorch_unet            |  1   | 1.5538  |  4.4018   | 59.4601  |        59.1873         |
|       functorch_dp_cifar10        |  64  | 1.2219  |  2.4568   | 58.1729  |        55.0722         |
|           BERT_pytorch            |  16  | 4.9373  |  11.7662  | 56.6457  |        54.9941         |
| attention_is_all_you_need_pytorch | 256  | 4.4238  |  11.1764  | 56.5511  |        57.5704         |
|          resnext50_32x4d          |  8   | 3.2786  |  7.1322   | 54.5285  |        53.1005         |
|      timm_vision_transformer      |  32  | 3.2931  |  7.1787   | 49.8568  |        48.4487         |
|               hf_T5               |  8   | 5.6347  |  12.9571  | 49.1619  |        49.5583         |
|           fastNLP_Bert            |  6   | 5.2303  |  11.4444  | 48.3296  |         47.495         |
|              hf_Bart              |  4   | 6.0121  |  13.8021  | 47.5861  |         47.459         |
|             resnet18              |  16  | 1.3593  |  2.9167   | 45.5504  |        44.4616         |
|          pytorch_stargan          |  16  | 1.2493  |  3.2874   | 45.1049  |        43.7227         |
|          LearningToPaint          |  96  | 1.4125  |   2.942   | 44.6671  |        42.9352         |
|            hf_Reformer            |  4   |  4.158  |  6.0792   | 44.2961  |         38.294         |
|            Super_SloMo            |  6   | 2.7717  |  9.8581   | 41.9883  |        42.4749         |
|             hf_Albert             |  8   | 2.4948  |  8.1327   | 40.5742  |        37.7509         |
|              hf_GPT2              |  4   | 4.7756  |  9.7243   | 40.1846  |        41.3504         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2334  |  2.9596   | 38.3163  |        36.9006         |
|              hf_Bert              |  4   | 5.1037  |  10.7384  | 36.6623  |        36.3607         |
|          phlippe_resnet           | 128  | 1.3634  |  2.8815   | 31.8247  |        31.6367         |
|              demucs               |  4   | 1.4375  |  2.2211   | 30.1062  |        27.8135         |
|           hf_DistilBert           |  8   | 2.3917  |  5.3229   | 29.3334  |        27.7835         |
|           squeezenet1_1           |  32  |  1.074  |  1.7791   |  24.908  |        24.1043         |
|          pytorch_struct           | 200  | 0.7391  |  1.3325   | 20.1506  |        20.0827         |
|              alexnet              | 128  | 0.4851  |  0.7816   | 15.7335  |         14.688         |
|               vgg16               |  64  | 0.6377  |  1.1247   | 15.1869  |        14.5055         |
|      nvidia_deeprecommender       | 256  | 0.4799  |  0.7637   |  9.5842  |         9.8059         |
|                drq                |  1   | 0.6559  |  1.0196   |  8.7471  |          9.81          |
|               dcgan               |  32  |  0.431  |  0.7189   |  7.3639  |         7.7887         |
|               dlrm                | 1024 | 0.3755  |  0.7932   |  7.0788  |         7.0808         |
|         soft_actor_critic         | 256  | 0.4338  |  0.6138   |  6.6339  |         6.9629         |
|           lennard_jones           | 1000 | 0.3969  |  0.6052   |  6.0795  |         5.4769         |
|            tts_angular            |  64  | 0.4429  |  0.5152   |  5.3233  |         5.3872         |
|   timm_vision_transformer_large   |  32  | 9.5039  |    nan    |   nan    |        124.0618        |
|           hf_Longformer           |  2   |  9.521  |  30.6672  |   nan    |          nan           |
|               moco                |  32  | 33.7135 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.208   |         1.208          |
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0863  |         1.2557         |
|           mobilenet_v2            |  96  | 0.9863 |  0.7651   |  1.0103  |         1.1022         |
|            tts_angular            |  64  | 0.9983 |  0.9983   |  0.9895  |         0.9983         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  0.983   |         1.0834         |
|            timm_nfnet             | 128  | 0.9071 |  0.8748   |  0.969   |         1.0726         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9586  |         1.0692         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |  0.952   |         1.0009         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  0.9239  |         1.0425         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.8974  |         1.0239         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  0.893   |         0.9695         |
|              yolov3               |  16  | 0.9925 |  0.8292   |  0.8919  |         1.0367         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.8872  |         1.0041         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.8849  |         1.0964         |
|         timm_efficientnet         |  32  | 0.9874 |  0.7661   |  0.8689  |         1.006          |
|           timm_resnest            |  32  | 0.989  |  0.8972   |  0.8628  |         0.9658         |
|        shufflenet_v2_x1_0         | 128  | 0.954  |   0.839   |  0.8613  |         0.9649         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8593  |         0.8835         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  0.8583  |         1.077          |
|            timm_regnet            |  32  | 0.9954 |  0.8532   |  0.8487  |         0.9496         |
|        Background_Matting         |  4   | 1.0123 |  0.6489   |  0.8484  |         1.0406         |
|             resnet152             |  32  | 0.9939 |  0.8936   |  0.8473  |         0.9404         |
|        speech_transformer         |  32  | 0.9915 |    0.9    |  0.8386  |         0.8406         |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.8134  |         0.9308         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8095  |         0.9434         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8058  |         0.8659         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.7824  |         1.0929         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.7821  |         0.9645         |
|             resnet50              |  32  | 0.9928 |  0.8614   |  0.7818  |         0.8841         |
|              demucs               |  4   | 0.9661 |  0.9659   |  0.773   |         0.9655         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.8893         |
|           squeezenet1_1           |  32  | 0.9666 |  0.9321   |  0.7701  |         0.9121         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7529  |         0.8869         |
|              hf_Bart              |  4   | 0.9087 |  0.7521   |  0.7481  |         0.8605         |
|            mnasnet1_0             |  32  | 0.9792 |  0.8656   |  0.7436  |         0.8061         |
|        mobilenet_v3_large         |  32  | 0.9792 |  0.8375   |  0.7279  |         0.7757         |
|          pytorch_struct           | 200  | 0.9992 |  0.5106   |  0.7277  |         0.7362         |
|               vgg16               |  64  | 0.9923 |  0.7245   |  0.7227  |         0.9808         |
|            densenet121            |  4   | 0.994  |  0.9823   |  0.7096  |         0.8034         |
|              alexnet              | 128  | 0.9454 |  0.7923   |  0.7086  |         0.9386         |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |  0.6932  |         1.1043         |
|          resnext50_32x4d          |  8   | 0.9939 |  0.8424   |  0.6674  |         0.7709         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.8931         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.6379  |         0.9573         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  0.6066  |         0.9973         |
|          LearningToPaint          |  96  | 0.9202 |  0.7131   |  0.5925  |         0.7463         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9965 |  0.8796   |  0.5904  |         0.6004         |
|             resnet18              |  16  | 0.9753 |  0.7978   |  0.5395  |         0.6097         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.5317  |         0.9997         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.452   |         0.8007         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3169  |         0.3395         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.9511 |  0.8932   |   nan    |          nan           |
|               moco                |  32  | 0.9968 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_GPT2_large           |  4   | 212.9398 | 215.0954  | 125.5785 |        121.4774        |
|        Background_Matting         |  4   | 126.1456 | 928.3272  | 103.8026 |        104.3512        |
|            hf_T5_large            |  2   | 228.9362 | 277.2451  | 103.3581 |        138.3888        |
|               hf_T5               |  8   | 181.9346 | 211.2337  |  98.88   |        96.3825         |
|            timm_nfnet             | 128  | 120.2121 |  119.978  | 77.1257  |        80.5857         |
|            hf_BigBird             |  2   | 205.3084 | 252.8944  | 74.6585  |        122.6356        |
|            hf_Reformer            |  4   | 82.2665  |  83.7566  | 70.8987  |        76.2217         |
|            Super_SloMo            |  6   | 79.6313  | 447.1238  | 64.4762  |        64.4673         |
|              yolov3               |  16  | 69.0107  |  85.074   | 57.2578  |        57.0813         |
|            timm_regnet            |  32  | 61.2794  |  73.1208  |  56.266  |        58.2204         |
|             resnet152             |  32  | 64.7019  |  81.2304  | 53.4953  |        64.4726         |
|               vgg16               |  64  | 66.4115  |  66.469   | 53.4245  |        52.8942         |
|           hf_Bert_large           |  4   | 82.8672  |  94.9564  | 52.7683  |        58.4965         |
|              demucs               |  4   | 53.6876  |  53.4047  | 51.7454  |        51.6554         |
| attention_is_all_you_need_pytorch | 256  | 55.4851  |  67.7291  | 36.3025  |         36.623         |
|        speech_transformer         |  32  | 73.5733  |  75.1704  | 36.0682  |        36.2382         |
|              hf_Bart              |  4   | 57.6434  |  75.8022  | 34.7837  |        36.4142         |
|           fastNLP_Bert            |  6   | 60.4126  |  61.9528  | 34.3223  |        35.2868         |
|           mobilenet_v2            |  96  | 47.2508  |  60.4355  | 30.7238  |        31.2689         |
|           pytorch_unet            |  1   | 40.0843  | 195.3326  |  29.366  |        29.4961         |
|             hf_Albert             |  8   | 68.6547  |  71.4433  | 29.2508  |        29.7019         |
|              hf_GPT2              |  4   | 49.3598  |  50.4912  | 27.4286  |        28.6786         |
|            timm_vovnet            |  32  |  29.048  |  34.7796  | 26.2149  |        26.8199         |
|              hf_Bert              |  4   | 40.5228  |  48.2073  | 22.6796  |         25.769         |
|             resnet50              |  32  | 26.8089  |  34.3268  | 22.1649  |        24.4056         |
|         timm_efficientnet         |  32  | 33.9015  |  50.7596  | 22.1173  |        29.6263         |
|           hf_DistilBert           |  8   | 32.1118  |  32.9606  | 21.6187  |         21.494         |
|            densenet121            |  4   | 54.2402  |  75.6295  | 19.4812  |        49.7681         |
|        shufflenet_v2_x1_0         | 128  | 30.6677  |  41.7343  | 18.6978  |        24.9465         |
|      timm_vision_transformer      |  32  | 28.2367  |  32.4642  | 18.2245  |        20.1283         |
|           BERT_pytorch            |  16  |  54.623  |  67.6293  | 17.4739  |        25.8288         |
|           timm_resnest            |  32  | 24.3725  |  28.5256  |  15.312  |        15.8542         |
|            mnasnet1_0             |  32  | 22.4166  |  30.3673  | 13.9644  |        20.3929         |
|        mobilenet_v3_large         |  32  | 26.9526  |   34.31   | 12.7181  |        22.6146         |
|          resnext50_32x4d          |  8   | 20.7761  |  33.0491  | 12.6357  |        19.9261         |
|      nvidia_deeprecommender       | 256  | 10.2247  |  10.2334  | 11.7031  |        10.0153         |
|          pytorch_stargan          |  16  | 15.5769  |  18.4456  | 11.6609  |        12.2095         |
|         phlippe_densenet          | 128  | 23.3871  |  29.7723  |  11.426  |        23.0812         |
|              alexnet              | 128  |  9.8671  |  9.8772   |  9.0237  |         8.6438         |
|          LearningToPaint          |  96  | 11.3687  |  14.7445  |  8.553   |        10.7127         |
|            tts_angular            |  64  |  6.6761  |  6.9284   |  6.4654  |         7.9278         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 14.4445  |  15.6442  |  5.7671  |         8.034          |
|             resnet18              |  16  |  9.2743  |  12.0933  |  5.5791  |         9.102          |
|           squeezenet1_1           |  32  | 11.0297  |  12.8185  |  5.4985  |         7.6148         |
|          phlippe_resnet           | 128  |  9.0197  |  11.7994  |  4.939   |         8.9899         |
|          pytorch_struct           | 200  |  5.1081  |  5.9823   |  3.1474  |         4.2802         |
|       functorch_dp_cifar10        |  64  | 10.3771  |  11.0896  |  2.8224  |         7.4289         |
|                drq                |  1   |  3.317   |  4.4484   |  2.1718  |         3.2561         |
|               dlrm                | 1024 |  4.3721  |  4.9469   |  2.1256  |         3.5726         |
|               dcgan               |  32  |  2.3448  |  2.9962   |  1.4417  |         2.5856         |
|         soft_actor_critic         | 256  |  1.9169  |  2.4669   |  1.3429  |         1.9203         |
|           lennard_jones           | 1000 |  1.797   |  2.1368   |  1.1591  |         1.7469         |
|   timm_vision_transformer_large   |  32  | 465.0611 |    nan    |   nan    |        429.1214        |
|           hf_Longformer           |  2   | 113.1966 | 165.4566  |   nan    |          nan           |
|               moco                |  32  | 53.9823  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|             OPTForCausalLM              |  2  | 0.9832 |  0.8963   |  2.4431  |         2.4667         |
|          MobileBertForMaskedLM          | 64  | 0.9512 |  0.8075   |  2.409   |         1.078          |
|     MobileBertForQuestionAnswering      | 128 | 0.9532 |  0.8068   |  2.2342  |         1.0695         |
|      GPT2ForSequenceClassification      |  4  | 0.9763 |  0.9498   |  2.2116  |         2.2623         |
|       ElectraForQuestionAnswering       | 64  | 0.987  |  0.9763   |  2.1146  |         2.0915         |
|       MT5ForConditionalGeneration       | 16  | 0.9885 |  0.8392   |  2.0835  |          1.84          |
|     M2M100ForConditionalGeneration      | 16  | 0.985  |  0.7966   |  1.9081  |         1.4506         |
|           ElectraForCausalLM            | 32  | 0.9815 |  0.9414   |  1.8498  |         1.8059         |
|            XLNetLMHeadModel             |  8  | 0.9963 |   0.968   |  1.7819  |         1.7812         |
|    LayoutLMForSequenceClassification    | 16  | 0.9841 |  0.9711   |  1.7663  |         1.7649         |
|       RobertaForQuestionAnswering       | 16  | 0.9836 |  0.9692   |  1.7602  |         1.7432         |
|        BertForQuestionAnswering         | 16  | 0.9856 |  0.9696   |  1.7526  |         1.7483         |
|       AlbertForQuestionAnswering        |  4  | 0.9998 |  0.8855   |  1.6532  |         1.6479         |
|           RobertaForCausalLM            | 16  | 0.9863 |   0.963   |  1.6508  |         1.6484         |
|               DistillGPT2               | 16  | 0.9867 |  0.9545   |  1.6472  |         1.6947         |
|            AlbertForMaskedLM            |  4  | 0.9993 |  0.8848   |  1.6438  |         1.6344         |
|             XGLMForCausalLM             |  8  | 0.9725 |   0.836   |  1.6279  |         1.4981         |
|     PLBartForConditionalGeneration      |  4  | 0.9863 |  0.9461   |  1.6098  |         1.6524         |
|            PLBartForCausalLM            |  8  | 0.989  |  0.9579   |  1.6029  |         1.6364         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9805 |  0.9608   |  1.5871  |         1.6093         |
|             BertForMaskedLM             | 16  | 0.985  |  0.9609   |  1.5786  |         1.5689         |
|                 T5Small                 |  4  | 0.9757 |  0.8459   |  1.5693  |         1.5554         |
|           LayoutLMForMaskedLM           | 16  | 0.9858 |  0.9617   |  1.5654  |         1.579          |
|       T5ForConditionalGeneration        |  4  | 0.9765 |  0.8509   |  1.5639  |         1.5453         |
|                CamemBert                | 16  | 0.9872 |  0.9623   |  1.5307  |         1.5213         |
|            MBartForCausalLM             |  4  | 0.9778 |  0.9532   |  1.5213  |         1.538          |
|             BartForCausalLM             |  4  | 0.9776 |  0.9474   |  1.5189  |         1.5487         |
|            YituTechConvBert             | 16  | 0.9856 |  0.9582   |  1.5056  |         1.4892         |
|         Speech2Text2ForCausalLM         | 256 | 0.9736 |  0.9238   |  1.5025  |         1.5457         |
|         MegatronBertForCausalLM         |  4  | 0.9899 |  0.9082   |  1.4564  |         1.4918         |
|      BartForConditionalGeneration       |  2  | 0.9976 |  0.9632   |  1.4561  |         1.4803         |
|     DistilBertForQuestionAnswering      | 256 | 0.9932 |  0.9873   |  1.4436  |         1.4498         |
|      MBartForConditionalGeneration      |  2  | 0.9988 |  0.8633   |  1.4422  |         1.4688         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9984 |  0.9055   |  1.3603  |         1.3957         |
|            TrOCRForCausalLM             | 32  | 0.987  |  0.9524   |  1.2645  |         1.2905         |
|     PegasusForConditionalGeneration     | 32  | 0.9992 |   0.941   |  1.2399  |         1.3091         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9855 |  0.9078   |  1.2373  |         1.267          |
|          DistilBertForMaskedLM          | 128 | 0.9918 |  0.9507   |  1.2067  |         1.2353         |
|           PegasusForCausalLM            | 32  | 0.9762 |  0.9217   |  1.1779  |         1.2093         |
|       DebertaForQuestionAnswering       |  8  | 0.7947 |  0.6858   |  1.0512  |         0.9528         |
|           DebertaForMaskedLM            |  4  | 0.7401 |  0.5551   |  0.9692  |         0.8205         |
|          DebertaV2ForMaskedLM           |  1  | 0.6855 |  0.5202   |  0.8717  |         0.6553         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.6911 |  0.5273   |  0.8283  |         0.6574         |
|          BlenderbotForCausalLM          |  4  | 0.9701 |  0.8401   |   0.0    |         1.3192         |
|          AllenaiLongformerBase          |  4  | 1.002  |   0.671   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          DebertaV2ForMaskedLM           |  1  | 15.5287 |  27.5645  | 134.7439 |        68.6078         |
|      DebertaV2ForQuestionAnswering      |  2  | 15.1851 |  27.0595  | 134.4108 |        66.5827         |
|     MobileBertForQuestionAnswering      | 128 | 17.8119 |  39.7179  | 125.8665 |        122.5536        |
|          MobileBertForMaskedLM          | 64  | 17.7589 |  40.1792  | 124.1372 |        123.4329        |
|     M2M100ForConditionalGeneration      | 16  | 12.1521 |  26.9659  | 103.482  |        101.3138        |
|             XGLMForCausalLM             |  8  | 9.5706  |  20.4474  | 98.0488  |        99.7513         |
|       MT5ForConditionalGeneration       | 16  | 8.4828  |  18.4848  | 91.4279  |        90.6149         |
|            XLNetLMHeadModel             |  8  | 10.7739 |  27.2896  | 91.1333  |        90.0856         |
|       DebertaForQuestionAnswering       |  8  | 7.4021  |  13.549   | 85.4982  |        52.9641         |
|           DebertaForMaskedLM            |  4  | 7.3356  |  13.7801  | 82.5075  |        49.6617         |
|      MBartForConditionalGeneration      |  2  | 11.924  |   26.43   | 77.4153  |        75.9125         |
|      BartForConditionalGeneration       |  2  | 11.6967 |  26.0831  | 72.4278  |        72.4316         |
|            YituTechConvBert             | 16  | 7.6104  |  15.6281  | 66.6888  |        64.8015         |
|     PegasusForConditionalGeneration     | 32  | 5.4713  |  19.3333  | 66.1277  |        64.4293         |
|         MegatronBertForCausalLM         |  4  | 10.6739 |  21.3494  | 65.2266  |        63.8686         |
|    MegatronBertForQuestionAnswering     |  8  | 10.8684 |  21.3217  |  62.519  |        63.1097         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.6745  |  17.4706  | 52.8173  |        52.5144         |
|       T5ForConditionalGeneration        |  4  | 5.5961  |  12.6176  | 49.2828  |        48.7973         |
|                 T5Small                 |  4  |  5.866  |  12.7523  | 48.7599  |        48.4292         |
|     PLBartForConditionalGeneration      |  4  | 6.2861  |  13.4016  | 46.9402  |        46.5244         |
|           ElectraForCausalLM            | 32  | 5.4499  |  11.5299  | 46.8294  |        46.4639         |
|    LayoutLMForSequenceClassification    | 16  | 5.8383  |  11.208   | 43.1637  |        43.1835         |
|            MBartForCausalLM             |  4  | 5.9406  |  11.2535  | 39.2372  |        38.6095         |
|       ElectraForQuestionAnswering       | 64  | 5.5027  |  10.943   |  39.099  |        38.9507         |
|           LayoutLMForMaskedLM           | 16  | 5.8681  |  11.435   | 37.0698  |        38.0384         |
|             BartForCausalLM             |  4  | 5.6724  |  11.4223  | 37.0451  |        37.8281         |
|             BertForMaskedLM             | 16  | 5.5302  |  10.9746  | 36.7595  |        38.2094         |
|             OPTForCausalLM              |  2  | 4.8088  |  10.1616  | 36.6339  |        36.4115         |
|            TrOCRForCausalLM             | 32  | 5.8689  |  11.005   | 36.5424  |        36.2757         |
|           PegasusForCausalLM            | 32  | 5.7826  |  10.9414  | 36.3104  |        36.1614         |
|            AlbertForMaskedLM            |  4  | 2.2624  |  8.2074   | 36.2903  |        36.0106         |
|                CamemBert                | 16  | 5.4925  |  10.8539  | 36.1414  |        34.4872         |
|        BertForQuestionAnswering         | 16  | 5.4726  |  10.772   | 35.7033  |        39.0421         |
|      GPT2ForSequenceClassification      |  4  | 5.0083  |  9.9427   | 35.1951  |        35.0897         |
|           RobertaForCausalLM            | 16  | 5.5301  |  10.7565  | 35.1668  |         36.713         |
|       RobertaForQuestionAnswering       | 16  | 5.5189  |  10.6746  | 33.4603  |        34.6945         |
|       AlbertForQuestionAnswering        |  4  | 2.3604  |   8.091   | 33.2637  |        32.3068         |
|          DistilBertForMaskedLM          | 128 | 2.6679  |  5.3902   | 32.6619  |         32.738         |
|     DistilBertForQuestionAnswering      | 256 | 2.6438  |  5.2718   | 32.3317  |        34.2623         |
|       BlenderbotSmallForCausalLM        | 64  | 3.8111  |  7.5787   | 27.7328  |        27.3221         |
|               DistillGPT2               | 16  | 2.5598  |  5.1936   | 27.7086  |        28.4858         |
|         Speech2Text2ForCausalLM         | 256 | 3.1163  |  5.6498   | 26.4496  |        25.5763         |
|            PLBartForCausalLM            |  8  | 3.2191  |  5.8863   | 26.2415  |        25.7921         |
|          BlenderbotForCausalLM          |  4  | 10.9955 |  21.9649  |   nan    |        66.8861         |
|          AllenaiLongformerBase          |  4  | 9.6667  |  31.4188  |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
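
The `nan` cells above mark runs with no reported compile time. A minimal sketch of reading such rows programmatically, assuming the pipe-delimited layout shown here; `parse_rows` and `COLUMNS` are hypothetical helpers, not part of the benchmark scripts, and only four rows are copied from the table:

```python
import math

# Four data rows copied verbatim from the table above (header and border rows omitted).
TABLE = """\
|          DebertaV2ForMaskedLM           |  1  | 15.5287 |  27.5645  | 134.7439 |        68.6078         |
|             XGLMForCausalLM             |  8  | 9.5706  |  20.4474  | 98.0488  |        99.7513         |
|          BlenderbotForCausalLM          |  4  | 10.9955 |  21.9649  |   nan    |        66.8861         |
|          AllenaiLongformerBase          |  4  | 9.6667  |  31.4188  |   nan    |          nan           |
"""

# Column names taken from the table header.
COLUMNS = ["eager", "aot_eager", "inductor", "inductor_no_cudagraphs"]


def parse_rows(text):
    """Map model name -> {column: seconds}; 'nan' cells become float('nan')."""
    rows = {}
    for line in text.splitlines():
        if not line.startswith("|"):
            continue  # skip anything that is not a data row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        name, _batch_size, *values = cells
        rows[name] = dict(zip(COLUMNS, (float(v) for v in values)))
    return rows


rows = parse_rows(TABLE)

# Backends with no reported compile time, per model.
missing = {
    name: [col for col in COLUMNS if math.isnan(row[col])]
    for name, row in rows.items()
    if any(math.isnan(v) for v in row.values())
}

# Compile-time overhead of inductor relative to the eager column.
deberta = rows["DebertaV2ForMaskedLM"]
overhead = deberta["inductor"] / deberta["eager"]

print(missing)
print(f"DebertaV2ForMaskedLM inductor/eager compile time: {overhead:.2f}x")
```

Filtering out `nan` before aggregating avoids silently poisoning averages over the column.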

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|             OPTForCausalLM              |  2  | 0.9999 |  0.9165   |  1.0886  |         1.1285         |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.0331  |         1.0331         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.0283  |         1.1266         |
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.0254  |         1.0717         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  0.9987  |         1.0823         |
|            YituTechConvBert             | 16  |  1.0   |  0.9143   |  0.9954  |         0.996          |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  0.9926  |         1.0654         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  0.9888  |         1.0658         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  0.9854  |         0.9841         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  0.9852  |         0.984          |
|                CamemBert                | 16  |  1.0   |  0.9184   |  0.9837  |         0.9825         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9734  |         1.3147         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  0.9694  |         1.0356         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  0.9595  |         1.0672         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  0.9595  |         1.0672         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9574  |         1.268          |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  0.945   |         0.9842         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9376  |         1.0274         |
|           ElectraForCausalLM            | 32  |  1.0   |  0.8475   |  0.9319  |         0.9325         |
|            MBartForCausalLM             |  4  |  1.0   |  0.8937   |  0.9206  |         0.9827         |
|             BartForCausalLM             |  4  |  1.0   |  0.8936   |  0.9064  |         0.9666         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9005  |         0.9912         |
|     PLBartForConditionalGeneration      |  4  | 1.0001 |  0.9301   |  0.8969  |         0.9729         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.889   |         1.0285         |
|           PegasusForCausalLM            | 32  |  1.0   |  0.8822   |  0.8822  |         0.9733         |
|            TrOCRForCausalLM             | 32  |  1.0   |   0.873   |  0.8721  |         0.9448         |
|     PegasusForConditionalGeneration     | 32  |  1.0   |   0.91    |   0.87   |         1.0487         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8683  |         0.9428         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8677   |  0.8672  |         0.9347         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.863   |         0.9678         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.861   |         1.0219         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8397  |         1.0054         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8137   |  0.8123  |         0.9043         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.8068  |         1.0329         |
|         Speech2Text2ForCausalLM         | 256 |  1.0   |  0.8183   |  0.7792  |         0.8658         |
|     M2M100ForConditionalGeneration      | 16  |  1.0   |  0.8084   |  0.7509  |         0.9669         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7395  |         1.0016         |
|             XGLMForCausalLM             |  8  |  1.0   |  0.7834   |  0.7068  |          0.97          |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6534  |         0.8571         |
|           DebertaForMaskedLM            |  4  | 0.9316 |  0.9156   |  0.5501  |         0.9978         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.5197  |         0.9665         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9762 |  0.9763   |  0.487   |         0.9802         |
|       DebertaForQuestionAnswering       |  8  | 0.9525 |  1.0537   |  0.4601  |         1.1526         |
|          BlenderbotForCausalLM          |  4  | 0.9978 |  0.9099   |   nan    |         0.9988         |
|          AllenaiLongformerBase          |  4  | 0.9508 |  0.8684   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
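
One way to read these ratios, as a sketch only: it assumes the ratio is eager peak memory divided by the backend's peak memory, so values above 1 mean the backend used less memory than eager. That definition is not stated in this comment and should be checked against the benchmark harness. `relative_peak_memory` is a hypothetical helper:

```python
import math

# Inductor column for three rows of the table above.
inductor_ratio = {
    "OPTForCausalLM": 1.0886,
    "DebertaForQuestionAnswering": 0.4601,
    "BlenderbotForCausalLM": float("nan"),
}


def relative_peak_memory(ratio):
    """Backend peak memory as a multiple of eager.

    ASSUMPTION: ratio = eager_peak / backend_peak. Returns None for nan cells.
    """
    return None if math.isnan(ratio) else 1.0 / ratio


relative = {name: relative_peak_memory(r) for name, r in inductor_ratio.items()}

for name, value in relative.items():
    label = "n/a" if value is None else f"{value:.2f}x eager peak memory"
    print(f"{name}: {label}")
```

Under that assumption, a ratio of 0.4601 corresponds to a bit more than twice the eager peak memory.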

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 266.4145 | 300.8189  | 161.8226 |        163.2248        |
|            XLNetLMHeadModel             |  8  | 285.0207 | 292.7654  | 159.7074 |        159.2576        |
|       AlbertForQuestionAnswering        |  4  | 264.1338 | 298.0944  | 159.5994 |        160.3168        |
|      DebertaV2ForQuestionAnswering      |  2  | 154.4745 | 199.6087  | 128.5345 |        158.5916        |
|          DebertaV2ForMaskedLM           |  1  | 152.5208 | 198.1116  | 119.2003 |        155.005         |
|     PegasusForConditionalGeneration     | 32  | 144.015  | 150.0305  | 111.0888 |        115.0028        |
|            TrOCRForCausalLM             | 32  | 139.1892 | 144.1049  | 109.8622 |        106.7559        |
|      MBartForConditionalGeneration      |  2  | 140.2428 | 172.3382  | 95.1451  |        93.6548         |
|      BartForConditionalGeneration       |  2  | 141.2894 | 143.6048  | 94.4125  |        92.8611         |
|    MegatronBertForQuestionAnswering     |  8  | 145.3645 | 147.6542  | 89.4785  |        88.1745         |
|            YituTechConvBert             | 16  | 128.1361 |  130.678  | 83.4903  |        84.2697         |
|     MobileBertForQuestionAnswering      | 128 | 199.2368 | 209.2058  | 82.9997  |        160.4337        |
| BlenderbotSmallForConditionalGeneration | 64  | 114.197  | 124.4111  | 80.9124  |        83.1189         |
|                CamemBert                | 16  | 120.2286 | 122.9246  | 77.3222  |        77.8891         |
|     M2M100ForConditionalGeneration      | 16  | 152.0239 | 140.7461  | 75.7915  |        87.2089         |
|            MBartForCausalLM             |  4  | 116.2742 | 118.7342  | 75.3844  |        74.3531         |
|             BartForCausalLM             |  4  | 117.3551 | 120.5843  | 75.0546  |        73.1271         |
|          MobileBertForMaskedLM          | 64  | 201.3487 | 209.4733  | 73.0746  |        163.0973        |
|     PLBartForConditionalGeneration      |  4  | 119.2889 | 122.8101  | 72.5503  |        71.2548         |
|       DebertaForQuestionAnswering       |  8  | 95.2017  | 110.7031  | 72.2718  |        79.5841         |
|           LayoutLMForMaskedLM           | 16  | 114.366  | 117.1219  | 71.9471  |        71.3248         |
|     DistilBertForQuestionAnswering      | 256 | 104.178  | 104.6766  | 71.8066  |         71.722         |
|          DistilBertForMaskedLM          | 128 | 85.4575  |   89.15   | 70.2069  |        69.0405         |
|            PLBartForCausalLM            |  8  | 116.4602 | 120.0784  | 70.1405  |        68.7323         |
|           RobertaForCausalLM            | 16  | 116.9235 | 119.3703  | 69.7308  |        69.8751         |
|             BertForMaskedLM             | 16  | 111.931  | 114.4077  | 69.7125  |        70.1411         |
|             OPTForCausalLM              |  2  | 171.8837 | 184.5547  | 68.6927  |        67.9593         |
|       T5ForConditionalGeneration        |  4  | 106.8998 | 122.8825  | 66.8325  |        67.4846         |
|                 T5Small                 |  4  | 107.3823 | 123.5154  | 66.7647  |        67.4535         |
|           DebertaForMaskedLM            |  4  | 94.1344  | 108.3977  | 66.0803  |        74.9978         |
|               DistillGPT2               | 16  | 107.2044 | 110.7483  | 64.2322  |        62.8542         |
|         MegatronBertForCausalLM         |  4  | 88.6597  |  94.7247  | 60.0195  |        58.9399         |
|           PegasusForCausalLM            | 32  | 76.5525  |  75.1395  | 58.8528  |        57.3575         |
|    LayoutLMForSequenceClassification    | 16  | 99.5387  | 100.6633  | 55.3678  |         55.322         |
|       ElectraForQuestionAnswering       | 64  | 118.2392 | 117.4429  |  54.951  |        55.7425         |
|             XGLMForCausalLM             |  8  | 114.8803 | 138.6421  | 54.3533  |        75.6517         |
|        BertForQuestionAnswering         | 16  | 98.1366  |  98.1072  | 54.3212  |        55.2023         |
|       RobertaForQuestionAnswering       | 16  | 97.5172  |  98.5263  | 54.2841  |         54.971         |
|           ElectraForCausalLM            | 32  | 89.7985  |  95.4206  | 48.3753  |        48.7551         |
|       BlenderbotSmallForCausalLM        | 64  |  59.89   |  64.0502  | 46.8858  |        45.7286         |
|       MT5ForConditionalGeneration       | 16  | 103.4955 | 111.0138  | 44.0692  |        50.1744         |
|      GPT2ForSequenceClassification      |  4  | 94.3206  |  96.3695  | 41.4065  |        41.0437         |
|         Speech2Text2ForCausalLM         | 256 |  55.434  |  56.5128  | 35.1292  |        34.2746         |
|          BlenderbotForCausalLM          |  4  | 111.2747 | 127.7744  |   nan    |        82.6869         |
|          AllenaiLongformerBase          |  4  | 179.9739 | 271.7011  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
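
The latencies above can be turned into a per-row ratio against the eager column. A minimal sketch using two rows copied from the table; the ratio computed here is relative to the `eager` column of the same row and is not claimed to match the dashboard's own speedup metric:

```python
# (eager, inductor, inductor_no_cudagraphs) latencies in ms, copied from the table above.
latency_ms = {
    "AlbertForMaskedLM": (266.4145, 161.8226, 163.2248),
    "MobileBertForMaskedLM": (201.3487, 73.0746, 163.0973),
}


def ratio_vs_eager(eager_ms, backend_ms):
    """How many times faster the backend is than the eager column in the same row."""
    return eager_ms / backend_ms


ratios = {
    name: {
        "inductor": ratio_vs_eager(eager, inductor),
        "inductor_no_cudagraphs": ratio_vs_eager(eager, no_cudagraphs),
    }
    for name, (eager, inductor, no_cudagraphs) in latency_ms.items()
}

for name, r in ratios.items():
    print(
        f"{name}: inductor {r['inductor']:.2f}x, "
        f"no cudagraphs {r['inductor_no_cudagraphs']:.2f}x"
    )
```

For MobileBertForMaskedLM the gap between the two inductor columns is large, while for AlbertForMaskedLM the two columns are nearly identical.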

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9987 |   0.997   |  3.0209  |         2.9808         |
|      xcit_large_24_p8_224       |  5  | 0.9895 |  0.8751   |  1.9892  |         1.5627         |
|        twins_pcpvt_base         | 64  | 0.9949 |  0.9081   |  1.9508  |         1.6688         |
|         coat_lite_mini          | 128 | 0.997  |  0.9952   |  1.9403  |         1.9203         |
|          ghostnet_100           | 128 | 0.9921 |  0.7646   |  1.8507  |         1.6353         |
|          gmlp_s16_224           | 128 | 0.9949 |  1.0832   |  1.8485  |         1.8399         |
|          gmixer_24_224          | 128 | 0.9951 |  0.8896   |  1.7641  |         1.7523         |
|           volo_d1_224           | 64  | 0.9943 |  0.9727   |  1.6919  |         1.6686         |
|            lcnet_050            | 128 | 0.9411 |  0.7338   |  1.6836  |         1.4408         |
|         crossvit_9_240          | 128 | 0.9902 |  0.7827   |  1.6454  |         1.6229         |
|  swin_base_patch4_window7_224   | 64  | 0.9908 |   0.954   |  1.6186  |         1.6078         |
|           convit_base           | 64  | 0.9982 |  0.9977   |  1.6128  |         1.6112         |
|       gluon_inception_v3        | 128 | 0.996  |  0.8646   |  1.5337  |         1.5235         |
|        adv_inception_v3         | 128 | 0.9963 |  0.8603   |  1.5331  |         1.5214         |
|          inception_v3           | 128 | 0.9962 |  0.8644   |  1.5321  |         1.5227         |
|             dla102              | 128 | 0.9954 |  0.8149   |  1.5294  |         1.5233         |
|        sebotnet33ts_256         | 64  | 0.9574 |  0.7645   |  1.5077  |         1.4844         |
|          convnext_base          | 64  | 0.983  |  0.9851   |  1.4908  |         1.4725         |
|            nfnet_l0             | 128 | 0.9901 |  0.8135   |  1.4881  |         1.4347         |
|           dm_nfnet_f0           | 128 | 0.9865 |  0.9852   |  1.4768  |         1.4282         |
|       eca_botnext26ts_256       | 128 | 0.9735 |  0.7186   |  1.4465  |         1.4253         |
|           mnasnet_100           | 128 | 0.9466 |  0.7404   |  1.4355  |         1.4958         |
|            pit_b_224            | 64  | 0.9946 |  0.9923   |  1.4351  |         1.4291         |
|           mobilevit_s           | 64  | 0.9615 |  0.7304   |  1.4321  |         1.4423         |
|      mobilenetv3_large_100      | 128 | 0.9488 |  0.7595   |  1.4316  |         1.4385         |
|           resnest101e           | 64  | 0.9939 |   0.866   |  1.4271  |         1.3558         |
|           selecsls42b           | 128 | 0.9984 |  0.8116   |  1.4142  |         1.4121         |
|           regnety_002           | 128 | 0.952  |  0.7139   |  1.4106  |         1.2337         |
|          botnet26t_256          | 128 | 0.9718 |  0.8508   |  1.4056  |         1.4251         |
|         mobilenetv2_100         | 128 | 0.9475 |  0.7373   |  1.3878  |         1.444          |
|        res2net50_14w_8s         | 128 | 0.9988 |  0.7904   |  1.3821  |         1.361          |
|          cait_m36_384           |  4  | 0.9946 |  0.9934   |  1.374   |         1.3464         |
|           res2next50            | 128 | 0.9986 |  0.8253   |  1.3721  |         1.3633         |
|          jx_nest_base           | 32  | 0.9871 |  0.9854   |  1.3672  |         1.3569         |
|          mixer_b16_224          | 128 | 0.9972 |  1.0183   |  1.3667  |         1.3632         |
|          spnasnet_100           | 128 | 0.9407 |  0.7377   |  1.3599  |         1.4176         |
|            hrnet_w18            | 128 | 0.9918 |  0.6331   |  1.3577  |         1.3573         |
|       tf_efficientnet_b0        | 128 | 0.9594 |  0.6804   |  1.356   |         1.3855         |
|        ese_vovnet19b_dw         | 128 | 0.9577 |  0.8318   |  1.3529  |         1.3745         |
|      beit_base_patch16_224      | 64  | 0.9965 |  0.9653   |  1.3515  |         1.3528         |
|           fbnetc_100            | 128 | 0.9494 |  0.7374   |  1.3499  |         1.4075         |
|         poolformer_m36          | 64  | 0.9867 |  0.9831   |  1.3279  |         1.319          |
|            fbnetv3_b            | 128 | 0.9487 |  0.7677   |  1.314   |         1.3282         |
|           rexnet_100            | 128 | 0.9518 |  0.7022   |  1.3016  |         1.3348         |
|          resmlp_12_224          | 128 | 0.9929 |  0.8898   |  1.2618  |         1.2595         |
| deit_base_distilled_patch16_224 | 64  | 0.9962 |  0.9931   |  1.255   |         1.2556         |
|      vit_base_patch16_224       | 64  | 0.9962 |  0.9936   |  1.235   |         1.236          |
|            tinynet_a            | 128 | 0.9464 |  0.6781   |  1.2337  |         1.2611         |
|          cspdarknet53           | 64  | 0.9314 |  0.7841   |  1.2325  |         1.2616         |
|           tf_mixnet_l           | 128 | 0.9762 |  0.8268   |  1.1869  |         1.1925         |
|            mixnet_l             | 128 | 0.976  |  0.8208   |  1.1772  |         1.1825         |
|         visformer_small         | 128 | 0.9956 |  0.9446   |  1.1741  |         1.1672         |
|        res2net101_26w_4s        | 64  | 0.9986 |  0.7981   |  1.1493  |         1.0981         |
|          pnasnet5large          | 16  | 0.9856 |  0.9181   |  1.1145  |         1.1302         |
|             dpn107              | 32  | 0.9319 |  0.8063   |  1.0898  |         1.1349         |
|            repvgg_a2            | 128 | 0.9351 |  0.7552   |  1.0869  |         1.1206         |
|        gluon_xception65         | 32  | 0.9923 |  0.8424   |  1.0761  |         1.0795         |
|     swsl_resnext101_32x16d      | 32  | 0.9976 |  0.8409   |  1.0605  |         1.0211         |
|            gernet_l             | 128 | 0.933  |  0.7927   |  1.0379  |         1.0675         |
|        convmixer_768_32         | 32  | 0.9985 |  0.9637   |  1.0018  |         1.0031         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
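
Speedups are ratios, so a geometric mean is a common way to summarize a column like this. A small sketch over three inductor values copied from the table; the result covers only this subset and is not the suite-level figure:

```python
from statistics import geometric_mean, mean

# Inductor speedups for three rows of the table above (subset chosen for illustration).
inductor_speedup = {
    "tnt_s_patch16_224": 3.0209,
    "vit_base_patch16_224": 1.235,
    "convmixer_768_32": 1.0018,
}

values = list(inductor_speedup.values())
geomean = geometric_mean(values)
arithmetic = mean(values)

print(f"geometric mean:  {geomean:.3f}")
print(f"arithmetic mean: {arithmetic:.3f}")
```

The geometric mean is less dominated by a single outlier such as `tnt_s_patch16_224` than the arithmetic mean is.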

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+
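
To pull the failing entries out of a table like this one, the statuses can be grouped per backend. A minimal sketch over four rows copied from the table; the dictionary layout is an illustrative choice, not the format the benchmark scripts emit:

```python
from collections import defaultdict

BACKENDS = ["eager", "aot_eager", "inductor", "inductor_no_cudagraphs"]

# Status per backend, in header order, for four rows of the table above.
accuracy = {
    "adv_inception_v3": ["pass", "pass", "pass", "pass"],
    "lcnet_050": ["pass", "fail_accuracy", "pass", "pass"],
    "mobilenetv2_100": ["pass", "pass", "pass", "pass"],
    "tinynet_a": ["pass", "fail_accuracy", "pass", "pass"],
}

# Collect the models that did not pass, keyed by backend.
failures = defaultdict(list)
for model, statuses in accuracy.items():
    for backend, status in zip(BACKENDS, statuses):
        if status != "pass":
            failures[backend].append(model)

failures = dict(failures)
print(failures)
```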

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|           rexnet_100            | 128 |  5.748  |  11.3838  | 293.9501 |        294.0178        |
|            hrnet_w18            | 128 | 9.7774  |  36.6494  | 253.1033 |        253.8215        |
|          ghostnet_100           | 128 | 7.5971  |  15.4086  | 237.4655 |        243.9983        |
|            fbnetv3_b            | 128 | 8.6043  |  17.3189  | 173.4248 |        175.7158        |
|           resnest101e           | 64  | 11.2256 |  24.7236  | 167.7915 |        168.973         |
|          pnasnet5large          | 16  | 8.3343  |  26.372   | 165.913  |        161.7606        |
|           mobilevit_s           | 64  | 5.3759  |  11.5765  | 164.2386 |        163.6593        |
|            tinynet_a            | 128 | 6.0083  |  12.4094  | 163.4317 |        162.6076        |
|           tf_mixnet_l           | 128 | 9.1983  |  17.097   | 162.7647 |        162.6282        |
|          inception_v3           | 128 |  5.705  |  12.8468  | 161.6484 |        161.2129        |
|            mixnet_l             | 128 | 8.3467  |  16.1499  | 159.4858 |        163.3393        |
|        adv_inception_v3         | 128 | 5.6807  |  12.5137  | 158.4899 |        158.859         |
|       gluon_inception_v3        | 128 | 5.7583  |  12.8286  | 157.3471 |        151.0435        |
|      mobilenetv3_large_100      | 128 | 4.2637  |   8.624   | 157.1535 |        161.9134        |
|       tf_efficientnet_b0        | 128 | 5.1263  |  10.579   | 154.1758 |        157.2405        |
|        res2net101_26w_4s        | 64  | 10.8301 |  25.0204  | 153.0158 |        153.9145        |
|        twins_pcpvt_base         | 64  | 10.6959 |  23.6115  | 149.3543 |        147.8185        |
|           fbnetc_100            | 128 | 4.9021  |  9.6521   | 140.1532 |        139.7912        |
|          spnasnet_100           | 128 |  4.99   |  9.4168   | 135.3095 |        135.479         |
|      xcit_large_24_p8_224       |  5  | 12.5898 |  28.726   | 134.0027 |        133.4242        |
|         mobilenetv2_100         | 128 | 4.0587  |  8.0222   | 128.4441 |        127.5431        |
|        res2net50_14w_8s         | 128 | 9.0408  |  22.5401  | 126.9403 |        126.5715        |
|           mnasnet_100           | 128 |  4.007  |  7.7178   | 125.4933 |        124.4676        |
|          cait_m36_384           |  4  | 13.7829 |  30.8942  | 117.1672 |        115.8541        |
|        sebotnet33ts_256         | 64  | 4.1902  |  9.0472   | 109.7324 |        108.5305        |
|  swin_base_patch4_window7_224   | 64  | 8.3558  |  19.4758  | 107.2849 |        105.6234        |
|           regnety_002           | 128 | 4.9206  |  8.8399   | 106.6293 |        105.2272        |
|          cspdarknet53           | 64  | 5.9269  |  11.0006  | 102.0418 |        100.978         |
|         poolformer_m36          | 64  | 7.6753  |  13.807   | 101.6298 |        100.6239        |
|             dpn107              | 32  | 10.146  |  19.9437  |  98.886  |        99.2122         |
|       eca_botnext26ts_256       | 128 |  3.204  |  6.9441   | 98.1235  |        97.0236         |
|             dla102              | 128 | 6.3083  |  14.1555  | 97.7816  |        95.1929         |
|            lcnet_050            | 128 | 2.5154  |  5.0366   | 95.7179  |        97.5734         |
|        gluon_xception65         | 32  | 7.9884  |  16.9938  | 95.4612  |        93.5504         |
|           selecsls42b           | 128 | 2.4974  |  5.3517   |  89.749  |        87.3586         |
|         coat_lite_mini          | 128 | 3.3004  |  7.8711   | 89.7453  |        88.3454         |
|          botnet26t_256          | 128 | 3.0121  |   5.978   | 89.4538  |        89.9171         |
|         crossvit_9_240          | 128 | 5.9417  |  13.4464  | 88.1022  |         86.766         |
|           res2next50            | 128 | 5.1125  |  12.0458  | 88.1013  |        86.1485         |
|          jx_nest_base           | 32  | 6.6216  |  14.643   | 82.7327  |        82.2241         |
|            gernet_l             | 128 | 5.0449  |  8.9443   | 82.0247  |        82.3468         |
|        ese_vovnet19b_dw         | 128 | 2.5678  |  4.7109   | 77.4556  |        75.2341         |
|            nfnet_l0             | 128 | 5.3607  |  11.0146  | 77.1714  |        76.3971         |
|           dm_nfnet_f0           | 128 | 6.2767  |  11.6193  |  73.14   |        70.7674         |
|           volo_d1_224           | 64  | 5.0823  |  11.9365  | 72.7288  |        72.7436         |
|        tnt_s_patch16_224        | 128 | 6.5476  |  16.4583  | 68.7833  |         68.179         |
|         visformer_small         | 128 | 2.6639  |  6.1901   | 67.2011  |        67.0923         |
|     swsl_resnext101_32x16d      | 32  | 6.2185  |  13.6274  | 62.7895  |        61.4915         |
|          gmlp_s16_224           | 128 | 5.6485  |  12.0894  | 60.9534  |        57.4999         |
|            repvgg_a2            | 128 | 4.8665  |  8.8567   | 59.3389  |        60.7472         |
|          convnext_base          | 64  | 6.6778  |  12.5607  | 58.9668  |        57.9169         |
|          gmixer_24_224          | 128 | 5.8501  |  13.102   | 49.4458  |        50.1397         |
|           convit_base           | 64  | 3.4775  |  8.7077   | 49.4429  |        47.1987         |
|            pit_b_224            | 64  | 3.5409  |  7.9661   | 44.7677  |        44.4376         |
| deit_base_distilled_patch16_224 | 64  | 3.1365  |  7.1956   | 42.3374  |        40.3243         |
|          resmlp_12_224          | 128 | 2.8139  |  5.3477   | 39.2666  |        39.2514         |
|      vit_base_patch16_224       | 64  |  3.076  |  7.1193   | 38.5511  |        37.6403         |
|        convmixer_768_32         | 32  | 1.6764  |  7.0276   | 38.5352  |        35.2153         |
|      beit_base_patch16_224      | 64  | 3.9297  |  8.7368   | 36.4928  |        34.9686         |
|          mixer_b16_224          | 128 | 2.6828  |  6.0742   | 31.5971  |         31.578         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.1858  |         1.2049         |
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.1712  |         1.2836         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1129  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.0948  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.0266  |         1.1179         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.013   |         1.0845         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0099  |         1.0351         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  0.9984  |         1.0721         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  0.9972  |         1.0876         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  0.9872  |         1.0728         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  0.9834  |         1.0506         |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9762  |         0.9854         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  0.9747  |         1.0457         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  0.9729  |         1.0539         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  0.9557  |         1.0057         |
|             dla102              | 128 | 0.9635 |  0.9151   |  0.9536  |         1.0325         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.951   |         0.9948         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9362  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9353  |         0.9528         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  0.9348  |         1.0194         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  0.9346  |         1.0675         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9285  |         0.989          |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9228  |         0.9793         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9215  |         0.9952         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9131  |         0.9727         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  0.9097  |         1.0328         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9047  |         0.9903         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.8918  |          0.99          |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  0.891   |         1.1115         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.8833  |         0.9642         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.8831  |         0.9705         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.8807  |         0.977          |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.8786  |         0.9451         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.877   |         0.9361         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  0.8768  |         1.1871         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  0.8738  |         1.0072         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.8721  |         0.9732         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.8712  |         0.9607         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.871   |         0.9483         |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.8687  |         0.9902         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.8683  |         0.9403         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.866   |         0.9547         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8632  |         0.989          |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.8596  |         0.9535         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  0.8578  |         1.0242         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.8576  |         0.9664         |
|          convnext_base          | 64  | 1.001  |   0.924   |  0.8505  |         1.0338         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.8499  |         0.9706         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.8461  |         0.9786         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  0.8402  |         1.0202         |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.8239  |         0.9779         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.805   |         0.884          |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.7738  |         0.9611         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.7602  |         0.8966         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.7526  |         0.9898         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.7214  |         0.9045         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.6693  |         0.9604         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.6469 | 311.6188  | 299.9559 |        299.2801        |
|            hrnet_w18            | 128 | 283.5961 | 443.9543  | 205.9462 |        206.9336        |
|          pnasnet5large          | 16  | 199.0553 | 213.9295  | 176.5521 |        173.8213        |
|           tf_mixnet_l           | 128 | 194.2429 | 229.0922  | 159.5747 |        158.9584        |
|            mixnet_l             | 128 | 185.7304 | 220.5927  | 153.9548 |        153.2683        |
|          cait_m36_384           |  4  | 168.5805 | 168.0289  | 124.2582 |        128.6942        |
|           resnest101e           | 64  | 165.7014 | 189.8583  | 115.0595 |        121.3512        |
|             dla102              | 128 | 172.9912 |  211.449  | 112.747  |        113.1192        |
|     swsl_resnext101_32x16d      | 32  | 118.7967 | 141.1787  | 111.8258 |        116.3507        |
|         poolformer_m36          | 64  | 146.9978 | 147.5591  | 109.1846 |        109.9208        |
|        tnt_s_patch16_224        | 128 | 324.6532 | 325.6008  | 107.3358 |        108.7041        |
|        adv_inception_v3         | 128 | 161.1308 | 186.3611  | 104.8018 |        105.5346        |
|       gluon_inception_v3        | 128 | 161.3251 | 185.9536  | 104.7782 |        105.4352        |
|          inception_v3           | 128 | 160.9635 | 185.5339  | 104.7033 |        105.3376        |
|        res2net50_14w_8s         | 128 | 140.8843 | 178.2768  | 102.1208 |        103.2795        |
|           convit_base           | 64  | 163.3293 | 163.4419  | 101.0964 |        101.1379        |
|             dpn107              | 32  | 114.1906 | 131.8678  | 97.4512  |        93.8223         |
|        gluon_xception65         | 32  | 99.9984  | 117.5295  | 92.2892  |        91.8948         |
|           res2next50            | 128 | 126.4512 | 152.9562  | 91.9247  |        92.5874         |
|  swin_base_patch4_window7_224   | 64  | 147.7336 | 153.2997  | 90.3012  |        91.0341         |
|           dm_nfnet_f0           | 128 | 129.0615 | 128.9313  | 85.9498  |         88.748         |
|          mixer_b16_224          | 128 | 117.0138 |  114.613  | 85.3757  |        85.8406         |
|        res2net101_26w_4s        | 64  | 100.3673 | 125.5862  | 85.0821  |         89.75          |
|            fbnetv3_b            | 128 | 115.6034 | 143.2428  | 83.5816  |        82.6076         |
|            pit_b_224            | 64  | 118.9097 |  119.188  | 82.3962  |         82.697         |
|          convnext_base          | 64  | 124.5258 | 124.3284  | 82.1356  |        83.0884         |
|         visformer_small         | 128 | 91.4965  |  96.4165  | 77.5955  |        77.9716         |
|            nfnet_l0             | 128 | 113.1823 | 137.2622  | 75.2878  |        78.0355         |
|      beit_base_patch16_224      | 64  | 101.7443 | 104.8923  | 75.0621  |        74.8873         |
|          gmlp_s16_224           | 128 | 138.2931 | 127.1098  |  74.661  |        74.7115         |
|          jx_nest_base           | 32  | 101.8343 | 101.5733  | 73.4938  |        74.0387         |
|       eca_botnext26ts_256       | 128 | 108.8202 | 147.6708  | 73.4314  |        74.5041         |
|          cspdarknet53           | 64  | 95.2166  | 113.2748  | 72.0234  |        70.3618         |
|           volo_d1_224           | 64  | 121.5234 | 124.0041  | 71.2676  |        72.3068         |
|          botnet26t_256          | 128 | 102.1951 | 116.8438  | 70.7668  |        69.7733         |
|      vit_base_patch16_224       | 64  | 87.1087  |  87.2382  | 70.2787  |        70.1046         |
|            gernet_l             | 128 | 77.9696  |  91.7963  | 70.1871  |        68.1941         |
| deit_base_distilled_patch16_224 | 64  | 85.0488  |  85.3598  | 67.5047  |        67.5225         |
|            repvgg_a2            | 128 | 77.7581  |  96.382   | 67.0086  |        64.9663         |
|          gmixer_24_224          | 128 | 118.5438 | 132.4035  | 66.8918  |         67.218         |
|      xcit_large_24_p8_224       |  5  | 121.8476 |  145.816  |  62.38   |        78.9076         |
|       tf_efficientnet_b0        | 128 | 85.1569  | 120.1302  | 60.1993  |        58.8601         |
|        twins_pcpvt_base         | 64  | 133.6924 | 128.8617  | 60.1901  |        69.0568         |
|           rexnet_100            | 128 | 80.1709  | 109.0394  | 58.6834  |        57.2089         |
|           fbnetc_100            | 128 | 83.0029  | 107.1174  | 58.3554  |        55.9738         |
|         coat_lite_mini          | 128 | 113.354  | 113.5884  | 58.2003  |        58.8254         |
|           mobilevit_s           | 64  | 85.0431  | 111.8257  | 56.9067  |         56.529         |
|            tinynet_a            | 128 | 73.7665  |  103.04   | 56.5693  |        55.3153         |
|        sebotnet33ts_256         | 64  | 80.5857  | 100.8413  | 51.1653  |         51.965         |
|         crossvit_9_240          | 128 | 82.7565  | 104.5244  |  49.833  |        50.7681         |
|          spnasnet_100           | 128 |  70.763  |  90.1947  | 48.8562  |        46.9209         |
|          ghostnet_100           | 128 |  90.939  |  118.014  | 48.7054  |        55.2595         |
|        ese_vovnet19b_dw         | 128 | 64.6835  |  74.692   | 45.9155  |        45.1102         |
|         mobilenetv2_100         | 128 | 65.8208  |  84.6077  | 44.8761  |        43.1867         |
|           selecsls42b           | 128 |  60.24   |  73.9721  | 42.5429  |        42.6307         |
|           mnasnet_100           | 128 | 64.6384  |  82.7202  | 42.5109  |        40.8346         |
|          resmlp_12_224          | 128 | 53.5907  |  59.9284  |  42.202  |        42.2145         |
|      mobilenetv3_large_100      | 128 | 61.5013  |  76.8377  | 40.7585  |        40.5191         |
|           regnety_002           | 128 | 42.8524  |  52.7975  |  26.494  |        29.7694         |
|            lcnet_050            | 128 | 31.7221  |  40.8046  | 17.7525  |        20.7499         |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Performance graphs

bench_logs/huggingface_amp.png :

bench_logs/timm_models_amp.png :

bench_logs/torchbench_amp.png :

Build Summary

Run name

day_096_06_04_23_performance_amp_914

Commit hashes

pytorch commit: 2161be0
pytorch commit date: 2023-04-07 02:27:52+00:00
torchbench commit: 11f8700e985c8195b789d714c6b8998e407646f3
torchbench commit date: 2023-04-06 17:24:46-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git2161be0

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward passes for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency, and memory footprint reduction, we exclude the models that fail accuracy checks.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 83%, 50/60 | 93%, 42/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 96%, 43/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
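Each passrate cell pairs a percentage with the raw pass count. A minimal sketch of how such a cell could be formatted is below; the function name `passrate` is hypothetical, and rounding to the nearest whole percent is an assumption that happens to match the values in the table.

```python
def passrate(passed: int, total: int) -> str:
    """Format a pass count like the Passrate table cells, e.g. '88%, 53/60'."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Assumption: the percentage is rounded to the nearest whole number.
    percent = round(100 * passed / total)
    return f"{percent}%, {passed}/{total}"


print(passrate(53, 60))  # eager on torchbench: 88%, 53/60
print(passrate(42, 45))  # inductor on huggingface: 93%, 42/45
```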

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.48x    |    1.61x    |    1.39x    |
| inductor_no_cudagraphs |   1.28x    |    1.51x    |    1.40x    |
+------------------------+------------+-------------+-------------+
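For readers unfamiliar with the metric, the geometric mean of per-model speedups can be sketched as follows. This is an illustration only: the function name and the example values are hypothetical, and how the dashboard treats 0.0 speedups (failed performance runs) is not stated here, so this sketch simply rejects non-positive inputs.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of per-model speedups (each relative to the baseline)."""
    values = list(speedups)
    if not values or any(s <= 0 for s in values):
        raise ValueError("need at least one strictly positive speedup")
    # exp(mean(log(x))) is the geometric mean and avoids overflow of a raw product.
    return math.exp(sum(math.log(s) for s in values) / len(values))


# Hypothetical per-model speedups, not taken from the dashboard.
example = [2.0, 0.5, 1.0]
print(f"{geomean_speedup(example):.2f}x")  # 1.00x
```

The geometric mean is the usual choice for averaging ratios because a 2x speedup and a 0.5x slowdown cancel out, which an arithmetic mean would not reflect.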

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.82    |    7.28     |    5.94     |
|       aot_eager        |    9.32    |    15.81    |    13.29    |
|        inductor        |   62.13    |    90.79    |   100.80    |
| inductor_no_cudagraphs |   63.00    |    55.30    |   110.67    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.85x    |    0.90x    |    0.88x    |
|        inductor        |   0.94x    |    0.98x    |    1.02x    |
| inductor_no_cudagraphs |   0.92x    |    1.01x    |    1.01x    |
+------------------------+------------+-------------+-------------+
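The report does not define the compression ratio in this section. One reading that is consistent with "higher is better" is baseline peak memory divided by compiled peak memory; the sketch below assumes that definition, and the function name and byte counts are hypothetical.

```python
def compression_ratio(baseline_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Assumed definition: baseline peak memory / compiled peak memory."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return baseline_peak_bytes / compiled_peak_bytes


# Hypothetical measurements: the compiled run uses 25% more memory.
ratio = compression_ratio(10.0e9, 12.5e9)
print(f"{ratio:.2f}x")  # 0.80x
```

Under this reading, a value below 1.00x means the compiled model peaks at more memory than the baseline.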

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

Previous report name: /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 85%, 51/60  | 83%, 50/60  |
|        inductor        | huggingface | 91%, 41/45  | 93%, 42/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60  | 87%, 52/60  |
| inductor_no_cudagraphs | huggingface | 96%, 43/45  | 96%, 43/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.58x    |   1.48x   |
|        inductor        | huggingface |   1.57x    |   1.61x   |
|        inductor        | timm_models |   1.41x    |   1.39x   |
| inductor_no_cudagraphs | torchbench  |   1.28x    |   1.28x   |
| inductor_no_cudagraphs | huggingface |   1.49x    |   1.51x   |
| inductor_no_cudagraphs | timm_models |   1.39x    |   1.40x   |
+------------------------+-------------+------------+-----------+

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
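The four rules above can be sketched as a small check. The thresholds come from the list; the function name, the keyword arguments, and the flag labels are hypothetical. The warning tables appear to list a model when either compiler column trips a rule, so this check would be applied per compiler.

```python
SPEEDUP_MIN = 0.95        # flag when speedup < 0.95x
COMPILE_SEC_MAX = 120.0   # flag when compilation latency > 120 sec
COMPRESSION_MIN = 0.9     # flag when compression ratio < 0.9


def warnings_for(*, accuracy_ok=True, speedup=None, compile_sec=None, compression=None):
    """Return the list of rules a single (model, compiler) measurement trips."""
    flags = []
    if not accuracy_ok:
        flags.append("accuracy")
    # Metrics left as None are skipped. Comparisons with NaN are False in
    # Python, so a NaN measurement is not flagged by this sketch either.
    if speedup is not None and speedup < SPEEDUP_MIN:
        flags.append("speedup")
    if compile_sec is not None and compile_sec > COMPILE_SEC_MAX:
        flags.append("compilation_latency")
    if compression is not None and compression < COMPRESSION_MIN:
        flags.append("compression_ratio")
    return flags


# DebertaV2ForMaskedLM under inductor, using the values reported in the warning tables.
print(warnings_for(speedup=0.8089, compile_sec=581.7764))
# ['speedup', 'compilation_latency']
```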

Accuracy warnings

+-------------+----------------------------+-----------------+------------------------+
|    suite    |            name            |    inductor     | inductor_no_cudagraphs |
+-------------+----------------------------+-----------------+------------------------+
| torchbench  |       hf_Longformer        |   fail_to_run   |      fail_to_run       |
| torchbench  |            moco            |   fail_to_run   |      fail_to_run       |
| torchbench  |     Background_Matting     | eager_variation |    eager_variation     |
| torchbench  |      vision_maskrcnn       |     0.0000      |          pass          |
| torchbench  |         tacotron2          |     0.0000      |         0.0000         |
| torchbench  |            gat             |     0.0000      |         0.0000         |
| torchbench  |            gcn             |     0.0000      |         0.0000         |
| torchbench  |           llama            |     0.0000      |         0.0000         |
| torchbench  |            sage            |     0.0000      |         0.0000         |
| torchbench  |       torchrec_dlrm        |     0.0000      |         0.0000         |
| huggingface | AlbertForQuestionAnswering |  fail_accuracy  |     fail_accuracy      |
+-------------+----------------------------+-----------------+------------------------+

Performance speedup warnings

+-------------+-------------------------------+----------+------------------------+
|    suite    |             name              | inductor | inductor_no_cudagraphs |
+-------------+-------------------------------+----------+------------------------+
| torchbench  |         lennard_jones         |  1.3384  |         0.8764         |
| torchbench  |             dcgan             |  1.2372  |         0.8131         |
| torchbench  |          tts_angular          |  0.9259  |         0.9489         |
| torchbench  |          timm_vovnet          |  0.9182  |         0.9558         |
| torchbench  |      speech_transformer       |  0.0191  |         1.5973         |
| torchbench  |              drq              |  0.0033  |         0.948          |
| torchbench  |       soft_actor_critic       |  0.0023  |         0.7442         |
| torchbench  |           tacotron2           |   0.0    |          0.0           |
| torchbench  |              gat              |   0.0    |          0.0           |
| torchbench  |             sage              |   0.0    |          0.0           |
| torchbench  |              gcn              |   0.0    |          0.0           |
| torchbench  |             dlrm              |   0.0    |         1.1332         |
| torchbench  |             moco              |   0.0    |          0.0           |
| torchbench  |         hf_Longformer         |   0.0    |          0.0           |
| torchbench  | timm_vision_transformer_large |   0.0    |         1.083          |
| torchbench  |         torchrec_dlrm         |   0.0    |          0.0           |
| huggingface |      DebertaForMaskedLM       |  0.9002  |         0.8233         |
| huggingface |     DebertaV2ForMaskedLM      |  0.8089  |         0.6609         |
| huggingface | DebertaV2ForQuestionAnswering |  0.7799  |         0.6654         |
| huggingface |      LayoutLMForMaskedLM      |   0.0    |         1.6134         |
| huggingface |     AllenaiLongformerBase     |   0.0    |          0.0           |
+-------------+-------------------------------+----------+------------------------+

Compilation latency (sec) warnings

+-------------+--------------------------------+----------+------------------------+
|    suite    |              name              | inductor | inductor_no_cudagraphs |
+-------------+--------------------------------+----------+------------------------+
| torchbench  |           hf_BigBird           | 278.2944 |        127.664         |
| torchbench  |          hf_T5_large           | 169.597  |        164.4825        |
| torchbench  |        phlippe_densenet        | 133.4309 |        169.3505        |
| torchbench  |          densenet121           | 125.9005 |        139.3178        |
| torchbench  |       timm_efficientnet        | 122.1092 |        145.7251        |
| torchbench  |       mobilenet_v3_large       | 117.7689 |        140.0195        |
| torchbench  |             yolov3             | 105.0012 |        120.4468        |
| torchbench  |          mobilenet_v2          | 102.7327 |        133.403         |
| torchbench  | timm_vision_transformer_large  |   nan    |        126.5813        |
| huggingface | DebertaV2ForQuestionAnswering  | 585.8753 |        69.5544         |
| huggingface |      DebertaV2ForMaskedLM      | 581.7764 |        72.0125         |
| huggingface |       DebertaForMaskedLM       | 256.123  |        56.5051         |
| huggingface |  DebertaForQuestionAnswering   | 253.2942 |        54.5313         |
| huggingface |     MobileBertForMaskedLM      | 129.6441 |        123.6648        |
| huggingface | MobileBertForQuestionAnswering | 128.224  |         125.16         |
| timm_models |           hrnet_w18            | 237.9889 |        250.819         |
| timm_models |           rexnet_100           | 230.5596 |        277.8564        |
| timm_models |          ghostnet_100          | 199.7177 |        237.481         |
| timm_models |         pnasnet5large          | 161.6252 |        165.7258        |
| timm_models |          resnest101e           | 154.075  |        168.4266        |
| timm_models |           fbnetv3_b            | 151.4519 |        173.7269        |
| timm_models |       res2net101_26w_4s        | 144.5482 |        153.9093        |
| timm_models |        twins_pcpvt_base        | 142.3458 |        149.3782        |
| timm_models |          mobilevit_s           | 142.3303 |        162.5136        |
| timm_models |          tf_mixnet_l           | 140.4875 |        162.5013        |
| timm_models |          inception_v3          | 138.3203 |        154.897         |
| timm_models |           tinynet_a            | 138.307  |        163.1584        |
| timm_models |            mixnet_l            | 138.0546 |        164.2811        |
| timm_models |       gluon_inception_v3       | 136.8929 |        160.6689        |
| timm_models |        adv_inception_v3        | 136.5143 |        158.3682        |
| timm_models |      xcit_large_24_p8_224      | 135.5482 |        135.5559        |
| timm_models |     mobilenetv3_large_100      | 134.5583 |        163.1994        |
| timm_models |       tf_efficientnet_b0       | 127.937  |        150.8902        |
| timm_models |        res2net50_14w_8s        | 120.9862 |        125.4573        |
| timm_models |           fbnetc_100           | 117.2741 |         137.88         |
| timm_models |          spnasnet_100          | 112.394  |        133.933         |
| timm_models |          mnasnet_100           | 108.2173 |        126.891         |
| timm_models |        mobilenetv2_100         | 106.2427 |        130.977         |
+-------------+--------------------------------+----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+----------+------------------------+
|    suite    |                  name                   | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------------+----------+------------------------+
| torchbench  |         nvidia_deeprecommender          |  0.9195  |         0.8931         |
| torchbench  |                 hf_Bart                 |  0.9123  |         0.8605         |
| torchbench  |             pytorch_stargan             |  0.8934  |         0.8893         |
| torchbench  |                resnet50                 |  0.8901  |         0.8838         |
| torchbench  |               timm_vovnet               |  0.889   |         0.8869         |
| torchbench  |         timm_vision_transformer         |  0.8873  |         0.8835         |
| torchbench  |            phlippe_densenet             |  0.8834  |         0.8659         |
| torchbench  |           mobilenet_v3_large            |  0.8796  |         0.8087         |
| torchbench  |           speech_transformer            |  0.8418  |         0.8406         |
| torchbench  |               densenet121               |  0.8202  |         0.8034         |
| torchbench  |               hf_Reformer               |  0.8109  |         0.8007         |
| torchbench  |               mnasnet1_0                |  0.7837  |         0.7758         |
| torchbench  |             resnext50_32x4d             |  0.7792  |         0.7709         |
| torchbench  |             LearningToPaint             |  0.7552  |         0.7463         |
| torchbench  |             pytorch_struct              |  0.7428  |         0.7362         |
| torchbench  |                resnet18                 |  0.619   |         0.6097         |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |  0.6035  |         0.6172         |
| torchbench  |          functorch_dp_cifar10           |  0.451   |         0.4424         |
| torchbench  |             phlippe_resnet              |  0.3554  |         0.3395         |
| huggingface |            TrOCRForCausalLM             |  0.874   |         0.9448         |
| huggingface |          DistilBertForMaskedLM          |  0.8706  |         0.9428         |
| huggingface |            PLBartForCausalLM            |  0.8696  |         0.9347         |
| huggingface | BlenderbotSmallForConditionalGeneration |  0.865   |         0.9678         |
| huggingface |     MobileBertForQuestionAnswering      |  0.8579  |         0.8571         |
| huggingface |       BlenderbotSmallForCausalLM        |  0.8153  |         0.9043         |
| huggingface |         Speech2Text2ForCausalLM         |  0.7822  |         0.8658         |
| timm_models |               regnety_002               |  0.9009  |         0.8966         |
| timm_models |                lcnet_050                |  0.8898  |         0.884          |
+-------------+-----------------------------------------+----------+------------------------+

Metrics over time

bench_logs/memory_over_time.png :

bench_logs/passrate_over_time.png :

bench_logs/geomean_over_time.png :

bench_logs/comp_time_over_time.png :

Recent Regressions

For each relevant compiler, we compare the two most recent reports (that actually ran the compiler) to find models that were not flagged in the previous report but are now flagged as problematic (according to the 'Warnings' section).
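A minimal sketch of that comparison, for readers who want to reproduce the check on their own numbers. This is not the dashboard's actual script: the function names and the two thresholds (speedup below 0.95, compilation latency above 120 sec) are assumptions made for illustration, and the real criteria are whatever the 'Warnings' section uses. The sample values are copied from the torchbench regression tables in this report.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical thresholds, assumed for illustration only.
SPEEDUP_THRESHOLD = 0.95
COMPILE_LATENCY_THRESHOLD_SEC = 120.0


def is_flagged_speedup(value: float) -> bool:
    """A model is flagged when its speedup falls below the threshold."""
    return value < SPEEDUP_THRESHOLD


def is_flagged_compile_latency(value: float) -> bool:
    """A model is flagged when its compilation latency exceeds the threshold."""
    return value > COMPILE_LATENCY_THRESHOLD_SEC


def find_regressions(
    prev: Dict[str, float],
    cur: Dict[str, float],
    is_flagged: Callable[[float], bool],
) -> List[Tuple[str, float, float]]:
    """Return (name, prev_value, cur_value) for models newly flagged in cur."""
    regressions = []
    for name, cur_value in cur.items():
        if name not in prev:
            # No previous measurement, so there is nothing to compare against.
            continue
        prev_value = prev[name]
        if is_flagged(cur_value) and not is_flagged(prev_value):
            regressions.append((name, prev_value, cur_value))
    return regressions


# Values taken from the inductor rows of the torchbench tables in this report.
prev_speedup = {"tts_angular": 0.954, "drq": 1.5197, "dlrm": 1.9477}
cur_speedup = {"tts_angular": 0.9259, "drq": 0.0033, "dlrm": 0.0}
speedup_regressions = find_regressions(prev_speedup, cur_speedup, is_flagged_speedup)

# inductor_no_cudagraphs compilation latency for yolov3, in seconds.
latency_regressions = find_regressions(
    {"yolov3": 118.3601}, {"yolov3": 120.4468}, is_flagged_compile_latency
)
```

A model that was already flagged in the previous report is skipped on purpose, so each regression shows up only once, in the first report where it crosses the threshold.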

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Performance speedup regressions

+------------------------+--------------------+-------------+------------+
|        compiler        |        name        | prev_status | cur_status |
+------------------------+--------------------+-------------+------------+
|        inductor        |    tts_angular     |    0.954    |   0.9259   |
|        inductor        | speech_transformer |   1.5841    |   0.0191   |
|        inductor        |        drq         |   1.5197    |   0.0033   |
|        inductor        | soft_actor_critic  |   1.1768    |   0.0023   |
|        inductor        |        dlrm        |   1.9477    |    0.0     |
| inductor_no_cudagraphs |    tts_angular     |   0.9572    |   0.9489   |
| inductor_no_cudagraphs |        drq         |   1.0318    |   0.948    |
+------------------------+--------------------+-------------+------------+

Compilation latency (sec) regressions

+------------------------+--------+-------------+------------+
|        compiler        |  name  | prev_status | cur_status |
+------------------------+--------+-------------+------------+
| inductor_no_cudagraphs | yolov3 |  118.3601   |  120.4468  |
+------------------------+--------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Performance speedup regressions

+----------+---------------------+-------------+------------+
| compiler |        name         | prev_status | cur_status |
+----------+---------------------+-------------+------------+
| inductor | DebertaForMaskedLM  |   0.9692    |   0.9002   |
| inductor | LayoutLMForMaskedLM |   1.5654    |    0.0     |
+----------+---------------------+-------------+------------+

Compilation latency (sec) regressions

+----------+-----------------------------+-------------+------------+
| compiler |            name             | prev_status | cur_status |
+----------+-----------------------------+-------------+------------+
| inductor |     DebertaForMaskedLM      |   82.5075   |  256.123   |
| inductor | DebertaForQuestionAnswering |   85.4982   |  253.2942  |
+----------+-----------------------------+-------------+------------+

Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

No regressions found.

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|           BERT_pytorch            |  16  | 1.0049 |  0.8165   |  3.4579  |         2.1065         |
|       functorch_dp_cifar10        |  64  | 0.9723 |  0.9198   |  3.2891  |         1.3689         |
|            hf_BigBird             |  2   | 0.9542 |  0.7868   |  2.5908  |         1.686          |
|            hf_T5_large            |  2   | 1.0076 |  0.8345   |  2.4689  |         1.9986         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9711 |  0.8985   |  2.4163  |         1.7854         |
|             hf_Albert             |  8   | 0.9963 |  0.9568   |  2.321   |         2.3172         |
|              hf_GPT2              |  4   | 1.0218 |  0.9818   |  1.9173  |         1.8713         |
|              hf_Bert              |  4   | 1.0309 |  0.8625   |  1.9002  |         1.6599         |
|               hf_T5               |  8   | 0.994  |  0.8567   |  1.8681  |         1.9283         |
|            densenet121            |  4   | 0.9944 |  0.7222   |  1.8546  |         1.0696         |
|           squeezenet1_1           |  32  | 0.9962 |  0.9165   |  1.8493  |         1.2352         |
|           hf_GPT2_large           |  4   | 1.0001 |  0.9886   |  1.784   |         1.7781         |
|              hf_Bart              |  4   | 0.9879 |  0.8272   |   1.73   |         1.6282         |
|        mobilenet_v3_large         |  32  | 1.001  |  0.7874   |  1.6547  |         1.203          |
|           hf_Bert_large           |  4   | 1.0306 |  0.8943   |  1.6329  |         1.6364         |
|         phlippe_densenet          | 128  | 0.9974 |  0.7789   |  1.6203  |         1.0296         |
|           timm_resnest            |  32  | 0.9965 |   0.852   |  1.5931  |         1.5292         |
|      timm_vision_transformer      |  32  | 0.9917 |  0.8705   |  1.5908  |         1.3887         |
|            timm_nfnet             | 128  |  1.0   |  0.9976   |  1.5831  |         1.5008         |
| attention_is_all_you_need_pytorch | 256  | 1.0043 |  0.9304   |  1.5683  |         1.5071         |
|           mobilenet_v2            |  96  | 0.999  |  0.7786   |  1.5412  |         1.5341         |
|           fastNLP_Bert            |  6   | 1.0002 |  0.8688   |  1.5269  |         1.5126         |
|          phlippe_resnet           | 128  | 0.9859 |   0.761   |  1.5139  |         1.0031         |
|           hf_DistilBert           |  8   | 0.9919 |  0.9664   |  1.5014  |         1.4897         |
|        shufflenet_v2_x1_0         | 128  | 0.997  |  0.7526   |  1.4513  |         1.2249         |
|          pytorch_struct           | 200  | 0.922  |  0.7627   |  1.4043  |         1.1162         |
|           pytorch_unet            |  1   | 0.9984 |  0.2048   |  1.3718  |         1.3575         |
|          resnext50_32x4d          |  8   | 0.9853 |  0.7224   |  1.3633  |         0.9735         |
|             resnet18              |  16  | 0.9863 |  0.7698   |  1.362   |         0.9548         |
|           lennard_jones           | 1000 | 0.8189 |   0.748   |  1.3384  |         0.8764         |
|            mnasnet1_0             |  32  | 0.9958 |  0.7384   |  1.3171  |         1.0457         |
|          pytorch_stargan          |  16  | 0.993  |  0.8041   |  1.2677  |         1.2584         |
|          LearningToPaint          |  96  | 0.9914 |  0.7778   |  1.2671  |         1.0716         |
|               vgg16               |  64  | 0.9994 |  0.9981   |  1.2616  |         1.2544         |
|               dcgan               |  32  | 0.8513 |  0.6929   |  1.2372  |         0.8131         |
|            Super_SloMo            |  6   | 0.998  |  0.1793   |  1.2346  |         1.2351         |
|        Background_Matting         |  4   | 0.9998 |   0.137   |  1.2192  |         1.2087         |
|              yolov3               |  16  | 0.9994 |  0.8087   |  1.2148  |         1.2029         |
|         timm_efficientnet         |  32  | 0.9439 |  0.6276   |  1.1803  |         1.0941         |
|             resnet50              |  32  | 0.9959 |  0.7811   |  1.1769  |         1.0778         |
|              alexnet              | 128  | 0.9989 |  0.9971   |  1.1406  |         1.136          |
|            hf_Reformer            |  4   | 0.9845 |  0.9636   |  1.128   |         1.0682         |
|              demucs               |  4   | 0.9999 |  1.0005   |  1.0597  |         1.0381         |
|             resnet152             |  32  | 0.9994 |  0.7657   |  1.0355  |         1.0004         |
|      nvidia_deeprecommender       | 256  | 0.9987 |  0.9982   |  0.9803  |         1.0191         |
|            timm_regnet            |  32  | 0.9336 |  0.7826   |  0.9746  |         0.9807         |
|            tts_angular            |  64  | 0.9324 |  0.8852   |  0.9259  |         0.9489         |
|            timm_vovnet            |  32  | 0.8748 |  0.7235   |  0.9182  |         0.9558         |
|        speech_transformer         |  32  | 0.9829 |  0.8318   |  0.0191  |         1.5973         |
|                drq                |  1   | 0.9652 |  0.7187   |  0.0033  |         0.948          |
|         soft_actor_critic         | 256  | 0.8511 |   0.625   |  0.0023  |         0.7442         |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               dlrm                | 1024 | 0.9317 |   0.848   |   0.0    |         1.1332         |
|               moco                |  32  | 0.9788 |    0.0    |   0.0    |          0.0           |
|           hf_Longformer           |  2   | 1.0192 |  0.6927   |   0.0    |          0.0           |
|   timm_vision_transformer_large   |  32  | 0.9996 |    0.0    |   0.0    |         1.083          |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  |       pass       |       pass       |      0.0000      |          pass          |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_BigBird             |  2   | 12.9266 |  37.4616  | 278.2944 |        127.664         |
|            hf_T5_large            |  2   | 26.8081 |  55.1972  | 169.597  |        164.4825        |
|         phlippe_densenet          | 128  | 3.3329  |  7.0466   | 133.4309 |        169.3505        |
|            densenet121            |  4   | 7.7016  |  18.1157  | 125.9005 |        139.3178        |
|         timm_efficientnet         |  32  | 5.0141  |  10.1655  | 122.1092 |        145.7251        |
|        mobilenet_v3_large         |  32  | 3.5548  |  7.6836   | 117.7689 |        140.0195        |
|           hf_GPT2_large           |  4   | 14.7518 |  30.0888  | 106.7458 |        103.7238        |
|              yolov3               |  16  | 4.9634  |  10.6436  | 105.0012 |        120.4468        |
|             resnet152             |  32  | 9.2194  |  20.5673  | 103.538  |        106.2195        |
|           mobilenet_v2            |  96  | 3.1461  |  7.0665   | 102.7327 |        133.403         |
|            mnasnet1_0             |  32  | 3.2943  |  6.7379   | 88.4391  |        105.2456        |
|           timm_resnest            |  32  | 1.8335  |  3.9311   | 82.2073  |        101.9671        |
|        speech_transformer         |  32  | 6.1182  |  13.9544  |  77.971  |        65.2616         |
|        shufflenet_v2_x1_0         | 128  | 3.5846  |  7.8333   | 71.9955  |        85.0096         |
|            timm_nfnet             | 128  | 5.8124  |  11.3045  | 69.6225  |        72.7038         |
|            timm_regnet            |  32  |  6.681  |  12.3511  | 69.3497  |        72.2258         |
|            hf_Reformer            |  4   | 4.2269  |  6.1083   | 67.7944  |        42.3152         |
|           hf_Bert_large           |  4   | 10.2687 |  21.4469  | 64.9018  |        64.8919         |
|        Background_Matting         |  4   | 3.0424  |  11.1979  | 63.2042  |        69.5398         |
| attention_is_all_you_need_pytorch | 256  | 4.4379  |  10.8387  | 60.7587  |        58.1014         |
|             resnet50              |  32  | 3.2661  |  7.0953   | 59.3298  |        67.2117         |
|           BERT_pytorch            |  16  |  4.986  |  11.7425  | 57.5776  |        56.8644         |
|           fastNLP_Bert            |  6   | 5.1286  |  11.3277  | 57.2451  |        47.2251         |
|            timm_vovnet            |  32  | 3.6226  |  6.4466   | 57.0954  |        64.4008         |
|               hf_T5               |  8   | 5.5855  |  12.8375  | 52.4126  |        51.4465         |
|           pytorch_unet            |  1   | 1.5509  |  4.4251   | 52.0661  |        62.4963         |
|      timm_vision_transformer      |  32  | 3.2393  |  7.2846   | 51.4118  |        50.7731         |
|          resnext50_32x4d          |  8   | 3.3462  |  7.1688   | 51.1953  |        53.4311         |
|              hf_Bart              |  4   |  6.122  |  13.8982  | 50.7691  |        49.2991         |
|       functorch_dp_cifar10        |  64  | 1.2368  |  2.3976   | 45.7338  |        56.8193         |
|            Super_SloMo            |  6   | 2.7618  |  9.8713   | 44.1352  |        43.2733         |
|              hf_GPT2              |  4   | 4.6942  |  9.8142   | 43.0215  |        41.1989         |
|          LearningToPaint          |  96  | 1.4229  |  2.9066   | 42.0928  |        46.4171         |
|          pytorch_stargan          |  16  | 1.2305  |  3.2678   | 41.9651  |        46.1185         |
|             hf_Albert             |  8   | 2.4887  |  8.0205   | 40.1893  |        41.5474         |
|              hf_Bert              |  4   | 5.0255  |  10.6026  | 39.2489  |        37.6502         |
|             resnet18              |  16  |  1.37   |  2.8312   |  39.062  |        43.9463         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2759  |   2.989   | 34.7225  |        38.2899         |
|              demucs               |  4   | 1.4265  |   2.177   | 32.1829  |        30.7429         |
|           hf_DistilBert           |  8   | 2.3649  |  5.2991   | 30.5079  |        31.2453         |
|          phlippe_resnet           | 128  | 1.3631  |  2.8192   |  30.159  |         32.791         |
|           squeezenet1_1           |  32  |  1.081  |  1.7905   |  23.545  |        25.9229         |
|          pytorch_struct           | 200  | 0.7588  |  1.3519   | 22.1119  |        21.4246         |
|              alexnet              | 128  | 0.4958  |  0.7884   |  16.642  |        14.7929         |
|               vgg16               |  64  | 0.6389  |  1.1214   | 16.3636  |        16.4742         |
|                drq                |  1   | 0.6568  |   1.016   | 13.1082  |        11.0962         |
|      nvidia_deeprecommender       | 256  | 0.4924  |  0.7575   | 11.0256  |        10.5858         |
|               dcgan               |  32  | 0.4354  |  0.7053   |  9.3701  |         8.5196         |
|         soft_actor_critic         | 256  | 0.4277  |  0.6088   |  9.3289  |         8.161          |
|           lennard_jones           | 1000 | 0.3999  |  0.6016   |  7.5826  |         6.8042         |
|            tts_angular            |  64  | 0.4502  |  0.5107   |  7.1113  |         6.261          |
|   timm_vision_transformer_large   |  32  | 9.4671  |    nan    |   nan    |        126.5813        |
|               dlrm                | 1024 | 0.3737  |   0.785   |   nan    |         8.2644         |
|           hf_Longformer           |  2   | 9.5594  |  30.9393  |   nan    |          nan           |
|               moco                |  32  | 33.8881 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.265   |         1.2557         |
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.208   |         1.208          |
|            hf_BigBird             |  2   | 0.9493 |  0.9264   |  1.1368  |         1.1043         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  1.1118  |         1.0964         |
|           mobilenet_v2            |  96  | 0.9865 |  0.7651   |  1.1082  |         1.1025         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  1.1053  |         0.9973         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.0974  |         1.0834         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  1.0935  |         1.0929         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  1.0816  |         1.077          |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  1.0779  |         1.0692         |
|            timm_nfnet             | 128  | 0.9068 |  0.8748   |  1.0761  |         1.0727         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  1.0687  |         0.9997         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  1.0658  |         1.0239         |
|                drq                |  1   | 0.9877 |  0.8852   |  1.0607  |         0.9573         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.0431  |         1.0425         |
|        Background_Matting         |  4   | 1.0127 |  0.6489   |  1.0427  |         1.0406         |
|              yolov3               |  16  | 0.9882 |  0.8285   |  1.037   |         1.0367         |
|         timm_efficientnet         |  32  | 0.9854 |  0.7661   |  1.0119  |         0.9403         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  1.0052  |         1.0041         |
|            tts_angular            |  64  | 0.9983 |  0.9983   |  0.9952  |         0.9983         |
|              demucs               |  4   | 0.9661 |  0.9659   |  0.9866  |         0.9656         |
|               vgg16               |  64  | 0.9923 |  0.7245   |  0.9823  |         0.9808         |
|        shufflenet_v2_x1_0         | 128  | 0.9551 |  0.8395   |  0.9769  |         0.958          |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  0.9759  |         0.9695         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.9742  |         0.9434         |
|           timm_resnest            |  32  | 0.9883 |  0.8825   |  0.9686  |         0.9674         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.9644  |         0.9645         |
|            timm_regnet            |  32  | 0.9908 |  0.8521   |  0.9552  |         0.9539         |
|             resnet152             |  32  | 0.996  |  0.8948   |  0.9443  |         0.9389         |
|              alexnet              | 128  | 0.9454 |  0.7939   |  0.9434  |         0.9386         |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.9306  |         0.9308         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.9195  |         0.8931         |
|              hf_Bart              |  4   | 0.9087 |  0.7521   |  0.9123  |         0.8605         |
|           squeezenet1_1           |  32  | 0.9695 |  0.9321   |   0.91   |         0.9094         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.8934  |         0.8893         |
|             resnet50              |  32  | 0.9917 |  0.8602   |  0.8901  |         0.8838         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.889   |         0.8869         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8873  |         0.8835         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8834  |         0.8659         |
|        mobilenet_v3_large         |  32  | 0.9765 |  0.8387   |  0.8796  |         0.8087         |
|        speech_transformer         |  32  | 0.9915 |   0.901   |  0.8418  |         0.8406         |
|            densenet121            |  4   | 0.9939 |  0.9823   |  0.8202  |         0.8034         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.8109  |         0.8007         |
|            mnasnet1_0             |  32  | 0.9793 |  0.8638   |  0.7837  |         0.7758         |
|          resnext50_32x4d          |  8   | 0.9922 |  0.8409   |  0.7792  |         0.7709         |
|          LearningToPaint          |  96  | 0.9202 |  0.7116   |  0.7552  |         0.7463         |
|          pytorch_struct           | 200  | 0.9992 |  0.5106   |  0.7428  |         0.7362         |
|             resnet18              |  16  | 0.9753 |  0.7786   |  0.619   |         0.6097         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9966 |  0.8796   |  0.6035  |         0.6172         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.451   |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3554  |         0.3395         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |   nan    |         1.0009         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.9511 |  0.8932   |   nan    |          nan           |
|               moco                |  32  | 1.0026 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+-----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor  | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+-----------+------------------------+
|        speech_transformer         |  32  | 60.9276  |  85.4774  | 3058.8254 |        35.6255         |
|                drq                |  1   |  3.4512  |  4.8266   | 1478.5431 |         4.1393         |
|         soft_actor_critic         | 256  |  1.7314  |  2.7298   | 1133.1442 |         2.9553         |
|           hf_GPT2_large           |  4   | 209.6389 | 211.6886  |  117.059  |        117.7414        |
|        Background_Matting         |  4   | 125.9964 | 920.3893  | 103.0196  |        103.9856        |
|               hf_T5               |  8   | 180.2884 | 209.0251  |  96.0617  |        93.8916         |
|            hf_T5_large            |  2   |  220.82  | 261.1624  |  90.6158  |        112.4752        |
|            hf_BigBird             |  2   | 204.5371 | 277.5495  |  75.9116  |        113.8571        |
|            timm_nfnet             | 128  | 117.9479 | 118.7421  |  74.497   |        78.6301         |
|            hf_Reformer            |  4   | 82.2521  |  84.0081  |  71.8394  |         75.771         |
|            Super_SloMo            |  6   | 79.6781  |  443.501  |  64.362   |        64.1987         |
|             resnet152             |  32  | 64.4808  |  83.6144  |  60.6484  |        68.7326         |
|            timm_regnet            |  32  | 59.7479  |  71.0239  |  57.3846  |        56.8352         |
|              yolov3               |  16  | 68.5693  |  84.6536  |  56.3487  |        57.1559         |
|               vgg16               |  64  | 66.4102  |  66.4553  |  52.6424  |         52.793         |
|           hf_Bert_large           |  4   | 80.4261  |  92.0056  |  50.9039  |        57.8874         |
|              demucs               |  4   | 53.9022  |  53.4161  |  50.8243  |        51.3779         |
|           fastNLP_Bert            |  6   | 51.8832  |  60.1938  |  35.2891  |        34.0751         |
| attention_is_all_you_need_pytorch | 256  | 54.7716  |  58.4133  |  34.9666  |        35.8921         |
|              hf_Bart              |  4   | 60.9458  |  71.313   |  32.9638  |        37.2984         |
|           mobilenet_v2            |  96  | 47.0406  |  60.408   |  30.4441  |         30.654         |
|             hf_Albert             |  8   | 68.4703  |  71.4343  |  29.4154  |        30.0064         |
|            densenet121            |  4   |  54.992  |  74.8209  |  29.3028  |        49.9268         |
|           pytorch_unet            |  1   | 39.9282  | 194.4504  |  28.9793  |        29.3062         |
|         timm_efficientnet         |  32  | 34.3753  |  50.8736  |  27.7801  |        29.1568         |
|            timm_vovnet            |  32  | 28.1519  |  34.1631  |  26.9747  |        25.8225         |
|              hf_GPT2              |  4   |  48.429  |  49.9609  |  25.5543  |         26.193         |
|             resnet50              |  32  | 26.7154  |  34.0856  |  22.5836  |        24.4771         |
|              hf_Bert              |  4   | 39.3271  |  46.4752  |  21.7695  |        24.5621         |
|           hf_DistilBert           |  8   | 31.7525  |  32.4807  |  20.8435  |        21.1002         |
|        shufflenet_v2_x1_0         | 128  | 31.1451  |  40.9029  |  20.7141  |        24.6379         |
|            mnasnet1_0             |  32  | 23.3999  |  30.0512  |  17.728   |        22.0396         |
|      timm_vision_transformer      |  32  | 28.0231  |  32.9356  |  17.6544  |        19.9937         |
|        mobilenet_v3_large         |  32  | 30.4337  |  33.8979  |  15.9323  |        21.7962         |
|           BERT_pytorch            |  16  | 54.7131  |  66.8442  |  15.768   |        26.0306         |
|           timm_resnest            |  32  | 24.1882  |  28.3626  |  15.0768  |         15.746         |
|         phlippe_densenet          | 128  | 23.2912  |  29.3096  |  14.6436  |        22.7324         |
|          resnext50_32x4d          |  8   | 21.0309  |  28.2123  |  14.5445  |         22.421         |
|          pytorch_stargan          |  16  |  15.483  |  18.5789  |  11.4985  |         11.853         |
|      nvidia_deeprecommender       | 256  | 10.2388  |  10.2545  |  10.4441  |         10.037         |
|          LearningToPaint          |  96  |  11.376  |  14.537   |  8.9722   |        10.6711         |
|              alexnet              | 128  |  9.849   |  9.8574   |  8.6196   |         8.6434         |
|             resnet18              |  16  |  9.4058  |  12.0101  |  6.8433   |        10.1411         |
|            tts_angular            |  64  |  6.6598  |  6.9947   |  6.7529   |         6.5205         |
|          phlippe_resnet           | 128  |  9.0118  |  11.5617  |   5.964   |          9.01          |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 14.7586  |  15.6097  |  5.6091   |         7.9193         |
|           squeezenet1_1           |  32  | 10.9694  |  12.9025  |  5.3151   |         8.8142         |
|          pytorch_struct           | 200  |  5.0727  |   6.074   |  3.3587   |         4.2162         |
|       functorch_dp_cifar10        |  64  | 10.4766  |  10.8557  |  3.1182   |         7.5759         |
|               dcgan               |  32  |  2.4205  |  2.9863   |  1.7272   |         2.622          |
|           lennard_jones           | 1000 |  1.8499  |  2.1133   |  1.3306   |         2.0218         |
|   timm_vision_transformer_large   |  32  | 463.9949 |    nan    |    nan    |        428.8487        |
|               dlrm                | 1024 |  4.3769  |  4.8773   |    nan    |         3.9574         |
|           hf_Longformer           |  2   | 111.1927 | 164.3236  |    nan    |          nan           |
|               moco                |  32  | 52.2231  |    nan    |    nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |    nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |    nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |    nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |    nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |    nan    |          nan           |
+-----------------------------------+------+----------+-----------+-----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 1.0139 |  0.8554   |  2.8599  |         1.1417         |
|             OPTForCausalLM              |  2  | 0.9925 |  0.9065   |  2.5075  |         2.5348         |
|     MobileBertForQuestionAnswering      | 128 | 1.0163 |   0.857   |  2.483   |         1.1537         |
|       MT5ForConditionalGeneration       | 16  | 1.0178 |  0.8568   |  2.3226  |         1.9222         |
|      GPT2ForSequenceClassification      |  4  | 0.9882 |  0.9615   |  2.3201  |         2.3352         |
|             XGLMForCausalLM             |  8  | 1.0128 |  0.8484   |  2.1327  |         1.5336         |
|       ElectraForQuestionAnswering       | 64  | 0.9978 |  0.9873   |  2.1283  |         2.1186         |
|     M2M100ForConditionalGeneration      | 16  | 0.9982 |  0.8404   |  1.9811  |         1.4206         |
|           ElectraForCausalLM            | 32  | 0.9955 |  0.9497   |  1.8616  |         1.8551         |
|    LayoutLMForSequenceClassification    | 16  | 0.9962 |  0.9825   |  1.8396  |         1.8113         |
|       RobertaForQuestionAnswering       | 16  | 0.9971 |  0.9822   |  1.7863  |         1.7849         |
|        BertForQuestionAnswering         | 16  | 0.9985 |  0.9822   |  1.7781  |         1.7796         |
|            XLNetLMHeadModel             |  8  |  1.0   |  0.9684   |  1.7597  |         1.7573         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9978 |  0.9768   |  1.6785  |         1.6551         |
|            PLBartForCausalLM            |  8  | 0.9923 |  0.9606   |  1.6777  |         1.7054         |
|           RobertaForCausalLM            | 16  | 0.9976 |  0.9725   |  1.6703  |         1.6781         |
|               DistillGPT2               | 16  | 0.9926 |  0.9601   |  1.6702  |         1.7048         |
|     PLBartForConditionalGeneration      |  4  | 0.9887 |  0.9518   |  1.6616  |         1.6508         |
|       AlbertForQuestionAnswering        |  4  | 1.0003 |  0.8858   |  1.6504  |         1.6538         |
|            AlbertForMaskedLM            |  4  | 1.0001 |   0.885   |  1.642   |         1.6457         |
|                 T5Small                 |  4  | 0.9941 |   0.858   |  1.6151  |         1.606          |
|       T5ForConditionalGeneration        |  4  | 0.9936 |  0.8608   |  1.6127  |         1.6025         |
|             BertForMaskedLM             | 16  | 0.9972 |  0.9719   |  1.6074  |         1.5975         |
|         MegatronBertForCausalLM         |  4  | 1.0186 |  0.9471   |  1.5918  |         1.5548         |
|      MBartForConditionalGeneration      |  2  | 1.0107 |  0.9739   |  1.5559  |         1.4943         |
|             BartForCausalLM             |  4  | 0.9896 |  0.9589   |  1.5548  |         1.5599         |
|      BartForConditionalGeneration       |  2  | 1.0071 |  0.9599   |  1.5547  |         1.4964         |
|                CamemBert                | 16  | 0.9977 |  0.9728   |  1.5473  |         1.5469         |
|            MBartForCausalLM             |  4  | 0.9879 |  0.9578   |  1.5441  |         1.5549         |
|         Speech2Text2ForCausalLM         | 256 | 0.9828 |  0.9291   |  1.5379  |         1.5753         |
|            YituTechConvBert             | 16  | 0.9975 |  0.9695   |  1.5174  |         1.5118         |
|     DistilBertForQuestionAnswering      | 256 | 0.9967 |  0.9902   |  1.4626  |         1.4526         |
|          BlenderbotForCausalLM          |  4  | 0.996  |  0.8455   |  1.4595  |         1.3244         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0059 |   0.909   |  1.4189  |         1.4356         |
|     PegasusForConditionalGeneration     | 32  | 1.0107 |  0.9396   |  1.3585  |         1.2854         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9875 |  0.9194   |  1.3084  |         1.2708         |
|           PegasusForCausalLM            | 32  | 0.9836 |  0.9326   |  1.2803  |         1.2189         |
|            TrOCRForCausalLM             | 32  | 0.9889 |   0.957   |  1.2779  |         1.3014         |
|          DistilBertForMaskedLM          | 128 | 0.9962 |  0.9547   |  1.2191  |         1.2372         |
|       DebertaForQuestionAnswering       |  8  | 0.7971 |  0.7045   |  1.0081  |         0.9632         |
|           DebertaForMaskedLM            |  4  | 0.7405 |  0.5613   |  0.9002  |         0.8233         |
|          DebertaV2ForMaskedLM           |  1  | 0.7082 |   0.527   |  0.8089  |         0.6609         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.7036 |  0.5276   |  0.7799  |         0.6654         |
|           LayoutLMForMaskedLM           | 16  | 0.9975 |  0.9726   |   0.0    |         1.6134         |
|          AllenaiLongformerBase          |  4  | 1.0097 |  0.6719   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
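
For anyone who wants to summarize a speedup table like the one above, here is a minimal sketch that parses the ASCII table and computes a geometric mean per backend. It is not part of the dashboard tooling: `parse_table` and `geomean_speedup` are hypothetical helper names, and treating `0.0` and `nan` entries as "no result" is an assumption based on how failed runs appear to be reported in these tables. The sample rows are copied from the table above.

```python
import math

# Three rows copied verbatim from the speedup table above.
TABLE = """\
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 1.0139 |  0.8554   |  2.8599  |         1.1417         |
|             OPTForCausalLM              |  2  | 0.9925 |  0.9065   |  2.5075  |         2.5348         |
|           LayoutLMForMaskedLM           | 16  | 0.9975 |  0.9726   |   0.0    |         1.6134         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
"""


def parse_table(text):
    """Parse an ASCII table into a list of dicts keyed by the header cells."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first |...| line is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geomean_speedup(rows, backend):
    """Geometric mean of one backend column, skipping missing results.

    Assumption: 0.0 and nan both mean the run produced no result.
    """
    values = []
    for row in rows:
        value = float(row[backend])
        if math.isnan(value) or value <= 0.0:
            continue
        values.append(value)
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))


rows = parse_table(TABLE)
inductor_geomean = geomean_speedup(rows, "inductor")
print(f"inductor geomean over {len(rows)} sample rows: {inductor_geomean:.4f}")
```

Skipping failed runs keeps a single `0.0` entry from collapsing the geometric mean to zero, but it also means the summary only covers models that ran successfully, so it should be read alongside the number of skipped models.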

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|      DebertaV2ForQuestionAnswering      |  2  | 15.8506 |  27.0161  | 585.8753 |        69.5544         |
|          DebertaV2ForMaskedLM           |  1  | 15.436  |  27.4042  | 581.7764 |        72.0125         |
|           DebertaForMaskedLM            |  4  | 7.3426  |  13.6037  | 256.123  |        56.5051         |
|       DebertaForQuestionAnswering       |  8  | 7.4469  |  13.4707  | 253.2942 |        54.5313         |
|          MobileBertForMaskedLM          | 64  | 17.0075 |  40.088   | 129.6441 |        123.6648        |
|     MobileBertForQuestionAnswering      | 128 | 16.9583 |  39.8903  | 128.224  |         125.16         |
|     M2M100ForConditionalGeneration      | 16  | 12.1741 |  27.2252  | 105.3658 |        103.3911        |
|             XGLMForCausalLM             |  8  | 9.8979  |  21.257   | 99.9815  |        99.6257         |
|            XLNetLMHeadModel             |  8  | 10.5122 |  27.6392  | 91.7632  |        92.4828         |
|       MT5ForConditionalGeneration       | 16  | 8.1121  |  18.4823  | 90.7427  |        91.7148         |
|      MBartForConditionalGeneration      |  2  | 11.9078 |  25.9296  | 81.4744  |        78.5872         |
|      BartForConditionalGeneration       |  2  | 11.8202 |  26.2707  | 76.3623  |        74.4997         |
|     PegasusForConditionalGeneration     | 32  | 5.5885  |  19.6803  | 70.5724  |        65.6423         |
|          BlenderbotForCausalLM          |  4  | 10.9567 |  22.0012  | 69.2844  |        67.6179         |
|    MegatronBertForQuestionAnswering     |  8  | 10.2959 |  21.3758  | 67.8289  |        63.5094         |
|         MegatronBertForCausalLM         |  4  | 10.3455 |  21.4487  |  67.426  |        65.7905         |
|            YituTechConvBert             | 16  |  7.408  |  15.6976  | 66.8241  |        67.9088         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.6718  |  17.5804  | 57.3128  |         54.409         |
|       T5ForConditionalGeneration        |  4  | 5.6479  |  12.6338  | 51.2132  |        49.1911         |
|                 T5Small                 |  4  | 5.7732  |  12.6625  | 50.6908  |         49.298         |
|     PLBartForConditionalGeneration      |  4  |  6.413  |  13.9619  | 49.4379  |        48.5032         |
|           ElectraForCausalLM            | 32  | 5.2704  |  10.8186  | 46.2955  |        46.5193         |
|    LayoutLMForSequenceClassification    | 16  | 5.4951  |   11.18   | 45.9508  |        47.4557         |
|             BertForMaskedLM             | 16  | 5.4357  |  10.7387  | 41.8384  |         38.771         |
|            MBartForCausalLM             |  4  | 5.7131  |  11.1409  | 41.7385  |        40.6005         |
|        BertForQuestionAnswering         | 16  | 5.3703  |  10.686   | 40.6888  |        39.9727         |
|             BartForCausalLM             |  4  | 5.6572  |  11.3868  | 40.2876  |        38.8877         |
|           PegasusForCausalLM            | 32  | 5.6539  |  10.9433  | 39.3764  |        37.7121         |
|            TrOCRForCausalLM             | 32  | 5.6785  |  10.9678  | 39.1846  |         38.223         |
|       ElectraForQuestionAnswering       | 64  | 5.2858  |  10.7372  | 39.1211  |        38.2818         |
|            AlbertForMaskedLM            |  4  | 2.2658  |   8.323   | 38.6809  |        37.8478         |
|           RobertaForCausalLM            | 16  | 5.2986  |  11.3035  | 37.9423  |         36.785         |
|             OPTForCausalLM              |  2  | 4.7418  |  10.2596  | 37.7019  |        37.1698         |
|                CamemBert                | 16  | 5.5109  |  10.8617  |  37.647  |        36.6648         |
|     DistilBertForQuestionAnswering      | 256 |  2.505  |  5.2674   | 36.8323  |        33.8891         |
|      GPT2ForSequenceClassification      |  4  | 4.8778  |  9.9442   |  36.562  |        36.6177         |
|       RobertaForQuestionAnswering       | 16  | 5.2367  |  10.7142  | 35.8416  |        35.0468         |
|       AlbertForQuestionAnswering        |  4  | 2.2307  |  8.1406   | 34.7163  |        34.0669         |
|          DistilBertForMaskedLM          | 128 | 2.5321  |  5.3059   | 33.2922  |        32.1959         |
|       BlenderbotSmallForCausalLM        | 64  | 4.1572  |  7.6082   | 30.3324  |        29.7129         |
|               DistillGPT2               | 16  | 2.5568  |  5.1426   |  29.302  |        29.8828         |
|            PLBartForCausalLM            |  8  | 3.0583  |  5.9521   | 26.9079  |        26.0107         |
|         Speech2Text2ForCausalLM         | 256 | 3.0576  |  5.7997   | 26.6099  |        25.5325         |
|           LayoutLMForMaskedLM           | 16  | 5.5698  |   11.31   |   nan    |        40.4464         |
|          AllenaiLongformerBase          |  4  | 9.8283  |  31.4597  |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  1.3156  |         1.3147         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  1.2697  |         1.268          |
|       DebertaForQuestionAnswering       |  8  | 0.9524 |  1.0537   |  1.213   |         1.1527         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.0973  |         1.1266         |
|             OPTForCausalLM              |  2  | 0.9999 |  0.9165   |  1.0909  |         1.1285         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.085   |         1.0823         |
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.0726  |         1.0717         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0699  |         1.0658         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0696  |         1.0654         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0399  |         1.0356         |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.0331  |         1.0331         |
|           DebertaForMaskedLM            |  4  | 0.9316 |  0.9156   |  1.0319  |         0.9978         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  1.0292  |         1.0285         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0215  |         1.0672         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0215  |         1.0672         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  1.0211  |         1.0329         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9763 |  0.9763   |  1.0021  |         0.9797         |
|          BlenderbotForCausalLM          |  4  | 0.9978 |  0.9099   |  1.0003  |         0.9988         |
|            YituTechConvBert             | 16  |  1.0   |  0.9143   |  0.998   |         0.996          |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.9957  |         1.0219         |
|     PegasusForConditionalGeneration     | 32  |  1.0   |   0.91    |  0.9895  |         1.0487         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.9888  |         0.9665         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  0.9875  |         0.9841         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  0.9874  |         0.984          |
|                CamemBert                | 16  |  1.0   |  0.9184   |  0.9858  |         0.9825         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.9799  |         1.0054         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9404  |         1.0274         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.9379  |         1.0016         |
|            MBartForCausalLM             |  4  |  1.0   |  0.8937   |  0.9366  |         0.9827         |
|           ElectraForCausalLM            | 32  |  1.0   |  0.8475   |  0.9339  |         0.9325         |
|             BartForCausalLM             |  4  |  1.0   |  0.8936   |  0.9219  |         0.9666         |
|             XGLMForCausalLM             |  8  |  1.0   |  0.7834   |  0.9181  |          0.97          |
|     M2M100ForConditionalGeneration      | 16  |  1.0   |  0.8084   |  0.911   |         0.9669         |
|     PLBartForConditionalGeneration      |  4  | 1.0001 |  0.9301   |  0.9083  |         0.9729         |
|           PegasusForCausalLM            | 32  |  1.0   |  0.8822   |  0.9048  |         0.9733         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9039  |         0.9912         |
|            TrOCRForCausalLM             | 32  |  1.0   |   0.873   |  0.874   |         0.9448         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8706  |         0.9428         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8677   |  0.8696  |         0.9347         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.865   |         0.9678         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.8579  |         0.8571         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8137   |  0.8153  |         0.9043         |
|         Speech2Text2ForCausalLM         | 256 |  1.0   |  0.8183   |  0.7822  |         0.8658         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |   nan    |         0.9842         |
|          AllenaiLongformerBase          |  4  | 0.9508 |  0.8684   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 266.2175 |  300.505  | 162.2565 |        161.7751        |
|       AlbertForQuestionAnswering        |  4  | 263.9303 | 298.3018  | 160.0754 |        159.7509        |
|            XLNetLMHeadModel             |  8  | 279.0917 | 288.6096  | 158.0108 |        158.591         |
|      DebertaV2ForQuestionAnswering      |  2  | 164.3124 | 198.3526  | 136.7504 |        162.0161        |
|          DebertaV2ForMaskedLM           |  1  | 163.3104 | 197.7795  | 130.2116 |        157.7799        |
|     PegasusForConditionalGeneration     | 32  | 142.3841 | 164.1105  | 108.4055 |        107.3852        |
|            TrOCRForCausalLM             | 32  | 139.3464 | 143.4338  | 107.8097 |        106.8681        |
|      MBartForConditionalGeneration      |  2  | 139.1575 | 142.1798  | 91.5919  |        92.1756         |
|      BartForConditionalGeneration       |  2  | 138.8556 | 145.3238  | 91.2566  |        91.3642         |
|    MegatronBertForQuestionAnswering     |  8  | 142.5983 | 145.4196  | 84.5679  |        85.7941         |
|            YituTechConvBert             | 16  | 125.9481 | 129.1015  | 82.7356  |         83.009         |
|          BlenderbotForCausalLM          |  4  | 110.9264 | 137.5351  | 81.4148  |        81.8756         |
| BlenderbotSmallForConditionalGeneration | 64  | 131.1821 | 128.5983  | 79.1045  |        78.9859         |
|                CamemBert                | 16  | 119.2462 | 121.7553  | 76.6659  |        76.6911         |
|       DebertaForQuestionAnswering       |  8  | 94.8569  |  107.681  | 74.9961  |        78.5258         |
|            MBartForCausalLM             |  4  | 115.0567 | 118.4996  | 73.8673  |        73.1206         |
|             BartForCausalLM             |  4  |  114.65  | 118.1882  | 72.9935  |        72.6783         |
|     M2M100ForConditionalGeneration      | 16  | 117.3032 | 146.5042  | 72.3514  |        79.1522         |
|     PLBartForConditionalGeneration      |  4  | 119.2583 | 122.7054  | 70.9534  |         70.448         |
|           DebertaForMaskedLM            |  4  | 92.9567  | 107.2108  |  70.907  |        75.3731         |
|     DistilBertForQuestionAnswering      | 256 | 103.5626 | 104.1728  | 70.7924  |        70.9182         |
|            PLBartForCausalLM            |  8  | 116.5717 | 120.3001  | 69.8763  |         68.64          |
|          DistilBertForMaskedLM          | 128 | 84.8998  |  88.5906  | 69.4143  |        68.4644         |
|             BertForMaskedLM             | 16  | 110.6944 | 113.1758  | 69.1472  |        68.8686         |
|           RobertaForCausalLM            | 16  | 115.3826 | 118.4527  | 68.9123  |        68.5164         |
|     MobileBertForQuestionAnswering      | 128 | 166.6456 | 193.7849  | 67.5379  |        151.7039        |
|             OPTForCausalLM              |  2  | 170.4427 | 181.0747  | 67.4969  |        67.0196         |
|       T5ForConditionalGeneration        |  4  | 104.8657 | 121.2081  | 64.6681  |        65.1329         |
|                 T5Small                 |  4  | 105.017  | 121.8481  | 64.6139  |         65.173         |
|               DistillGPT2               | 16  | 106.6626 | 110.1747  | 63.3119  |        62.3075         |
|          MobileBertForMaskedLM          | 64  | 164.5324 | 199.1655  |  59.225  |        150.4771        |
|           PegasusForCausalLM            | 32  | 70.3465  |  73.979   |  57.676  |         56.766         |
|         MegatronBertForCausalLM         |  4  | 85.9333  |  91.3485  |  54.756  |        56.2303         |
|       ElectraForQuestionAnswering       | 64  | 115.0247 | 115.9057  | 53.9367  |         54.055         |
|        BertForQuestionAnswering         | 16  | 96.7579  |  96.7244  | 53.6031  |        53.5596         |
|       RobertaForQuestionAnswering       | 16  | 95.8978  |  97.236   | 53.5417  |        53.6277         |
|    LayoutLMForSequenceClassification    | 16  | 98.0795  |  99.4388  | 53.1475  |        54.6804         |
|             XGLMForCausalLM             |  8  | 95.3667  | 106.3427  | 51.1755  |        57.5824         |
|           ElectraForCausalLM            | 32  | 88.3493  |  92.6508  |  47.307  |        47.4423         |
|       BlenderbotSmallForCausalLM        | 64  | 66.7146  |  62.9832  | 46.5693  |        45.4598         |
|       MT5ForConditionalGeneration       | 16  | 91.2048  | 107.7899  | 39.7958  |        47.9874         |
|      GPT2ForSequenceClassification      |  4  | 92.7541  |  95.1982  | 39.4337  |         39.828         |
|         Speech2Text2ForCausalLM         | 256 | 53.7962  |  57.2797  | 34.8694  |        33.7927         |
|           LayoutLMForMaskedLM           | 16  | 112.7453 | 115.6584  |   nan    |        70.0075         |
|          AllenaiLongformerBase          |  4  | 180.2748 | 270.3353  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+

timm_models suite with amp precision


Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 1.0003 |  0.9979   |  3.0431  |         2.9873         |
|      xcit_large_24_p8_224       |  5  | 0.9977 |  0.8782   |  1.9663  |         1.5813         |
|         coat_lite_mini          | 128 | 0.9995 |  0.9983   |  1.9615  |         1.9267         |
|        twins_pcpvt_base         | 64  | 1.0056 |  0.9148   |  1.9501  |         1.6934         |
|          gmlp_s16_224           | 128 | 0.9999 |  1.0883   |  1.8472  |         1.8489         |
|          ghostnet_100           | 128 | 0.9985 |  0.7689   |  1.8121  |         1.5968         |
|          gmixer_24_224          | 128 | 0.9999 |   0.893   |  1.7613  |         1.7595         |
|           volo_d1_224           | 64  | 0.9994 |  0.9781   |  1.7188  |         1.6814         |
|         crossvit_9_240          | 128 | 0.9997 |   0.789   |  1.6614  |         1.6391         |
|  swin_base_patch4_window7_224   | 64  | 0.9995 |  0.9644   |  1.6589  |         1.6297         |
|           convit_base           | 64  | 0.9998 |  0.9996   |  1.6294  |         1.6126         |
|          inception_v3           | 128 | 0.9996 |  0.8671   |  1.5486  |         1.5289         |
|        adv_inception_v3         | 128 | 0.9997 |  0.8634   |  1.5467  |         1.5281         |
|             dla102              | 128 | 0.9995 |  0.8178   |  1.5454  |         1.5326         |
|       gluon_inception_v3        | 128 | 0.9997 |  0.8675   |  1.5449  |         1.5299         |
|          convnext_base          | 64  | 0.9998 |  1.0009   |  1.5287  |         1.5066         |
|            nfnet_l0             | 128 | 0.9986 |  0.8204   |  1.523   |         1.4547         |
|           dm_nfnet_f0           | 128 | 0.9994 |  0.9969   |  1.5193  |         1.4558         |
|            lcnet_050            | 128 | 0.9429 |  0.7351   |  1.5095  |         1.4516         |
|        sebotnet33ts_256         | 64  | 0.9656 |  0.7692   |  1.4858  |         1.555          |
|            pit_b_224            | 64  | 0.9993 |  0.9974   |  1.4362  |         1.4371         |
|       eca_botnext26ts_256       | 128 | 0.9785 |  0.7217   |  1.4337  |         1.4331         |
|           resnest101e           | 64  | 0.9991 |  0.8714   |  1.4323  |         1.3664         |
|           selecsls42b           | 128 | 0.9993 |  0.8122   |  1.4248  |         1.4154         |
|           mobilevit_s           | 64  | 0.9709 |  0.7368   |  1.4197  |         1.4614         |
|          jx_nest_base           | 32  | 0.9996 |  0.9972   |  1.4093  |         1.3791         |
|          cait_m36_384           |  4  | 1.0002 |  0.9984   |  1.3912  |         1.3545         |
|      mobilenetv3_large_100      | 128 | 0.9513 |  0.7615   |  1.3884  |         1.4506         |
|          botnet26t_256          | 128 | 0.976  |  0.8527   |  1.3836  |         1.429          |
|           res2next50            | 128 | 0.9997 |  0.8258   |  1.3788  |         1.3659         |
|           mnasnet_100           | 128 | 0.9495 |  0.7409   |  1.3719  |         1.4942         |
|      beit_base_patch16_224      | 64  | 0.9996 |  0.9529   |  1.3701  |         1.3552         |
|          mixer_b16_224          | 128 | 0.9993 |  1.0206   |  1.3658  |         1.3659         |
|         poolformer_m36          | 64  | 0.9997 |  0.9962   |  1.3605  |         1.3425         |
|        ese_vovnet19b_dw         | 128 | 0.965  |  0.8381   |  1.3464  |         1.3843         |
|        res2net50_14w_8s         | 128 | 0.9993 |  0.7905   |  1.333   |         1.3576         |
|         mobilenetv2_100         | 128 | 0.951  |  0.7373   |  1.331   |         1.4489         |
|       tf_efficientnet_b0        | 128 | 0.9639 |  0.6826   |  1.3172  |         1.3903         |
|           regnety_002           | 128 | 0.9647 |  0.7308   |  1.3083  |         1.2521         |
|           fbnetc_100            | 128 | 0.952  |  0.7396   |  1.3022  |         1.4042         |
|          spnasnet_100           | 128 | 0.9423 |   0.739   |  1.2911  |         1.4221         |
|           rexnet_100            | 128 | 0.9602 |  0.7077   |  1.2779  |         1.3491         |
| deit_base_distilled_patch16_224 | 64  | 0.9997 |  0.9968   |  1.2746  |         1.2598         |
|            fbnetv3_b            | 128 | 0.9527 |  0.7709   |  1.2663  |         1.3193         |
|          resmlp_12_224          | 128 | 0.9999 |  0.8947   |  1.2655  |         1.2673         |
|      vit_base_patch16_224       | 64  | 0.9993 |  0.9967   |  1.2535  |         1.2398         |
|          cspdarknet53           | 64  | 0.9422 |  0.7917   |  1.2005  |         1.2767         |
|            tinynet_a            | 128 | 0.9507 |  0.6805   |  1.1885  |         1.2678         |
|         visformer_small         | 128 | 0.999  |   0.948   |  1.1841  |         1.1703         |
|           tf_mixnet_l           | 128 | 0.9812 |  0.8303   |  1.1753  |         1.1977         |
|            mixnet_l             | 128 | 0.9799 |  0.8235   |  1.1643  |         1.1876         |
|            hrnet_w18            | 128 | 0.9979 |  0.6464   |  1.1056  |         1.3552         |
|     swsl_resnext101_32x16d      | 32  | 0.9995 |  0.8437   |  1.079   |         1.0241         |
|        gluon_xception65         | 32  | 0.9997 |  0.8465   |  1.0736  |         1.0878         |
|             dpn107              | 32  | 0.9404 |  0.8131   |  1.0674  |         1.1491         |
|            repvgg_a2            | 128 | 0.9421 |  0.7597   |  1.059   |         1.1307         |
|            gernet_l             | 128 | 0.9446 |  0.7981   |  1.0168  |         1.0773         |
|        convmixer_768_32         | 32  | 0.9994 |  0.9657   |  1.0027  |         1.0037         |
|          pnasnet5large          | 16  | 0.9977 |   0.925   |  0.9902  |         1.144          |
|        res2net101_26w_4s        | 64  | 1.0017 |  0.7996   |  0.9863  |         1.0881         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
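To summarize a speedup table like the one above in a single number per backend, one option is to parse the ASCII table and take the geometric mean of a column. The sketch below is only an illustration and not the script that produced these tables: `parse_ascii_table` and `geomean` are hypothetical helper names, and the sample embeds just three rows copied from the table above. It skips `nan` entries, on the assumption that they mark runs that failed.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts (hypothetical helper)."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")  # skip the '+-----+' separator lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean(values):
    """Geometric mean over positive values, ignoring nan (assumed to mean a failed run)."""
    vals = [v for v in values if not math.isnan(v) and v > 0]
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


# Three rows copied from the speedup table above, for illustration only.
sample = """
+-------------------+-----+--------+-----------+----------+------------------------+
|       name        | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-------------------+-----+--------+-----------+----------+------------------------+
| tnt_s_patch16_224 | 128 | 1.0003 |  0.9979   |  3.0431  |         2.9873         |
|  coat_lite_mini   | 128 | 0.9995 |  0.9983   |  1.9615  |         1.9267         |
|   pnasnet5large   | 16  | 0.9977 |   0.925   |  0.9902  |         1.144          |
+-------------------+-----+--------+-----------+----------+------------------------+
"""

rows = parse_ascii_table(sample)
for backend in ("aot_eager", "inductor", "inductor_no_cudagraphs"):
    speedups = [float(r[backend]) for r in rows]
    print(f"{backend}: geomean speedup over {len(speedups)} models = {geomean(speedups):.4f}")
```

The geometric mean is used here rather than the arithmetic mean because the values are ratios, so a 2x speedup and a 0.5x slowdown cancel out to 1.0.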

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|            hrnet_w18            | 128 | 9.6586  |  36.5942  | 237.9889 |        250.819         |
|           rexnet_100            | 128 | 5.6373  |  11.4763  | 230.5596 |        277.8564        |
|          ghostnet_100           | 128 | 7.5007  |  15.2696  | 199.7177 |        237.481         |
|          pnasnet5large          | 16  | 8.1857  |  26.529   | 161.6252 |        165.7258        |
|           resnest101e           | 64  | 11.2971 |  24.8151  | 154.075  |        168.4266        |
|            fbnetv3_b            | 128 | 8.4338  |  17.638   | 151.4519 |        173.7269        |
|        res2net101_26w_4s        | 64  | 10.704  |  25.324   | 144.5482 |        153.9093        |
|        twins_pcpvt_base         | 64  | 10.7864 |  23.5205  | 142.3458 |        149.3782        |
|           mobilevit_s           | 64  | 5.3446  |  11.4897  | 142.3303 |        162.5136        |
|           tf_mixnet_l           | 128 | 8.9551  |  17.2816  | 140.4875 |        162.5013        |
|          inception_v3           | 128 | 5.7148  |  12.5494  | 138.3203 |        154.897         |
|            tinynet_a            | 128 | 6.1583  |  12.4794  | 138.307  |        163.1584        |
|            mixnet_l             | 128 |  8.429  |  16.6633  | 138.0546 |        164.2811        |
|       gluon_inception_v3        | 128 | 5.6984  |  12.8069  | 136.8929 |        160.6689        |
|        adv_inception_v3         | 128 | 5.7241  |  12.6205  | 136.5143 |        158.3682        |
|      xcit_large_24_p8_224       |  5  | 12.7712 |  28.318   | 135.5482 |        135.5559        |
|      mobilenetv3_large_100      | 128 | 4.2478  |  8.6406   | 134.5583 |        163.1994        |
|       tf_efficientnet_b0        | 128 | 5.1905  |  10.6865  | 127.937  |        150.8902        |
|        res2net50_14w_8s         | 128 | 9.0907  |  22.7702  | 120.9862 |        125.4573        |
|          cait_m36_384           |  4  | 13.9062 |  30.756   | 117.9919 |        115.2238        |
|           fbnetc_100            | 128 | 5.1093  |   9.61    | 117.2741 |         137.88         |
|          spnasnet_100           | 128 | 5.0864  |  9.5532   | 112.394  |        133.933         |
|  swin_base_patch4_window7_224   | 64  | 8.5305  |  19.3543  | 110.9477 |        108.6009        |
|           mnasnet_100           | 128 |  4.091  |  7.8252   | 108.2173 |        126.891         |
|         mobilenetv2_100         | 128 | 4.0978  |  8.0525   | 106.2427 |        130.977         |
|        sebotnet33ts_256         | 64  | 4.2338  |  9.0995   | 99.6776  |        111.0992        |
|         poolformer_m36          | 64  | 7.6444  |  13.8492  | 99.4696  |        102.0034        |
|             dpn107              | 32  | 9.8338  |  19.6785  | 97.7195  |        100.9684        |
|           regnety_002           | 128 | 4.8934  |  8.9229   | 90.9649  |        106.0543        |
|             dla102              | 128 | 6.2421  |  14.2202  | 90.0403  |        97.2761         |
|        gluon_xception65         | 32  | 7.8377  |  17.1771  | 90.0248  |        94.7826         |
|         coat_lite_mini          | 128 | 3.3343  |  7.9893   | 89.7669  |        91.1927         |
|          cspdarknet53           | 64  | 5.7933  |  11.0403  | 88.2698  |        101.3185        |
|         crossvit_9_240          | 128 | 5.8964  |  13.2453  | 87.7964  |        89.1364         |
|          jx_nest_base           | 32  | 6.8005  |  14.7177  | 86.2488  |        87.2848         |
|          botnet26t_256          | 128 | 2.9364  |  5.9409   | 82.7946  |        90.0145         |
|       eca_botnext26ts_256       | 128 | 3.0616  |  6.9841   | 82.7682  |        96.8123         |
|           res2next50            | 128 | 5.1349  |  12.2941  | 82.2017  |        86.8994         |
|            lcnet_050            | 128 | 2.5395  |  5.1005   | 81.4807  |        101.0657        |
|           selecsls42b           | 128 |  2.522  |  5.4715   | 79.9522  |         93.078         |
|           volo_d1_224           | 64  | 5.0282  |  11.771   | 73.7773  |        75.6893         |
|            nfnet_l0             | 128 | 5.3458  |  11.1471  | 73.1723  |         77.995         |
|        tnt_s_patch16_224        | 128 | 6.7254  |  16.4529  | 72.4378  |        70.2235         |
|            gernet_l             | 128 | 5.0593  |  9.0848   |  71.367  |         81.78          |
|           dm_nfnet_f0           | 128 |  6.065  |  11.5285  | 69.0244  |        75.9435         |
|        ese_vovnet19b_dw         | 128 | 2.5918  |  4.7484   | 68.7681  |        79.0975         |
|         visformer_small         | 128 | 2.6823  |  6.1918   | 65.0779  |        69.4706         |
|     swsl_resnext101_32x16d      | 32  | 6.2136  |  13.8448  | 64.2437  |        63.2696         |
|          gmlp_s16_224           | 128 | 5.6481  |  12.3202  | 61.0772  |        61.3143         |
|          convnext_base          | 64  | 6.7097  |  12.7048  | 60.3597  |        60.6746         |
|            repvgg_a2            | 128 | 4.8856  |  8.8986   | 57.5677  |        59.4211         |
|          gmixer_24_224          | 128 | 5.7496  |  13.0068  | 53.3068  |        53.3339         |
|           convit_base           | 64  | 3.5385  |  8.7936   | 51.5701  |        48.6019         |
|            pit_b_224            | 64  | 3.5282  |  8.0695   | 47.1598  |        45.7235         |
| deit_base_distilled_patch16_224 | 64  | 3.1931  |  7.1163   | 45.1317  |        43.4797         |
|      vit_base_patch16_224       | 64  | 3.0855  |  7.2612   | 42.6297  |        40.2994         |
|          resmlp_12_224          | 128 | 2.8401  |  5.5061   | 41.9275  |        41.5649         |
|        convmixer_768_32         | 32  | 1.7009  |  6.8748   | 40.8568  |        38.2153         |
|      beit_base_patch16_224      | 64  | 3.8949  |  8.8936   | 37.8937  |        37.4712         |
|          mixer_b16_224          | 128 | 2.6952  |  5.9969   | 36.0613  |        35.0728         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.2872  |         1.2836         |
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.2057  |         1.2049         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  1.1899  |         1.1871         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1607  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.1583  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.1215  |         1.1179         |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  1.1129  |         1.1115         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  1.089   |         1.0876         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.0875  |         1.0845         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  1.0758  |         1.0721         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  1.0757  |         1.0728         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  1.0696  |         1.0675         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  1.0556  |         1.0539         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  1.0512  |         1.0506         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  1.0494  |         1.0457         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0377  |         1.0351         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  1.0361  |         1.0328         |
|          convnext_base          | 64  | 1.001  |   0.924   |  1.0346  |         1.0338         |
|             dla102              | 128 | 0.9635 |  0.9155   |  1.0323  |         1.0325         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  1.0251  |         1.0242         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  1.021   |         1.0202         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  1.0203  |         1.0194         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  1.0082  |         1.0072         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  1.0071  |         1.0057         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9976  |         0.9952         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.9957  |         0.9948         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.9925  |          0.99          |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.9923  |         0.9902         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9917  |         0.9903         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.9912  |         0.9898         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9905  |         0.989          |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.9885  |         0.989          |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9864  |         0.9854         |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9821  |         0.9793         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.9793  |         0.9786         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.9793  |         0.977          |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.979   |         0.9779         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.9776  |         0.9732         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.9738  |         0.9706         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9732  |         0.9727         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.9714  |         0.9705         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.9702  |         0.9664         |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.966   |         0.9611         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.9646  |         0.9642         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.9637  |         0.9607         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.9612  |         0.9604         |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.9582  |         0.9535         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.9568  |         0.9547         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9562  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9537  |         0.9528         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.9509  |         0.9483         |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.9497  |         0.9451         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.9448  |         0.9403         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.9376  |         0.9361         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.9046  |         0.9045         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.9009  |         0.8966         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.8898  |         0.884          |
+---------------------------------+-----+--------+-----------+----------+------------------------+
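For reading the table above, the sketch below assumes the compression ratio is the eager peak memory divided by the backend's peak memory, so a value above 1 means the backend used less memory than eager and a value below 1 means it used more. That definition is an assumption here, since the thread does not state it. The function name and the byte counts are hypothetical and are not measurements from these runs.

```python
def compression_ratio(eager_peak_bytes, backend_peak_bytes):
    """Assumed definition: eager peak memory / backend peak memory (hypothetical helper)."""
    if backend_peak_bytes <= 0:
        raise ValueError("backend peak memory must be positive")
    return eager_peak_bytes / backend_peak_bytes


# Hypothetical peak-memory readings in bytes, chosen only to illustrate the direction.
eager_peak = 12 * 1024**3
backend_peak = 10 * 1024**3

ratio = compression_ratio(eager_peak, backend_peak)
verdict = "less" if ratio > 1 else "more or equal"
print(f"compression ratio = {ratio:.4f} (backend uses {verdict} memory than eager)")
```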

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.4373 | 310.7403  | 299.7493 |        299.2593        |
|            hrnet_w18            | 128 | 281.7472 | 432.6549  | 253.0703 |        206.6066        |
|          pnasnet5large          | 16  | 197.1561 | 212.1997  | 198.435  |        171.7257        |
|           tf_mixnet_l           | 128 | 193.4359 | 228.7824  | 161.1795 |        158.5687        |
|            mixnet_l             | 128 | 185.066  | 220.4571  | 155.6847 |        152.7543        |
|          cait_m36_384           |  4  | 167.259  | 167.2807  | 120.1909 |        123.632         |
|           resnest101e           | 64  | 164.0731 | 187.8801  | 114.479  |         119.87         |
|             dla102              | 128 | 172.2116 |  210.238  | 111.3535 |        112.2408        |
|     swsl_resnext101_32x16d      | 32  | 118.7976 | 140.3802  | 110.1638 |        115.961         |
|         poolformer_m36          | 64  | 145.1757 |  145.602  | 106.4201 |        107.8501        |
|        tnt_s_patch16_224        | 128 | 324.1053 | 324.5449  | 106.3023 |        108.194         |
|        res2net50_14w_8s         | 128 | 140.8465 |  178.358  |  105.54  |        103.958         |
|       gluon_inception_v3        | 128 | 160.6208 | 185.0844  | 103.8473 |        104.8216        |
|          inception_v3           | 128 | 160.3499 | 184.9232  | 103.588  |        104.7801        |
|        adv_inception_v3         | 128 | 160.8506 | 185.6328  | 103.4787 |        104.9566        |
|           convit_base           | 64  | 163.2633 | 163.0495  | 100.0668 |        100.9472        |
|        res2net101_26w_4s        | 64  | 100.4136 |  125.54   | 99.8262  |        89.9747         |
|             dpn107              | 32  | 113.2128 | 130.6532  | 99.3717  |        92.3321         |
|        gluon_xception65         | 32  | 99.0839  | 117.1687  | 92.4517  |        90.9832         |
|           res2next50            | 128 | 126.2288 | 152.8076  | 91.2385  |        92.2105         |
|  swin_base_patch4_window7_224   | 64  | 146.4659 | 151.7146  | 88.0609  |        89.7345         |
|            fbnetv3_b            | 128 | 115.1983 |  142.496  | 86.3401  |         83.168         |
|          mixer_b16_224          | 128 | 116.5221 | 113.9183  | 85.3516  |        85.1634         |
|           dm_nfnet_f0           | 128 | 126.8885 | 127.0369  | 83.3307  |        86.9911         |
|            pit_b_224            | 64  | 118.2765 | 118.6017  | 82.3396  |        82.3155         |
|          convnext_base          | 64  | 122.2974 | 122.3477  | 80.0877  |        81.1495         |
|         visformer_small         | 128 | 91.0459  |  95.9752  | 76.9197  |        77.7862         |
|          gmlp_s16_224           | 128 | 137.3186 |  126.071  | 74.1856  |         74.039         |
|       eca_botnext26ts_256       | 128 | 108.2566 | 146.8409  |  73.947  |        73.9992         |
|          cspdarknet53           | 64  |  94.044  | 111.9449  | 73.8278  |        69.3714         |
|      beit_base_patch16_224      | 64  | 101.1762 | 106.4607  | 73.7275  |        74.7255         |
|            nfnet_l0             | 128 | 111.9515 | 136.6473  | 73.7044  |        76.8952         |
|          botnet26t_256          | 128 | 101.648  | 116.5018  | 71.8396  |        69.5185         |
|            gernet_l             | 128 |  76.952  |  91.1386  | 71.4893  |        67.6139         |
|          jx_nest_base           | 32  | 100.7131 | 100.2614  | 71.1178  |        72.9372         |
|           volo_d1_224           | 64  | 120.5163 | 123.0022  | 70.1544  |        71.6946         |
|      vit_base_patch16_224       | 64  | 86.8355  |  86.9736  | 69.1871  |        69.9326         |
|            repvgg_a2            | 128 | 77.2846  |  95.8461  | 68.6388  |        64.3519         |
|          gmixer_24_224          | 128 | 117.9126 | 131.7933  | 66.9701  |        66.9177         |
| deit_base_distilled_patch16_224 | 64  | 84.5649  |  84.8497  | 66.2663  |        67.1119         |
|      xcit_large_24_p8_224       |  5  | 129.0852 | 139.9584  | 63.4046  |        78.9216         |
|       tf_efficientnet_b0        | 128 | 84.6899  | 119.6021  | 61.8354  |        58.6074         |
|           fbnetc_100            | 128 |  82.776  | 106.5586  |  60.49   |        56.1126         |
|           rexnet_100            | 128 | 79.5532  | 107.9712  | 59.6793  |        56.5119         |
|        twins_pcpvt_base         | 64  | 132.2522 | 128.7992  | 59.5359  |         69.812         |
|            tinynet_a            | 128 | 73.6394  | 102.3804  | 58.4989  |        54.9695         |
|         coat_lite_mini          | 128 | 112.9313 | 113.0922  | 57.5301  |         58.605         |
|           mobilevit_s           | 64  | 84.0287  | 110.4401  | 57.2347  |        55.7259         |
|        sebotnet33ts_256         | 64  | 79.9498  | 100.4251  | 51.9066  |        49.5317         |
|          spnasnet_100           | 128 | 70.5283  |  90.0882  | 51.3043  |        46.6796         |
|          ghostnet_100           | 128 |  90.192  | 117.2249  | 49.5704  |        56.4458         |
|         crossvit_9_240          | 128 |  82.092  | 103.7237  |  49.252  |        49.8918         |
|         mobilenetv2_100         | 128 | 65.3832  |  84.6061  | 46.6745  |        43.0001         |
|        ese_vovnet19b_dw         | 128 | 64.2858  |  73.9612  | 46.0284  |        44.7713         |
|           mnasnet_100           | 128 | 64.3635  |  82.5452  | 44.3753  |        40.8721         |
|           selecsls42b           | 128 | 60.0448  |  73.962   | 42.1312  |        42.3699         |
|          resmlp_12_224          | 128 | 53.2037  |  59.4816  | 42.0332  |        42.0045         |
|      mobilenetv3_large_100      | 128 | 61.3259  |  76.7466  | 41.8356  |        40.1743         |
|           regnety_002           | 128 | 39.0505  |  56.1918  | 28.8691  |         30.032         |
|            lcnet_050            | 128 | 31.7647  |  40.7949  | 19.7758  |        20.6181         |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Performance graphs

bench_logs/timm_models_amp.png :

bench_logs/torchbench_amp.png :

bench_logs/huggingface_amp.png :

Build Summary

Run name

day_097_07_04_23_performance_amp_979

Commit hashes

pytorch commit: c68a94c
pytorch commit date: 2023-04-08 02:02:28+00:00
torchbench commit: 90f07fd6cac33a66ab2f8451328ef81b676f4535
torchbench commit date: 2023-04-07 12:10:37-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+gitc68a94c

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing them with native pytorch. We measure speedup by normalizing against the performance of native pytorch. We report the mean compilation latency and the peak memory footprint reduction ratio.
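As a rough illustration of the speedup normalization and the geometric mean reported below, here is a minimal sketch. The function names and the latency values are hypothetical and are not taken from the benchmark scripts; it assumes speedup is the native pytorch latency divided by the backend latency.

```python
import math


def speedup(eager_latency_s, compiled_latency_s):
    """Speedup normalized against native pytorch (eager) latency."""
    return eager_latency_s / compiled_latency_s


def geometric_mean(values):
    """Geometric mean of positive values, computed in log space."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        # A 0.0 speedup marks a failed run and has to be filtered out first.
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (eager, compiled) latencies in seconds per model.
latencies = {
    "model_a": (2.0, 1.0),
    "model_b": (4.0, 2.0),
    "model_c": (1.0, 2.0),
}
speedups = [speedup(eager, compiled) for eager, compiled in latencies.values()]
geomean = geometric_mean(speedups)
print(f"{geomean:.2f}x")
```

The geometric mean is the usual choice for averaging ratios, because a 2x speedup and a 0.5x slowdown cancel out to 1x instead of averaging to 1.25x.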

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 83%, 50/60 | 93%, 42/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 96%, 43/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
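The entries in the Passrate table combine a percentage with a pass count over the suite size. A minimal sketch of that formatting follows. The helper name is hypothetical, and treating every status that starts with "pass" (including pass_due_to_skip) as passing is an assumption, not something the report states.

```python
def passrate(statuses):
    """Format a pass rate in the style of the table, e.g. '88%, 53/60'."""
    total = len(statuses)
    if total == 0:
        raise ValueError("need at least one status")
    # Assumption: 'pass' and 'pass_due_to_skip' both count as passing.
    passed = sum(1 for status in statuses if status.startswith("pass"))
    return f"{passed / total:.0%}, {passed}/{total}"


# 53 passing models out of 60, as in the eager/torchbench entry.
example = passrate(["pass"] * 53 + ["fail_accuracy"] * 7)
print(example)
```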

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.51x    |    1.62x    |    1.39x    |
| inductor_no_cudagraphs |   1.29x    |    1.53x    |    1.40x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.81    |    7.23     |    5.88     |
|       aot_eager        |    9.42    |    15.85    |    13.12    |
|        inductor        |   59.13    |    68.62    |    99.51    |
| inductor_no_cudagraphs |   64.31    |    60.62    |   111.95    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.90x    |    0.88x    |
|        inductor        |   0.95x    |    1.02x    |    1.02x    |
| inductor_no_cudagraphs |   0.94x    |    1.05x    |    1.01x    |
+------------------------+------------+-------------+-------------+
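To make "higher is better" concrete, the sketch below assumes the compression ratio is the native pytorch peak memory divided by the backend's peak memory. That definition is an assumption on our part, and the byte counts are hypothetical rather than taken from the report.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Assumed definition: eager peak memory / compiled peak memory."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


# Hypothetical peak memory readings in bytes.
ratio_when_smaller = compression_ratio(10_000_000_000, 8_000_000_000)
ratio_when_larger = compression_ratio(8_000_000_000, 10_000_000_000)
print(f"{ratio_when_smaller:.2f}x", f"{ratio_when_larger:.2f}x")
```

Under this reading, a ratio above 1.00x means the backend uses less peak memory than native pytorch, and a ratio below 1.00x means it uses more.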

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics of the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

Previous report name: /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 83%, 50/60  | 83%, 50/60  |
|        inductor        | huggingface | 93%, 42/45  | 93%, 42/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60  | 87%, 52/60  |
| inductor_no_cudagraphs | huggingface | 96%, 43/45  | 96%, 43/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.48x    |   1.51x   |
|        inductor        | huggingface |   1.61x    |   1.62x   |
|        inductor        | timm_models |   1.39x    |   1.39x   |
| inductor_no_cudagraphs | torchbench  |   1.28x    |   1.29x   |
| inductor_no_cudagraphs | huggingface |   1.51x    |   1.53x   |
| inductor_no_cudagraphs | timm_models |   1.40x    |   1.40x   |
+------------------------+-------------+------------+-----------+

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: a 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec
  • compression ratio < 0.9
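The four criteria above can be sketched as a single check per model and compiler. The class, field and constant names are hypothetical, and treating any status that starts with "pass" as passing is an assumption. The example values are the inductor numbers for densenet121 from the torchbench tables in this report.

```python
from dataclasses import dataclass

# Thresholds as listed in the Warnings section.
SPEEDUP_MIN = 0.95
COMPILATION_LATENCY_MAX_S = 120.0
COMPRESSION_RATIO_MIN = 0.9


@dataclass
class ModelResult:
    name: str
    accuracy: str
    speedup: float
    compilation_latency_s: float
    compression_ratio: float


def warnings_for(result):
    """Return the names of the warning categories that apply to a result."""
    flags = []
    if not result.accuracy.startswith("pass"):
        flags.append("accuracy")
    if result.speedup < SPEEDUP_MIN:
        flags.append("speedup")
    if result.compilation_latency_s > COMPILATION_LATENCY_MAX_S:
        flags.append("compilation_latency")
    if result.compression_ratio < COMPRESSION_RATIO_MIN:
        flags.append("compression_ratio")
    return flags


densenet121 = ModelResult(
    name="densenet121",
    accuracy="pass",
    speedup=1.8996,
    compilation_latency_s=124.176,
    compression_ratio=0.8167,
)
flags = warnings_for(densenet121)
print(flags)
```

Note that a nan latency, as reported for a run that did not complete, compares as false against the threshold, so a real implementation would need to handle missing values separately.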

Accuracy warnings

+-------------+----------------------------+-----------------+------------------------+
|    suite    |            name            |    inductor     | inductor_no_cudagraphs |
+-------------+----------------------------+-----------------+------------------------+
| torchbench  |       hf_Longformer        |   fail_to_run   |      fail_to_run       |
| torchbench  |            moco            |   fail_to_run   |      fail_to_run       |
| torchbench  |     Background_Matting     | eager_variation |    eager_variation     |
| torchbench  |         tacotron2          |     0.0000      |         0.0000         |
| torchbench  |            gat             |     0.0000      |         0.0000         |
| torchbench  |            gcn             |     0.0000      |         0.0000         |
| torchbench  |           llama            |     0.0000      |         0.0000         |
| torchbench  |            sage            |     0.0000      |         0.0000         |
| torchbench  |       torchrec_dlrm        |     0.0000      |         0.0000         |
| huggingface | AlbertForQuestionAnswering |  fail_accuracy  |     fail_accuracy      |
+-------------+----------------------------+-----------------+------------------------+

Performance speedup warnings

+-------------+-------------------------------+----------+------------------------+
|    suite    |             name              | inductor | inductor_no_cudagraphs |
+-------------+-------------------------------+----------+------------------------+
| torchbench  |         lennard_jones         |  1.2335  |         0.8688         |
| torchbench  |             dcgan             |  1.1879  |         0.817          |
| torchbench  |          tts_angular          |  0.9246  |         0.9393         |
| torchbench  |          timm_vovnet          |  0.9212  |         0.9589         |
| torchbench  |              drq              |  0.0077  |         0.9227         |
| torchbench  |       soft_actor_critic       |  0.0063  |         0.7495         |
| torchbench  |              gat              |   0.0    |          0.0           |
| torchbench  |           tacotron2           |   0.0    |          0.0           |
| torchbench  |             sage              |   0.0    |          0.0           |
| torchbench  |              gcn              |   0.0    |          0.0           |
| torchbench  | timm_vision_transformer_large |   0.0    |         1.0836         |
| torchbench  |             moco              |   0.0    |          0.0           |
| torchbench  |         hf_Longformer         |   0.0    |          0.0           |
| torchbench  |             dlrm              |   0.0    |         1.2045         |
| torchbench  |         torchrec_dlrm         |   0.0    |          0.0           |
| huggingface |      DebertaForMaskedLM       |  0.8975  |         0.8088         |
| huggingface |     DebertaV2ForMaskedLM      |  0.807   |         0.656          |
| huggingface | DebertaV2ForQuestionAnswering |  0.781   |         0.6873         |
| huggingface |      LayoutLMForMaskedLM      |   0.0    |         1.6268         |
| huggingface |     AllenaiLongformerBase     |   0.0    |          0.0           |
+-------------+-------------------------------+----------+------------------------+

Compilation latency (sec) warnings

+-------------+--------------------------------+----------+------------------------+
|    suite    |              name              | inductor | inductor_no_cudagraphs |
+-------------+--------------------------------+----------+------------------------+
| torchbench  |          hf_T5_large           | 176.5747 |        175.8869        |
| torchbench  |           hf_BigBird           | 160.309  |        131.9951        |
| torchbench  |        phlippe_densenet        | 130.3011 |        171.3564        |
| torchbench  |          densenet121           | 124.176  |        137.4438        |
| torchbench  |       timm_efficientnet        | 120.839  |        145.5391        |
| torchbench  |       mobilenet_v3_large       | 114.2023 |        137.1846        |
| torchbench  |          mobilenet_v2          | 105.1712 |        129.6075        |
| torchbench  |             yolov3             | 105.0034 |        120.7003        |
| torchbench  | timm_vision_transformer_large  |   nan    |        128.0136        |
| huggingface |      DebertaV2ForMaskedLM      | 199.6784 |        71.2684         |
| huggingface | DebertaV2ForQuestionAnswering  | 196.5988 |        70.8035         |
| huggingface |     MobileBertForMaskedLM      | 146.0158 |        150.3536        |
| huggingface | MobileBertForQuestionAnswering | 139.519  |        140.5881        |
| huggingface | M2M100ForConditionalGeneration | 136.1343 |        135.3592        |
| huggingface |        XGLMForCausalLM         | 130.0771 |        134.9474        |
| huggingface |  MT5ForConditionalGeneration   | 124.4349 |        132.2643        |
| timm_models |           hrnet_w18            | 236.3629 |        253.9841        |
| timm_models |           rexnet_100           | 224.7499 |        297.0445        |
| timm_models |          ghostnet_100          | 192.2707 |        243.5687        |
| timm_models |         pnasnet5large          | 157.8322 |        166.9827        |
| timm_models |          resnest101e           | 153.3786 |        171.5376        |
| timm_models |           fbnetv3_b            | 145.7091 |        174.1522        |
| timm_models |        twins_pcpvt_base        | 142.3674 |        151.6112        |
| timm_models |       res2net101_26w_4s        | 141.6476 |        154.5319        |
| timm_models |          mobilevit_s           | 141.3446 |        159.2207        |
| timm_models |          tf_mixnet_l           | 141.0283 |        165.4847        |
| timm_models |            mixnet_l            | 137.9874 |        163.0108        |
| timm_models |        adv_inception_v3        | 136.2104 |        160.4157        |
| timm_models |       gluon_inception_v3       | 135.6235 |        160.3407        |
| timm_models |      xcit_large_24_p8_224      | 134.7515 |        139.6318        |
| timm_models |          inception_v3          | 133.6956 |        165.7827        |
| timm_models |           tinynet_a            | 133.3159 |        157.8647        |
| timm_models |     mobilenetv3_large_100      | 132.1494 |        160.0576        |
| timm_models |       tf_efficientnet_b0       | 131.2147 |        159.8844        |
| timm_models |        res2net50_14w_8s        | 120.6187 |        127.9824        |
| timm_models |          spnasnet_100          | 116.4227 |        137.644         |
| timm_models |           fbnetc_100           | 113.787  |        140.9721        |
| timm_models |        mobilenetv2_100         | 106.1095 |        131.7379        |
| timm_models |          mnasnet_100           | 104.9756 |        127.7246        |
+-------------+--------------------------------+----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+----------+------------------------+
|    suite    |                  name                   | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------------+----------+------------------------+
| torchbench  |         nvidia_deeprecommender          |  0.9195  |         0.8931         |
| torchbench  |             pytorch_stargan             |  0.8935  |         0.8893         |
| torchbench  |                resnet50                 |  0.8895  |         0.8839         |
| torchbench  |               timm_vovnet               |  0.889   |         0.8869         |
| torchbench  |         timm_vision_transformer         |  0.8873  |         0.8835         |
| torchbench  |            phlippe_densenet             |  0.8834  |         0.8659         |
| torchbench  |           speech_transformer            |  0.8694  |         0.869          |
| torchbench  |               densenet121               |  0.8167  |         0.7961         |
| torchbench  |               hf_Reformer               |  0.8132  |         0.8022         |
| torchbench  |               mnasnet1_0                |  0.7837  |         0.778          |
| torchbench  |           mobilenet_v3_large            |  0.782   |         0.8077         |
| torchbench  |             resnext50_32x4d             |  0.7778  |         0.7712         |
| torchbench  |             LearningToPaint             |  0.7552  |         0.7463         |
| torchbench  |             pytorch_struct              |  0.7428  |         0.7362         |
| torchbench  |                resnet18                 |  0.619   |         0.6097         |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |  0.6035  |         0.6004         |
| torchbench  |          functorch_dp_cifar10           |  0.451   |         0.4424         |
| torchbench  |             phlippe_resnet              |  0.3554  |         0.3395         |
| huggingface |          DistilBertForMaskedLM          |  0.8872  |         0.9624         |
| huggingface |            TrOCRForCausalLM             |  0.8855  |         0.9583         |
| huggingface | BlenderbotSmallForConditionalGeneration |  0.8749  |         0.9803         |
| huggingface |     MobileBertForQuestionAnswering      |  0.8399  |         0.8392         |
| huggingface |       BlenderbotSmallForCausalLM        |  0.8215  |         0.9119         |
| huggingface |         Speech2Text2ForCausalLM         |  0.7921  |         0.8779         |
| timm_models |               regnety_002               |  0.9009  |         0.8966         |
| timm_models |                lcnet_050                |  0.8898  |         0.884          |
+-------------+-----------------------------------------+----------+------------------------+

Metrics over time

bench_logs/passrate_over_time.png :

bench_logs/memory_over_time.png :

bench_logs/geomean_over_time.png :

bench_logs/comp_time_over_time.png :

Recent Regressions

For each relevant compiler, we compare the two most recent reports that actually ran the compiler to find models that were not flagged previously but are now flagged as problematic (according to the 'Warnings' section).
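A minimal sketch of this comparison for one metric is shown here. The function names are hypothetical, and skipping models that are missing from the previous report is an assumption. The first three entries are the inductor compilation latencies listed for huggingface in this report; the last entry is a made-up model that was already flagged in both reports.

```python
def find_regressions(prev, cur, is_flagged):
    """Return {name: (prev_value, cur_value)} for newly flagged models."""
    regressions = {}
    for name, cur_value in cur.items():
        prev_value = prev.get(name)
        if prev_value is None:
            # Assumption: a model with no previous result is not a regression.
            continue
        if is_flagged(cur_value) and not is_flagged(prev_value):
            regressions[name] = (prev_value, cur_value)
    return regressions


def latency_flagged(seconds):
    """Compilation latency criterion from the Warnings section."""
    return seconds > 120.0


prev_latency = {
    "M2M100ForConditionalGeneration": 105.3658,
    "XGLMForCausalLM": 99.9815,
    "MT5ForConditionalGeneration": 90.7427,
    "hypothetical_slow_model": 150.0,  # made up: flagged in both reports
}
cur_latency = {
    "M2M100ForConditionalGeneration": 136.1343,
    "XGLMForCausalLM": 130.0771,
    "MT5ForConditionalGeneration": 124.4349,
    "hypothetical_slow_model": 155.0,  # made up: flagged in both reports
}
regressions = find_regressions(prev_latency, cur_latency, latency_flagged)
for name, (before, after) in regressions.items():
    print(name, before, after)
```

Only models that cross the threshold between the two reports are returned, so a model that was already flagged stays in the Warnings tables without being reported again as a regression.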

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

No regressions found.

Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

Compilation latency (sec) regressions

+------------------------+--------------------------------+-------------+------------+
|        compiler        |              name              | prev_status | cur_status |
+------------------------+--------------------------------+-------------+------------+
|        inductor        | M2M100ForConditionalGeneration |  105.3658   |  136.1343  |
|        inductor        |        XGLMForCausalLM         |   99.9815   |  130.0771  |
|        inductor        |  MT5ForConditionalGeneration   |   90.7427   |  124.4349  |
| inductor_no_cudagraphs | M2M100ForConditionalGeneration |  103.3911   |  135.3592  |
| inductor_no_cudagraphs |        XGLMForCausalLM         |   99.6257   |  134.9474  |
| inductor_no_cudagraphs |  MT5ForConditionalGeneration   |   91.7148   |  132.2643  |
+------------------------+--------------------------------+-------------+------------+

Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

No regressions found.

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|           BERT_pytorch            |  16  | 1.0054 |  0.8123   |  3.4005  |         2.1347         |
|       functorch_dp_cifar10        |  64  | 0.9657 |  0.9175   |  3.3613  |         1.373          |
|            hf_BigBird             |  2   | 0.9579 |  0.7712   |  2.8804  |         1.7025         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9724 |  0.9073   |  2.6747  |         1.7778         |
|            hf_T5_large            |  2   | 1.0084 |  0.8296   |  2.5502  |         1.9734         |
|             hf_Albert             |  8   | 0.9964 |  0.9595   |  2.3571  |         2.297          |
|              hf_Bart              |  4   | 0.9918 |  0.8463   |  2.1585  |         1.571          |
|              hf_Bert              |  4   | 1.0298 |  0.8571   |  2.1064  |         1.6803         |
|           squeezenet1_1           |  32  | 0.9857 |  0.9213   |  1.9742  |         1.216          |
|               hf_T5               |  8   | 0.9955 |  0.8614   |  1.9714  |         2.0271         |
|              hf_GPT2              |  4   | 1.0208 |  0.9881   |  1.9245  |         1.893          |
|            densenet121            |  4   | 0.9942 |  0.7077   |  1.8996  |         1.0688         |
|           hf_GPT2_large           |  4   | 1.0002 |  0.9888   |  1.8025  |         1.7929         |
|           hf_Bert_large           |  4   | 1.035  |  0.8833   |  1.6514  |         1.6343         |
|        mobilenet_v3_large         |  32  | 1.0042 |  0.7883   |  1.6502  |         1.2078         |
|         phlippe_densenet          | 128  | 0.9992 |  0.7791   |  1.6159  |         1.0366         |
|           timm_resnest            |  32  | 0.9972 |  0.8518   |  1.5921  |         1.5154         |
|            timm_nfnet             | 128  | 1.0002 |  0.9988   |  1.5906  |         1.5048         |
|      timm_vision_transformer      |  32  | 0.9916 |  0.8661   |  1.5896  |         1.3863         |
| attention_is_all_you_need_pytorch | 256  |  1.0   |  0.9146   |  1.5647  |         1.5216         |
|           mobilenet_v2            |  96  | 0.9989 |  0.7788   |  1.5389  |         1.5301         |
|           fastNLP_Bert            |  6   | 1.0012 |  0.8317   |  1.5355  |         1.6055         |
|          phlippe_resnet           | 128  | 0.9878 |  0.7579   |  1.5323  |         1.0124         |
|           hf_DistilBert           |  8   | 0.9918 |  0.9589   |  1.5238  |         1.502          |
|        speech_transformer         |  32  | 0.9856 |   0.837   |  1.4698  |         1.6337         |
|        shufflenet_v2_x1_0         | 128  | 0.9969 |  0.7591   |  1.4422  |         1.2097         |
|          pytorch_struct           | 200  | 0.9157 |  0.7578   |  1.409   |         1.1143         |
|           pytorch_unet            |  1   | 0.9985 |  0.2051   |  1.3695  |         1.3565         |
|          resnext50_32x4d          |  8   | 0.9848 |  0.7191   |  1.3551  |         0.9995         |
|             resnet18              |  16  | 0.9917 |  0.7536   |  1.3192  |         0.9866         |
|            mnasnet1_0             |  32  | 0.9953 |  0.7371   |  1.3131  |         1.1071         |
|          pytorch_stargan          |  16  | 0.9945 |  0.8083   |  1.2778  |         1.2616         |
|               vgg16               |  64  | 0.9995 |  0.9983   |  1.2614  |         1.2533         |
|          LearningToPaint          |  96  | 0.9886 |  0.7758   |  1.2576  |         1.0677         |
|            Super_SloMo            |  6   | 0.9985 |   0.179   |  1.2337  |         1.2359         |
|           lennard_jones           | 1000 | 0.8284 |  0.7703   |  1.2335  |         0.8688         |
|        Background_Matting         |  4   | 0.9994 |  0.1369   |  1.2194  |         1.2094         |
|              yolov3               |  16  | 0.9996 |  0.8086   |  1.2152  |         1.2059         |
|         timm_efficientnet         |  32  | 0.9456 |  0.6286   |  1.1988  |         1.1003         |
|               dcgan               |  32  | 0.8538 |  0.6938   |  1.1879  |         0.817          |
|              alexnet              | 128  | 0.9989 |  0.9976   |  1.1402  |         1.1355         |
|             resnet50              |  32  | 0.9984 |  0.7743   |  1.127   |         1.0646         |
|            hf_Reformer            |  4   | 0.9836 |  0.9641   |  1.1214  |         1.0684         |
|              demucs               |  4   | 0.9988 |  1.0023   |  1.0526  |         1.0385         |
|             resnet152             |  32  | 0.9985 |  0.7559   |  1.0414  |         1.0282         |
|            timm_regnet            |  32  | 0.9298 |  0.7871   |  0.9915  |         0.9806         |
|      nvidia_deeprecommender       | 256  | 0.9986 |  0.9981   |  0.9793  |         1.0189         |
|            tts_angular            |  64  | 0.9172 |  0.8753   |  0.9246  |         0.9393         |
|            timm_vovnet            |  32  | 0.8738 |  0.7283   |  0.9212  |         0.9589         |
|                drq                |  1   | 0.9573 |  0.7378   |  0.0077  |         0.9227         |
|         soft_actor_critic         | 256  | 0.8472 |  0.6183   |  0.0063  |         0.7495         |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|   timm_vision_transformer_large   |  32  | 0.9999 |    0.0    |   0.0    |         1.0836         |
|               moco                |  32  | 0.9798 |    0.0    |   0.0    |          0.0           |
|           hf_Longformer           |  2   | 1.0193 |  0.6937   |   0.0    |          0.0           |
|               dlrm                | 1024 | 0.9322 |  0.8491   |   0.0    |         1.2045         |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|          vision_maskrcnn          |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 26.8212 |  55.4662  | 176.5747 |        175.8869        |
|            hf_BigBird             |  2   | 12.858  |  39.9262  | 160.309  |        131.9951        |
|         phlippe_densenet          | 128  | 3.2732  |  7.0529   | 130.3011 |        171.3564        |
|            densenet121            |  4   | 7.7208  |  18.1956  | 124.176  |        137.4438        |
|         timm_efficientnet         |  32  | 4.9957  |  10.3044  | 120.839  |        145.5391        |
|        mobilenet_v3_large         |  32  | 3.5042  |  7.7829   | 114.2023 |        137.1846        |
|           hf_GPT2_large           |  4   | 14.6834 |  30.4096  | 108.5621 |        107.2968        |
|           mobilenet_v2            |  96  | 3.2021  |  6.9737   | 105.1712 |        129.6075        |
|              yolov3               |  16  | 5.1766  |  10.9295  | 105.0034 |        120.7003        |
|             resnet152             |  32  | 9.0675  |  20.5254  | 100.4668 |        107.029         |
|            mnasnet1_0             |  32  | 3.2143  |  6.8662   | 86.2867  |        109.7718        |
|           timm_resnest            |  32  | 1.8222  |  3.9857   | 85.7847  |        101.3114        |
|        speech_transformer         |  32  |  5.907  |  13.8794  | 75.4863  |        78.7126         |
|        shufflenet_v2_x1_0         | 128  | 3.5413  |  7.8761   | 74.4731  |        83.3441         |
| attention_is_all_you_need_pytorch | 256  | 4.4263  |  10.9697  | 73.4148  |        74.1981         |
|            timm_regnet            |  32  | 6.6853  |  12.459   | 70.2481  |        74.0943         |
|           BERT_pytorch            |  16  | 4.8732  |  11.7746  | 68.5879  |        69.6011         |
|            timm_nfnet             | 128  | 5.7894  |  11.3698  | 67.0214  |        73.5904         |
|           hf_Bert_large           |  4   | 10.4551 |  21.2223  | 64.7983  |        65.3834         |
|        Background_Matting         |  4   | 3.0261  |  11.3087  | 64.7594  |        71.6846         |
|            timm_vovnet            |  32  | 3.5952  |  6.4407   | 56.7911  |        63.8014         |
|             resnet50              |  32  | 3.2833  |  7.1386   | 56.0764  |        65.0696         |
|               hf_T5               |  8   | 5.5143  |  13.1087  | 53.5331  |        50.0522         |
|      timm_vision_transformer      |  32  |  3.315  |  7.4018   | 50.7918  |        53.0156         |
|              hf_Bart              |  4   | 6.2115  |  13.7529  | 49.6603  |        49.9313         |
|           fastNLP_Bert            |  6   | 5.0769  |  11.4267  |  49.628  |        48.5294         |
|           pytorch_unet            |  1   |  1.559  |  4.4396   | 49.1871  |        61.9058         |
|          resnext50_32x4d          |  8   | 3.2611  |  7.2213   | 47.4425  |        53.8376         |
|            hf_Reformer            |  4   | 4.2203  |  6.0795   | 46.7538  |         44.588         |
|            Super_SloMo            |  6   | 2.7702  |  10.0199  | 44.4575  |        43.1173         |
|       functorch_dp_cifar10        |  64  | 1.2063  |  2.4631   | 43.6932  |        56.1942         |
|             hf_Albert             |  8   | 2.5379  |  8.2362   | 42.7214  |        41.8188         |
|              hf_GPT2              |  4   | 4.7077  |  9.8209   | 41.4109  |        41.6323         |
|          LearningToPaint          |  96  | 1.4062  |  2.9647   |  40.421  |        45.1965         |
|              hf_Bert              |  4   | 5.1176  |  10.6732  | 39.0251  |        38.4563         |
|          pytorch_stargan          |  16  | 1.2238  |  3.2914   | 38.7244  |        46.9521         |
|             resnet18              |  16  | 1.3649  |  2.8179   | 36.7031  |        44.4776         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2351  |  3.0141   | 35.3141  |        37.0984         |
|           hf_DistilBert           |  8   | 2.4254  |  5.3713   | 33.6837  |        31.0424         |
|              demucs               |  4   | 1.4536  |  2.2131   | 31.2874  |        32.0776         |
|          phlippe_resnet           | 128  |  1.363  |  2.8647   | 28.1963  |        34.0985         |
|           squeezenet1_1           |  32  |  1.044  |   1.802   | 23.2453  |         23.467         |
|          pytorch_struct           | 200  | 0.7704  |  1.3503   | 20.9149  |         22.112         |
|               vgg16               |  64  | 0.6382  |  1.1184   | 16.1379  |         17.087         |
|              alexnet              | 128  | 0.4997  |  0.7922   | 15.6642  |        15.8967         |
|                drq                |  1   | 0.6803  |  1.0245   | 12.7955  |        11.8103         |
|      nvidia_deeprecommender       | 256  | 0.4928  |   0.754   |  9.8964  |        10.9635         |
|               dcgan               |  32  | 0.4358  |   0.731   |  9.6472  |         9.0178         |
|         soft_actor_critic         | 256  | 0.4342  |  0.6133   |  7.6583  |         8.2303         |
|            tts_angular            |  64  | 0.4467  |  0.5196   |  7.1175  |         6.9337         |
|           lennard_jones           | 1000 | 0.3979  |  0.6018   |  6.2621  |         6.0108         |
|   timm_vision_transformer_large   |  32  |  9.35   |    nan    |   nan    |        128.0136        |
|               dlrm                | 1024 | 0.3853  |   0.793   |   nan    |         7.6384         |
|           hf_Longformer           |  2   | 9.5532  |  30.9318  |   nan    |          nan           |
|               moco                |  32  | 33.7274 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.265   |         1.2557         |
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.208   |         1.208          |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  1.193   |         1.1717         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.1751  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.1727  |         1.1719         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  1.1687  |         1.168          |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |  1.1425  |         1.116          |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  1.1361  |         1.1266         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  1.1334  |         1.128          |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  1.1141  |         1.0713         |
|           mobilenet_v2            |  96  | 0.986  |  0.7645   |  1.1066  |         1.1025         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  1.1053  |         0.9973         |
|            timm_nfnet             | 128  | 0.9072 |  0.8753   |  1.0764  |         1.0712         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  1.0737  |         1.0725         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  1.0687  |         0.9997         |
|                drq                |  1   | 0.9877 |  0.8852   |  1.0607  |         0.9573         |
|        Background_Matting         |  4   | 1.0127 |  0.6489   |  1.0427  |         1.0403         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  1.0344  |         1.0258         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  1.0292  |         0.9945         |
|         timm_efficientnet         |  32  | 0.9874 |  0.7667   |  1.0122  |         0.9436         |
|              yolov3               |  16  | 0.9879 |  0.8253   |  1.012   |         1.0372         |
|            tts_angular            |  64  | 0.9983 |  0.9983   |  0.9952  |         0.9983         |
|              demucs               |  4   | 0.9661 |  0.9659   |  0.9866  |         0.9656         |
|               vgg16               |  64  | 0.9923 |  0.7245   |  0.9823  |         0.9808         |
|              hf_Bart              |  4   | 0.9087 |  0.7524   |  0.978   |         0.9173         |
|           timm_resnest            |  32  | 0.9887 |  0.8837   |  0.9725  |         0.9665         |
|        shufflenet_v2_x1_0         | 128  | 0.9551 |  0.8396   |  0.9706  |         0.9658         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.9644  |         0.9645         |
|            timm_regnet            |  32  | 0.9908 |  0.8517   |  0.9543  |         0.9533         |
|             resnet152             |  32  | 0.9952 |  0.8941   |  0.9449  |         0.9414         |
|              alexnet              | 128  | 0.9454 |  0.7939   |  0.9434  |         0.939          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.9306  |         0.9308         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.9195  |         0.8931         |
|           squeezenet1_1           |  32  | 0.9673 |  0.9321   |   0.91   |         0.9087         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.8935  |         0.8893         |
|             resnet50              |  32  | 0.9921 |  0.8604   |  0.8895  |         0.8839         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.889   |         0.8869         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8873  |         0.8835         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8834  |         0.8659         |
|        speech_transformer         |  32  | 0.9914 |    0.9    |  0.8694  |         0.869          |
|            densenet121            |  4   | 0.9944 |  0.9824   |  0.8167  |         0.7961         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.8132  |         0.8022         |
|            mnasnet1_0             |  32  | 0.9792 |  0.8656   |  0.7837  |         0.778          |
|        mobilenet_v3_large         |  32  | 0.978  |   0.839   |  0.782   |         0.8077         |
|          resnext50_32x4d          |  8   | 0.9942 |  0.8425   |  0.7778  |         0.7712         |
|          LearningToPaint          |  96  | 0.9192 |  0.7116   |  0.7552  |         0.7463         |
|          pytorch_struct           | 200  | 0.9992 |  0.5106   |  0.7428  |         0.7362         |
|             resnet18              |  16  | 0.9753 |  0.7978   |  0.619   |         0.6097         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9965 |  0.8796   |  0.6035  |         0.6004         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.451   |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3554  |         0.3395         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |   nan    |         1.0009         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.9511 |  0.8932   |   nan    |          nan           |
|               moco                |  32  | 1.0057 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|                drq                |  1   |  3.4833  |  4.5514   | 743.3187 |         4.3008         |
|         soft_actor_critic         | 256  |  1.7977  |  2.7934   | 388.312  |         3.1309         |
|           hf_GPT2_large           |  4   | 209.0753 | 211.0152  | 115.8791 |        116.4837        |
|        Background_Matting         |  4   | 125.6878 |  919.163  | 103.1318 |        104.0068        |
|               hf_T5               |  8   | 180.0546 | 210.1547  | 91.8774  |        88.5663         |
|            hf_T5_large            |  2   | 220.7143 | 268.5398  | 88.8566  |        113.2571        |
|            hf_BigBird             |  2   | 202.325  |  279.724  | 76.0508  |        114.1131        |
|            timm_nfnet             | 128  | 118.3591 | 118.0793  | 74.5331  |        78.9065         |
|            hf_Reformer            |  4   | 82.3039  |  83.9461  |  72.176  |         75.865         |
|            Super_SloMo            |  6   | 79.5915  | 443.7574  | 64.2861  |        64.2773         |
|             resnet152             |  32  | 63.9645  |  84.6717  | 60.9268  |        62.4668         |
|              yolov3               |  16  | 68.6768  |  84.6849  | 56.2131  |        56.7553         |
|            timm_regnet            |  32  | 60.2897  |  70.4903  | 55.6255  |        57.1144         |
|               vgg16               |  64  | 66.2026  |  66.2659  | 52.5295  |        52.8261         |
|              demucs               |  4   | 53.5925  |  53.7117  | 50.5672  |        51.4447         |
|           hf_Bert_large           |  4   | 80.8856  |  92.5829  | 49.8767  |        50.5824         |
|        speech_transformer         |  32  | 67.3217  |  75.8431  | 39.4015  |        37.9375         |
| attention_is_all_you_need_pytorch | 256  | 55.1494  |  60.0157  | 34.6848  |        35.5762         |
|           fastNLP_Bert            |  6   | 51.8074  |  72.1331  | 33.5453  |        35.2753         |
|              hf_Bart              |  4   | 60.8714  |  70.9383  | 33.1568  |        44.5167         |
|           mobilenet_v2            |  96  | 47.0137  |  60.2186  |  30.445  |        30.7229         |
|             hf_Albert             |  8   | 69.7652  |  72.2363  | 29.4444  |         29.779         |
|           pytorch_unet            |  1   | 39.9178  | 194.0749  | 29.0266  |        29.3724         |
|            densenet121            |  4   |  55.509  |  84.344   | 28.9176  |        49.9906         |
|         timm_efficientnet         |  32  | 33.9393  |  51.335   | 26.8354  |        29.2751         |
|            timm_vovnet            |  32  | 28.1975  |  34.0783  |  26.782  |        26.0463         |
|              hf_GPT2              |  4   | 48.6051  |  50.3839  | 25.2926  |        25.7683         |
|             resnet50              |  32  | 26.8001  |  34.3887  |  23.866  |        24.9063         |
|              hf_Bert              |  4   | 40.1209  |  47.368   | 21.5698  |        26.7785         |
|           hf_DistilBert           |  8   | 32.2893  |  33.3039  | 20.9339  |        20.9009         |
|        shufflenet_v2_x1_0         | 128  |  30.974  |  40.7008  | 20.9073  |         25.717         |
|            mnasnet1_0             |  32  | 22.5937  |  30.5076  | 17.9521  |        19.6666         |
|      timm_vision_transformer      |  32  | 32.7662  |  33.9153  | 17.6372  |         20.068         |
|          resnext50_32x4d          |  8   | 20.9175  |  28.7412  | 16.4726  |        22.2345         |
|           BERT_pytorch            |  16  | 52.2344  |  68.0921  | 16.1084  |        24.6531         |
|        mobilenet_v3_large         |  32  |  27.21   |  34.368   | 16.1011  |        21.8536         |
|           timm_resnest            |  32  | 24.2816  |  28.2812  | 15.0446  |        15.8468         |
|         phlippe_densenet          | 128  | 23.3972  |  29.7913  | 14.6448  |        22.4623         |
|          pytorch_stargan          |  16  | 15.0578  |  18.5564  | 11.4878  |        11.7733         |
|      nvidia_deeprecommender       | 256  | 10.2272  |  10.2303  | 10.4243  |        10.0206         |
|          LearningToPaint          |  96  | 11.1705  |  14.8099  |  9.034   |        10.6915         |
|              alexnet              | 128  |  9.8343  |  9.8348   |  8.6076  |         8.6504         |
|             resnet18              |  16  |  9.3569  |  14.2711  |  7.4076  |         9.4042         |
|            tts_angular            |  64  |  6.8261  |  7.0958   |  6.7964  |         6.6251         |
|          phlippe_resnet           | 128  |  9.0863  |  11.8339  |  6.0521  |         8.9226         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 14.5504  |  16.6487  |  5.6523  |         7.785          |
|           squeezenet1_1           |  32  | 11.0244  |  12.8779  |  5.3284  |         9.6931         |
|          pytorch_struct           | 200  |  5.0906  |  6.0676   |  3.4095  |         4.1772         |
|       functorch_dp_cifar10        |  64  | 10.3489  |  11.2955  |  3.1331  |         7.531          |
|               dcgan               |  32  |  2.4077  |  3.0738   |  1.7768  |         2.5228         |
|           lennard_jones           | 1000 |  1.8287  |   2.16    |  1.2992  |         1.7657         |
|   timm_vision_transformer_large   |  32  | 463.3618 |    nan    |   nan    |        427.6538        |
|               dlrm                | 1024 |  4.4484  |  4.9332   |   nan    |         3.8924         |
|           hf_Longformer           |  2   | 111.885  | 163.6661  |   nan    |          nan           |
|               moco                |  32  | 51.7919  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+
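
For anyone who wants to post-process these tables offline, here is a minimal sketch that parses one of the `+---+` bordered tables and derives an eager/inductor latency ratio per model. The helper names `parse_table` and `latency_ratio` are hypothetical and not part of the benchmark scripts, and the ratio is only an illustration computed from the columns shown above; it is not necessarily how the dashboard computes its own speedup numbers.

```python
import math


def parse_table(text):
    """Parse a '+---+' bordered ASCII table into a list of dict rows.

    Hypothetical helper: the first '|' line is treated as the header,
    every later '|' line as a data row. Border lines are skipped.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+----+' borders and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def latency_ratio(row, baseline="eager", backend="inductor"):
    """Return baseline latency / backend latency, or None if either is missing.

    'nan' cells (failed or skipped runs) and zero latencies yield None.
    """
    base = float(row[baseline])
    new = float(row[backend])
    if math.isnan(base) or math.isnan(new) or new == 0:
        return None
    return base / new


# Two rows copied from the "Absolute latency (ms)" table above.
sample = """
+---------------+------+----------+-----------+----------+------------------------+
|     name      |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------+------+----------+-----------+----------+------------------------+
| hf_GPT2_large |  4   | 209.0753 | 211.0152  | 115.8791 |        116.4837        |
|     dlrm      | 1024 |  4.4484  |  4.9332   |   nan    |         3.8924         |
+---------------+------+----------+-----------+----------+------------------------+
"""

rows = parse_table(sample)
for row in rows:
    print(row["name"], latency_ratio(row))
```

Rows with `nan` in either column come back as `None`, so failed runs can be filtered out before aggregating.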

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 1.0148 |  0.8607   |  2.9079  |         1.2122         |
|             OPTForCausalLM              |  2  | 0.992  |  0.9265   |  2.5138  |         2.5127         |
|     MobileBertForQuestionAnswering      | 128 | 1.0159 |  0.8549   |  2.5098  |         1.1518         |
|      GPT2ForSequenceClassification      |  4  | 0.9884 |  0.9631   |  2.3728  |         2.3551         |
|       MT5ForConditionalGeneration       | 16  | 1.0158 |  0.8514   |  2.3555  |         2.007          |
|             XGLMForCausalLM             |  8  | 1.0037 |  0.8693   |  2.2872  |         1.5528         |
|       ElectraForQuestionAnswering       | 64  | 0.9988 |  0.9874   |  2.1523  |         2.1401         |
|           ElectraForCausalLM            | 32  | 0.996  |  0.9496   |  1.9053  |         1.8709         |
|    LayoutLMForSequenceClassification    | 16  | 0.997  |   0.983   |  1.8665  |         1.8309         |
|            XLNetLMHeadModel             |  8  | 0.999  |  0.9699   |  1.8168  |         1.8234         |
|        BertForQuestionAnswering         | 16  | 0.9972 |   0.982   |  1.8048  |         1.8024         |
|       RobertaForQuestionAnswering       | 16  | 0.9973 |  0.9823   |  1.8034  |         1.8102         |
|    MegatronBertForQuestionAnswering     |  8  | 0.998  |  0.9778   |  1.7012  |         1.6775         |
|           RobertaForCausalLM            | 16  | 0.9975 |  0.9732   |  1.6896  |         1.687          |
|      MBartForConditionalGeneration      |  2  | 1.0058 |  0.9726   |  1.6871  |         1.4417         |
|                 T5Small                 |  4  | 0.9948 |  0.8575   |  1.6848  |         1.7699         |
|       T5ForConditionalGeneration        |  4  | 0.9912 |  0.8603   |  1.6836  |         1.7714         |
|               DistillGPT2               | 16  | 0.9929 |  0.9606   |  1.6824  |         1.7184         |
|            PLBartForCausalLM            |  8  | 0.9883 |  0.9609   |  1.6659  |         1.6898         |
|     PLBartForConditionalGeneration      |  4  | 0.9923 |  0.9504   |  1.6499  |         1.654          |
|       AlbertForQuestionAnswering        |  4  | 1.0001 |  0.8859   |  1.6452  |         1.6453         |
|            AlbertForMaskedLM            |  4  | 1.0002 |  0.8852   |  1.6368  |         1.6374         |
|             BertForMaskedLM             | 16  | 0.9976 |  0.9722   |  1.6181  |         1.6159         |
|     M2M100ForConditionalGeneration      | 16  | 0.9944 |  0.8327   |  1.6169  |         1.515          |
|         MegatronBertForCausalLM         |  4  | 1.0182 |   0.924   |  1.6136  |         1.5731         |
|         Speech2Text2ForCausalLM         | 256 | 0.9836 |  0.9198   |  1.5683  |         1.5588         |
|                CamemBert                | 16  | 0.9981 |  0.9736   |  1.5641  |         1.5689         |
|             BartForCausalLM             |  4  | 0.9897 |  0.9547   |  1.5557  |         1.5595         |
|            MBartForCausalLM             |  4  | 0.9875 |  0.9586   |  1.5446  |         1.5546         |
|      BartForConditionalGeneration       |  2  | 1.0123 |   0.972   |  1.5375  |         1.4767         |
|            YituTechConvBert             | 16  | 0.9979 |  0.9703   |  1.527   |         1.5201         |
|     DistilBertForQuestionAnswering      | 256 | 0.9971 |  0.9909   |  1.4646  |         1.4588         |
|          BlenderbotForCausalLM          |  4  | 0.993  |  0.8503   |  1.4584  |         1.2833         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0034 |  0.9245   |  1.4279  |         1.4506         |
|     PegasusForConditionalGeneration     | 32  | 1.0061 |  0.9505   |  1.3998  |         1.2957         |
|            TrOCRForCausalLM             | 32  | 0.9883 |   0.957   |  1.2781  |         1.2963         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9875 |  0.9182   |  1.2537  |         1.301          |
|           PegasusForCausalLM            | 32  | 0.9845 |  0.9335   |  1.2324  |         1.2191         |
|          DistilBertForMaskedLM          | 128 | 0.9962 |  0.9551   |  1.2225  |         1.2466         |
|       DebertaForQuestionAnswering       |  8  | 0.806  |  0.7015   |  0.9684  |         0.9614         |
|           DebertaForMaskedLM            |  4  | 0.7128 |  0.5569   |  0.8975  |         0.8088         |
|          DebertaV2ForMaskedLM           |  1  | 0.6814 |  0.5196   |  0.807   |         0.656          |
|      DebertaV2ForQuestionAnswering      |  2  | 0.6877 |  0.5254   |  0.781   |         0.6873         |
|           LayoutLMForMaskedLM           | 16  | 0.9976 |  0.9729   |   0.0    |         1.6268         |
|          AllenaiLongformerBase          |  4  | 1.0101 |  0.6699   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          DebertaV2ForMaskedLM           |  1  | 15.6383 |  27.4867  | 199.6784 |        71.2684         |
|      DebertaV2ForQuestionAnswering      |  2  | 15.2727 |  26.8695  | 196.5988 |        70.8035         |
|          MobileBertForMaskedLM          | 64  | 17.2441 |  40.8278  | 146.0158 |        150.3536        |
|     MobileBertForQuestionAnswering      | 128 | 17.1745 |  40.4764  | 139.519  |        140.5881        |
|     M2M100ForConditionalGeneration      | 16  | 11.9595 |  26.9061  | 136.1343 |        135.3592        |
|             XGLMForCausalLM             |  8  |  9.512  |  21.4179  | 130.0771 |        134.9474        |
|       MT5ForConditionalGeneration       | 16  | 8.1196  |  18.3354  | 124.4349 |        132.2643        |
|       DebertaForQuestionAnswering       |  8  | 7.2039  |  13.5406  | 106.4678 |        59.2205         |
|           DebertaForMaskedLM            |  4  | 7.6198  |  13.9135  | 100.7004 |        57.3364         |
|            XLNetLMHeadModel             |  8  | 10.4589 |  27.6916  | 95.4375  |        96.5673         |
|      MBartForConditionalGeneration      |  2  | 11.8292 |  26.3064  | 79.7942  |        79.7296         |
|      BartForConditionalGeneration       |  2  | 11.6444 |  26.0207  | 76.8443  |        74.6091         |
|          BlenderbotForCausalLM          |  4  | 10.9955 |  22.0374  | 69.7686  |        68.5522         |
|    MegatronBertForQuestionAnswering     |  8  | 10.2938 |  21.276   | 68.9551  |        69.6586         |
|            YituTechConvBert             | 16  | 7.1403  |  16.0512  | 68.9087  |        69.7321         |
|     PegasusForConditionalGeneration     | 32  | 5.4562  |  19.7567  | 68.3887  |        68.8198         |
|         MegatronBertForCausalLM         |  4  | 10.3492 |  21.4513  | 66.8345  |         67.019         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.7868  |  17.1417  | 56.4593  |        55.3481         |
|           ElectraForCausalLM            | 32  | 5.2344  |  11.5778  | 53.3831  |        54.8023         |
|                 T5Small                 |  4  | 5.7018  |  12.6793  | 50.6041  |        52.2103         |
|       T5ForConditionalGeneration        |  4  | 5.6049  |  12.7469  | 50.5129  |         51.331         |
|     PLBartForConditionalGeneration      |  4  | 6.2358  |  13.5031  | 49.9633  |        49.7986         |
|    LayoutLMForSequenceClassification    | 16  | 5.5929  |  11.0365  | 47.1726  |        47.9426         |
|       ElectraForQuestionAnswering       | 64  |  5.333  |  10.7314  |  45.855  |        43.4644         |
|             BertForMaskedLM             | 16  | 5.2263  |  10.8976  | 41.3306  |        42.0141         |
|        BertForQuestionAnswering         | 16  | 5.2056  |   11.0    | 40.7027  |        41.7496         |
|             BartForCausalLM             |  4  | 5.7007  |  11.1375  | 40.5732  |        40.4248         |
|            MBartForCausalLM             |  4  | 5.6962  |  11.259   | 39.9147  |         42.332         |
|           RobertaForCausalLM            | 16  | 5.2739  |  10.9009  | 39.3472  |        38.4412         |
|             OPTForCausalLM              |  2  | 4.6451  |  10.3213  | 38.9754  |         38.706         |
|                CamemBert                | 16  | 5.2055  |  10.9714  | 38.8617  |        40.4578         |
|       RobertaForQuestionAnswering       | 16  | 5.3135  |  10.9573  | 38.6672  |        37.2281         |
|            TrOCRForCausalLM             | 32  | 5.5994  |  11.106   | 38.5013  |        38.5168         |
|            AlbertForMaskedLM            |  4  | 2.4019  |  8.2165   | 38.2584  |        37.2664         |
|           PegasusForCausalLM            | 32  |  5.762  |  11.041   | 37.6654  |        38.7761         |
|      GPT2ForSequenceClassification      |  4  | 4.7335  |  9.8754   | 36.9801  |        38.3153         |
|     DistilBertForQuestionAnswering      | 256 | 2.5151  |  5.3201   | 36.2299  |        38.6179         |
|       AlbertForQuestionAnswering        |  4  | 2.2252  |  8.1956   | 35.3426  |        34.4763         |
|          DistilBertForMaskedLM          | 128 | 2.5283  |  5.3778   | 35.2422  |        37.5458         |
|       BlenderbotSmallForCausalLM        | 64  | 3.8316  |   7.608   | 29.9415  |        30.2407         |
|               DistillGPT2               | 16  | 2.5797  |  5.0994   | 29.6195  |        27.7837         |
|            PLBartForCausalLM            |  8  | 3.0404  |  5.9026   | 27.2268  |        27.0899         |
|         Speech2Text2ForCausalLM         | 256 | 3.0606  |  5.6754   | 25.4074  |        26.7592         |
|           LayoutLMForMaskedLM           | 16  | 5.7261  |  11.4536  |   nan    |        42.7497         |
|          AllenaiLongformerBase          |  4  | 9.7842  |  31.2758  |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  1.3156  |         1.3147         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  1.2697  |         1.268          |
|       DebertaForQuestionAnswering       |  8  | 0.9524 |  1.0537   |  1.2117  |         1.1526         |
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1962  |         1.195          |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1958  |         1.2307         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.1782  |         1.1729         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.1778  |         1.1724         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.1509  |         1.1479         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.1426  |         1.1368         |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.1261  |         1.1813         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.1261  |         1.1813         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  1.1159  |         1.1152         |
|             OPTForCausalLM              |  2  | 0.9999 |  0.9165   |  1.0965  |         1.1346         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  1.0827  |         1.0962         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0562  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.056   |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0532  |         1.0491         |
|            YituTechConvBert             | 16  | 0.9999 |  0.9143   |  1.043   |         1.0411         |
|           DebertaForMaskedLM            |  4  | 0.9326 |  0.9156   |  1.0319  |         0.9978         |
|     PegasusForConditionalGeneration     | 32  |  1.0   |   0.91    |  1.0074  |         1.0689         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  1.004   |         1.0307         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9763 |  0.9763   |  1.0015  |         0.9797         |
|          BlenderbotForCausalLM          |  4  | 0.9978 |  0.9099   |  1.0003  |         0.999          |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.9886  |         0.9665         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.988   |         1.0139         |
|     PLBartForConditionalGeneration      |  4  | 1.0001 |  0.9301   |  0.9772  |         1.052          |
|           ElectraForCausalLM            | 32  |  1.0   |  0.8475   |  0.9753  |         0.9739         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.971   |         1.0642         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.9505  |         1.016          |
|            MBartForCausalLM             |  4  |  1.0   |  0.8937   |  0.9444  |         0.9912         |
|     M2M100ForConditionalGeneration      | 16  |  1.0   |  0.8084   |  0.9321  |         0.9908         |
|             BartForCausalLM             |  4  |  1.0   |  0.8936   |  0.9294  |         0.9749         |
|             XGLMForCausalLM             |  8  |  1.0   |  0.7834   |  0.9264  |         0.9792         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8677   |  0.9162  |         0.9886         |
|           PegasusForCausalLM            | 32  |  1.0   |  0.8822   |  0.9161  |         0.9864         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9127  |         1.0018         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8872  |         0.9624         |
|            TrOCRForCausalLM             | 32  |  1.0   |   0.873   |  0.8855  |         0.9583         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8749  |         0.9803         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.8399  |         0.8392         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8137   |  0.8215  |         0.9119         |
|         Speech2Text2ForCausalLM         | 256 |  1.0   |  0.8183   |  0.7921  |         0.8779         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |   nan    |         1.0518         |
|          AllenaiLongformerBase          |  4  | 0.9508 |  0.8684   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 265.9217 | 300.3129  | 162.7343 |        162.6444        |
|       AlbertForQuestionAnswering        |  4  | 263.8804 | 297.6868  | 160.7392 |        160.6136        |
|            XLNetLMHeadModel             |  8  |  280.35  | 289.1877  | 153.4174 |        154.2526        |
|      DebertaV2ForQuestionAnswering      |  2  | 152.9638 | 197.7468  | 137.4759 |        172.104         |
|          DebertaV2ForMaskedLM           |  1  | 150.7454 | 196.1414  | 129.6117 |        157.7547        |
|     PegasusForConditionalGeneration     | 32  | 140.2171 | 148.1805  | 108.1662 |        107.3697        |
|            TrOCRForCausalLM             | 32  | 139.4065 | 143.2927  | 107.8589 |        106.1451        |
|      MBartForConditionalGeneration      |  2  | 138.4779 | 142.5897  | 92.8848  |        99.6146         |
|      BartForConditionalGeneration       |  2  | 148.048  | 142.3146  | 90.7962  |        97.8224         |
|    MegatronBertForQuestionAnswering     |  8  | 142.0007 | 144.6574  | 83.3137  |        84.7217         |
|            YituTechConvBert             | 16  |  125.46  | 129.0986  | 82.0903  |        82.3773         |
|          BlenderbotForCausalLM          |  4  | 110.1643 |  127.803  | 81.2978  |        91.6516         |
|       DebertaForQuestionAnswering       |  8  | 93.8808  |  108.027  | 79.8773  |        78.6455         |
| BlenderbotSmallForConditionalGeneration | 64  | 110.5931 | 121.5457  | 79.1103  |         84.597         |
|                CamemBert                | 16  | 118.5029 | 121.4551  | 75.7308  |        76.1598         |
|            MBartForCausalLM             |  4  | 114.916  | 118.2548  |  73.848  |        72.9942         |
|             BartForCausalLM             |  4  | 114.616  | 119.5582  | 72.9373  |        72.6756         |
|     M2M100ForConditionalGeneration      | 16  | 118.1268 | 140.9205  | 71.3789  |        97.2577         |
|     DistilBertForQuestionAnswering      | 256 | 103.5302 |  104.262  | 70.7567  |        71.4131         |
|     PLBartForConditionalGeneration      |  4  | 118.7811 | 122.2317  | 70.7529  |        70.5833         |
|           DebertaForMaskedLM            |  4  | 85.6156  |  108.71   | 70.1098  |        73.4708         |
|            PLBartForCausalLM            |  8  | 117.3346 | 119.6512  | 69.3641  |        68.3516         |
|          DistilBertForMaskedLM          | 128 | 84.8762  |  88.5464  | 69.1899  |        68.4262         |
|           RobertaForCausalLM            | 16  | 115.301  | 118.0582  | 68.1133  |        68.1011         |
|             BertForMaskedLM             | 16  | 110.2011 |  113.078  | 68.0186  |        68.2624         |
|             OPTForCausalLM              |  2  | 170.212  | 182.8355  | 67.5871  |        67.4549         |
|     MobileBertForQuestionAnswering      | 128 | 164.7543 | 197.9909  | 67.2228  |        143.4921        |
|               DistillGPT2               | 16  | 106.5138 | 109.9102  | 62.8081  |        61.5843         |
|                 T5Small                 |  4  | 104.9257 | 121.9239  | 62.0824  |         58.849         |
|       T5ForConditionalGeneration        |  4  | 104.9953 | 121.3086  | 62.0095  |         58.772         |
|          MobileBertForMaskedLM          | 64  | 168.8667 | 200.9407  | 58.9373  |        171.7954        |
|           PegasusForCausalLM            | 32  | 70.3766  |  73.9213  | 57.0846  |        56.7129         |
|         MegatronBertForCausalLM         |  4  | 86.0952  |  94.0243  | 54.0339  |        55.5416         |
|       ElectraForQuestionAnswering       | 64  | 116.7299 | 115.9068  | 53.2833  |        53.5584         |
|       RobertaForQuestionAnswering       | 16  | 95.8805  |  97.2241  | 53.0267  |        52.7336         |
|        BertForQuestionAnswering         | 16  |  95.396  |  96.7681  |  52.783  |        52.7857         |
|    LayoutLMForSequenceClassification    | 16  | 98.1073  |  99.1782  | 52.3855  |        53.4224         |
|             XGLMForCausalLM             |  8  | 97.9871  | 106.9202  | 51.9471  |        57.3932         |
|           ElectraForCausalLM            | 32  | 88.2358  |  92.7777  | 47.1427  |         47.045         |
|       BlenderbotSmallForCausalLM        | 64  | 58.7449  |  62.9233  | 46.1936  |        47.2401         |
|       MT5ForConditionalGeneration       | 16  | 90.9247  | 107.6747  | 39.7435  |        52.0501         |
|      GPT2ForSequenceClassification      |  4  | 92.5979  |  96.315   | 38.5239  |        38.8264         |
|         Speech2Text2ForCausalLM         | 256 | 53.7827  |  57.3725  | 35.1493  |        34.0149         |
|           LayoutLMForMaskedLM           | 16  | 112.634  | 115.6378  |   nan    |        69.2093         |
|          AllenaiLongformerBase          |  4  | 179.9937 | 269.8856  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
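
The absolute latencies above can be turned into a rough per-model ratio against the `eager` column. The snippet below is a minimal sketch, not the dashboard's own code: `parse_ascii_table`, `to_float`, and `latency_ratio` are hypothetical helper names, and dividing the `eager` latency by the `inductor` latency is an assumption about how to compare the two columns, so the result need not match the "Performance speedup" table exactly.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+'-bordered table into a list of dicts keyed by header."""
    header = None
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def to_float(cell):
    """Convert a cell to float; non-numeric cells become NaN."""
    try:
        return float(cell)
    except ValueError:
        return math.nan


def latency_ratio(row, baseline="eager", backend="inductor"):
    """Return baseline latency / backend latency (NaN if either is missing)."""
    return to_float(row[baseline]) / to_float(row[backend])


# Two rows copied from the table above.
sample = """
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 265.9217 | 300.3129  | 162.7343 |        162.6444        |
|          AllenaiLongformerBase          |  4  | 179.9937 | 269.8856  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
"""

rows = parse_ascii_table(sample)
ratios = {row["name"]: latency_ratio(row) for row in rows}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```

Rows whose latency is `nan` (runs that did not produce a measurement) propagate as NaN rather than raising, so they can be filtered out afterwards with `math.isnan`.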

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9999 |  0.9985   |  3.0386  |         2.9841         |
|      xcit_large_24_p8_224       |  5  | 0.997  |  0.8723   |  1.9987  |         1.6109         |
|        twins_pcpvt_base         | 64  | 1.005  |  0.9253   |  1.9716  |         1.6847         |
|         coat_lite_mini          | 128 | 0.9994 |  0.9977   |  1.949   |         1.9283         |
|          gmlp_s16_224           | 128 | 0.9998 |  1.0885   |  1.8485  |         1.8509         |
|          ghostnet_100           | 128 | 0.9988 |  0.7536   |  1.7984  |         1.6318         |
|          gmixer_24_224          | 128 | 0.9998 |  0.8929   |  1.7628  |         1.7628         |
|           volo_d1_224           | 64  | 0.9996 |  0.9784   |  1.7175  |         1.6807         |
|         crossvit_9_240          | 128 | 0.9993 |  0.7885   |  1.6653  |         1.6362         |
|  swin_base_patch4_window7_224   | 64  | 0.9992 |  0.9643   |  1.6614  |         1.6298         |
|           convit_base           | 64  | 0.9997 |  0.9995   |  1.6286  |         1.6167         |
|       gluon_inception_v3        | 128 | 0.9998 |  0.8675   |  1.547   |          1.53          |
|          inception_v3           | 128 | 0.9998 |  0.8667   |  1.5466  |         1.5267         |
|             dla102              | 128 | 0.9992 |  0.8172   |  1.5457  |         1.531          |
|        adv_inception_v3         | 128 | 0.9997 |  0.8627   |  1.5346  |         1.5258         |
|          convnext_base          | 64  | 0.9995 |  1.0007   |  1.5274  |         1.5079         |
|            nfnet_l0             | 128 | 0.999  |  0.8194   |   1.52   |         1.4558         |
|           dm_nfnet_f0           | 128 | 0.9989 |  0.9977   |  1.5199  |         1.4557         |
|            lcnet_050            | 128 | 0.9447 |  0.7377   |  1.5116  |         1.4433         |
|        sebotnet33ts_256         | 64  | 0.9655 |  0.7698   |  1.4891  |         1.5546         |
|            pit_b_224            | 64  | 0.9992 |  0.9974   |  1.438   |         1.4384         |
|       eca_botnext26ts_256       | 128 | 0.9785 |  0.7213   |  1.4324  |         1.4326         |
|           resnest101e           | 64  | 0.9998 |  0.8706   |  1.4294  |         1.367          |
|           selecsls42b           | 128 | 0.9996 |  0.8123   |  1.4242  |         1.414          |
|           mobilevit_s           | 64  | 0.9713 |  0.7366   |  1.4168  |         1.4631         |
|          jx_nest_base           | 32  |  1.0   |  0.9969   |  1.4094  |         1.3823         |
|          cait_m36_384           |  4  | 1.0002 |  0.9985   |  1.3928  |         1.3594         |
|      mobilenetv3_large_100      | 128 | 0.9529 |   0.762   |  1.3884  |         1.4392         |
|          botnet26t_256          | 128 | 0.9773 |  0.8543   |  1.3881  |         1.4295         |
|           mnasnet_100           | 128 | 0.9507 |  0.7424   |  1.3833  |         1.5009         |
|           res2next50            | 128 | 0.9996 |  0.8258   |  1.3791  |         1.3652         |
|      beit_base_patch16_224      | 64  | 0.9994 |  0.9686   |  1.3711  |         1.3565         |
|          mixer_b16_224          | 128 | 0.9998 |  1.0205   |  1.3624  |         1.3656         |
|         poolformer_m36          | 64  | 0.9996 |  0.9965   |  1.3612  |         1.3423         |
|        ese_vovnet19b_dw         | 128 | 0.9657 |  0.8383   |  1.3398  |         1.3872         |
|        res2net50_14w_8s         | 128 | 0.9996 |  0.7907   |  1.3364  |         1.357          |
|         mobilenetv2_100         | 128 | 0.9503 |  0.7386   |  1.3323  |         1.4461         |
|       tf_efficientnet_b0        | 128 | 0.9644 |  0.6831   |  1.3221  |         1.3897         |
|           regnety_002           | 128 | 0.9671 |  0.7309   |  1.3118  |         1.276          |
|           fbnetc_100            | 128 | 0.9509 |  0.7398   |  1.3016  |         1.406          |
|          spnasnet_100           | 128 | 0.9445 |   0.74    |  1.2967  |         1.4134         |
| deit_base_distilled_patch16_224 | 64  | 0.9996 |   0.997   |  1.2761  |         1.2611         |
|            fbnetv3_b            | 128 | 0.9534 |  0.7712   |  1.2683  |         1.329          |
|           rexnet_100            | 128 | 0.9607 |  0.7084   |  1.2678  |         1.3506         |
|          resmlp_12_224          | 128 | 0.9999 |   0.895   |  1.2655  |         1.2695         |
|      vit_base_patch16_224       | 64  | 0.9996 |  0.9969   |  1.2526  |         1.2398         |
|          cspdarknet53           | 64  | 0.9427 |  0.7921   |  1.2027  |         1.2801         |
|            tinynet_a            | 128 | 0.9509 |  0.6802   |  1.1859  |         1.2346         |
|         visformer_small         | 128 | 0.9987 |  0.9479   |  1.1824  |         1.1702         |
|           tf_mixnet_l           | 128 | 0.9811 |  0.8302   |  1.1795  |         1.1985         |
|            mixnet_l             | 128 | 0.9806 |  0.8238   |  1.1674  |         1.1875         |
|            hrnet_w18            | 128 | 0.9978 |  0.6452   |  1.1361  |         1.3584         |
|        gluon_xception65         | 32  | 0.9998 |  0.8477   |  1.0874  |         1.0867         |
|     swsl_resnext101_32x16d      | 32  | 0.9995 |  0.8438   |  1.0786  |         1.0234         |
|            repvgg_a2            | 128 | 0.9434 |  0.7605   |  1.0599  |         1.1304         |
|             dpn107              | 32  | 0.9407 |  0.8135   |  1.0595  |         1.1498         |
|        convmixer_768_32         | 32  | 0.9994 |  0.9655   |  1.0019  |         1.0037         |
|            gernet_l             | 128 | 0.9449 |  0.7993   |  0.9926  |         1.0771         |
|          pnasnet5large          | 16  | 0.997  |   0.928   |  0.985   |         1.1446         |
|        res2net101_26w_4s        | 64  | 0.9993 |  0.7945   |  0.9765  |         1.0932         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
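
To condense a speedup column into one number, a geometric mean is a common choice for ratios. The snippet below is a sketch under stated assumptions: `geomean_speedup` is a hypothetical helper, and treating `0.0` and `nan` entries as failed runs to be excluded is inferred from the tables in this comment (a `0.0` speedup lines up with a `nan` latency), not taken from the benchmark scripts.

```python
import math


def geomean_speedup(values):
    """Geometric mean of positive speedups; 0.0 and NaN entries are skipped."""
    valid = [v for v in values if not math.isnan(v) and v > 0.0]
    if not valid:
        return math.nan  # nothing to average
    return math.exp(sum(math.log(v) for v in valid) / len(valid))


# First two inductor speedups from the table above, plus a 0.0 placeholder
# standing in for a failed run (assumption: 0.0 marks a failure).
inductor_speedups = [3.0386, 1.9987, 0.0]
result = geomean_speedup(inductor_speedups)
print(f"geomean speedup: {result:.4f}")
```

Excluding failed runs makes the summary optimistic, so it is worth reporting the number of skipped models next to the mean.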

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|            hrnet_w18            | 128 | 9.6385  |  36.119   | 236.3629 |        253.9841        |
|           rexnet_100            | 128 | 5.6941  |  11.0942  | 224.7499 |        297.0445        |
|          ghostnet_100           | 128 | 7.5111  |  15.0644  | 192.2707 |        243.5687        |
|          pnasnet5large          | 16  | 8.2571  |  26.1448  | 157.8322 |        166.9827        |
|           resnest101e           | 64  | 11.0055 |  24.3721  | 153.3786 |        171.5376        |
|            fbnetv3_b            | 128 | 8.3106  |  17.0867  | 145.7091 |        174.1522        |
|        twins_pcpvt_base         | 64  | 10.5747 |  23.5811  | 142.3674 |        151.6112        |
|        res2net101_26w_4s        | 64  | 10.6954 |  24.7685  | 141.6476 |        154.5319        |
|           mobilevit_s           | 64  | 5.3018  |  11.3153  | 141.3446 |        159.2207        |
|           tf_mixnet_l           | 128 | 8.8671  |  16.8631  | 141.0283 |        165.4847        |
|            mixnet_l             | 128 | 8.2478  |  16.2618  | 137.9874 |        163.0108        |
|        adv_inception_v3         | 128 | 5.7683  |  12.5501  | 136.2104 |        160.4157        |
|       gluon_inception_v3        | 128 | 5.7081  |  12.6406  | 135.6235 |        160.3407        |
|      xcit_large_24_p8_224       |  5  | 12.7389 |  28.3979  | 134.7515 |        139.6318        |
|          inception_v3           | 128 |  5.682  |  12.4754  | 133.6956 |        165.7827        |
|            tinynet_a            | 128 | 5.9405  |  12.2701  | 133.3159 |        157.8647        |
|      mobilenetv3_large_100      | 128 | 4.2053  |  8.3601   | 132.1494 |        160.0576        |
|       tf_efficientnet_b0        | 128 | 5.1323  |  10.4306  | 131.2147 |        159.8844        |
|        res2net50_14w_8s         | 128 | 9.0527  |   22.33   | 120.6187 |        127.9824        |
|          spnasnet_100           | 128 | 4.9936  |  9.3692   | 116.4227 |        137.644         |
|          cait_m36_384           |  4  | 13.6071 |  30.1687  | 114.0578 |        118.0385        |
|           fbnetc_100            | 128 | 4.9477  |  9.4602   | 113.787  |        140.9721        |
|  swin_base_patch4_window7_224   | 64  | 8.1578  |  19.0725  | 111.2072 |        110.9941        |
|         mobilenetv2_100         | 128 | 4.1241  |  7.9261   | 106.1095 |        131.7379        |
|           mnasnet_100           | 128 | 4.0208  |  7.5855   | 104.9756 |        127.7246        |
|         poolformer_m36          | 64  | 7.5711  |  13.7433  | 98.6287  |        104.3555        |
|        sebotnet33ts_256         | 64  | 4.1651  |  8.7797   | 97.5362  |        111.204         |
|             dpn107              | 32  | 9.8717  |  20.6483  | 93.2761  |        101.1651        |
|           regnety_002           | 128 | 4.8691  |  8.8264   | 90.4638  |        112.3475        |
|        gluon_xception65         | 32  | 7.7764  |  16.8776  | 89.8794  |         94.893         |
|             dla102              | 128 | 6.3708  |  14.0414  | 88.8994  |        97.7882         |
|         coat_lite_mini          | 128 | 3.3184  |  8.0139   | 86.9779  |        89.7934         |
|          cspdarknet53           | 64  | 5.7375  |  10.909   | 86.2444  |        97.9996         |
|       eca_botnext26ts_256       | 128 | 3.1126  |  6.8554   | 85.2887  |         95.354         |
|          jx_nest_base           | 32  | 6.7205  |  14.7385  | 84.8486  |        86.9948         |
|         crossvit_9_240          | 128 | 5.8637  |  13.3288  | 84.7664  |        89.5937         |
|           res2next50            | 128 | 5.0368  |  12.0259  | 84.0762  |        90.5817         |
|            lcnet_050            | 128 | 2.5241  |  5.0083   | 81.2157  |        98.7998         |
|          botnet26t_256          | 128 | 2.9567  |  5.8386   | 80.4705  |        92.3701         |
|           selecsls42b           | 128 |  2.504  |   5.35    | 80.4452  |        91.0387         |
|           volo_d1_224           | 64  | 5.0695  |  11.7952  | 74.9403  |        76.6571         |
|        tnt_s_patch16_224        | 128 | 6.4027  |  16.0517  | 71.8875  |        72.2157         |
|            nfnet_l0             | 128 | 5.1188  |  10.9137  | 71.3636  |         77.962         |
|            gernet_l             | 128 | 4.9625  |   8.936   | 69.2153  |        81.9785         |
|        ese_vovnet19b_dw         | 128 |  2.558  |  4.6218   | 67.1834  |        77.2238         |
|           dm_nfnet_f0           | 128 | 6.1114  |  11.5027  | 67.1781  |        72.6525         |
|         visformer_small         | 128 | 2.6181  |  6.0496   |  64.77   |        70.1007         |
|     swsl_resnext101_32x16d      | 32  | 6.2061  |  13.6519  | 63.4013  |        63.4439         |
|          gmlp_s16_224           | 128 | 5.6266  |  11.9989  | 62.2919  |        59.7789         |
|          convnext_base          | 64  | 6.7117  |  12.747   | 59.5145  |        60.3991         |
|            repvgg_a2            | 128 |  4.857  |  8.6923   | 57.0335  |        62.9428         |
|          gmixer_24_224          | 128 | 5.6784  |  12.9544  | 53.2073  |        51.3636         |
|           convit_base           | 64  | 3.5308  |  8.7663   | 49.1626  |        49.6285         |
|            pit_b_224            | 64  |  3.408  |  8.1031   | 47.9603  |        47.5724         |
| deit_base_distilled_patch16_224 | 64  | 3.1382  |  7.1867   | 43.7509  |        43.3303         |
|      vit_base_patch16_224       | 64  | 3.0656  |  6.8828   | 42.8161  |        41.7886         |
|          resmlp_12_224          | 128 | 2.7634  |  5.2304   | 42.6594  |        43.3249         |
|        convmixer_768_32         | 32  | 1.6835  |   6.882   | 37.9084  |        37.4033         |
|      beit_base_patch16_224      | 64  | 3.8761  |  8.6857   | 37.1849  |         37.364         |
|          mixer_b16_224          | 128 | 2.6109  |   5.783   | 35.2125  |        35.2487         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.2872  |         1.2836         |
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.2057  |         1.2049         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  1.1899  |         1.1871         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1607  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.1583  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.1215  |         1.1179         |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  1.1129  |         1.1115         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  1.089   |         1.0876         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.0875  |         1.0845         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  1.0758  |         1.0721         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  1.0757  |         1.0728         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  1.0696  |         1.0675         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  1.0556  |         1.0539         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  1.0512  |         1.0506         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  1.0494  |         1.0457         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0377  |         1.0351         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  1.0361  |         1.0328         |
|          convnext_base          | 64  | 1.001  |   0.924   |  1.0346  |         1.0338         |
|             dla102              | 128 | 0.9635 |  0.9151   |  1.0323  |         1.0325         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  1.0251  |         1.0242         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  1.021   |         1.0202         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  1.0203  |         1.0194         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  1.0082  |         1.0072         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  1.0071  |         1.0057         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9976  |         0.9952         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.9957  |         0.9948         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.9925  |          0.99          |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.9923  |         0.9902         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9917  |         0.9903         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.9912  |         0.9898         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9905  |         0.989          |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.9885  |         0.989          |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9864  |         0.9854         |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9821  |         0.9793         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.9793  |         0.9786         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.9793  |         0.977          |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.979   |         0.9779         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.9776  |         0.9732         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.9738  |         0.9706         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9732  |         0.9727         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.9714  |         0.9705         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.9702  |         0.9664         |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.966   |         0.9611         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.9646  |         0.9642         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.9637  |         0.9607         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.9611  |         0.9604         |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.9582  |         0.9535         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.9568  |         0.9547         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9562  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9537  |         0.9528         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.9509  |         0.9483         |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.9497  |         0.9451         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.9448  |         0.9403         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.9376  |         0.9361         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.9046  |         0.9045         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.9009  |         0.8966         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.8898  |         0.884          |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.5084 | 310.6019  | 299.6472 |        298.9764        |
|            hrnet_w18            | 128 | 280.9756 | 431.9886  | 245.1197 |        205.9017        |
|          pnasnet5large          | 16  | 196.2302 | 210.9681  | 198.4071 |        171.261         |
|           tf_mixnet_l           | 128 | 192.9075 |  228.038  | 160.4759 |        157.9435        |
|            mixnet_l             | 128 | 184.4025 | 219.6761  | 154.9799 |        152.3898        |
|          cait_m36_384           |  4  | 166.9898 | 166.6094  | 119.6992 |        122.9185        |
|           resnest101e           | 64  | 164.4206 |  187.536  | 114.1721 |        119.6005        |
|             dla102              | 128 | 171.8072 | 210.1975  | 111.1216 |        112.2692        |
|     swsl_resnext101_32x16d      | 32  | 118.4488 | 140.0899  | 109.8399 |        115.8311        |
|         poolformer_m36          | 64  | 144.6602 | 145.3007  | 106.3384 |        107.7766        |
|        tnt_s_patch16_224        | 128 | 323.1803 |  323.386  | 106.2007 |        108.2569        |
|        res2net50_14w_8s         | 128 | 140.7279 | 177.7725  | 105.2124 |        103.7158        |
|        adv_inception_v3         | 128 | 160.2298 | 185.5973  | 104.2267 |        104.9872        |
|          inception_v3           | 128 | 160.2039 | 184.6833  | 103.5266 |        104.9087        |
|       gluon_inception_v3        | 128 | 160.1587 |  184.588  | 103.5153 |        104.6779        |
|        res2net101_26w_4s        | 64  | 100.8059 | 124.3219  | 100.3894 |        91.1544         |
|             dpn107              | 32  | 112.5981 |  130.363  | 99.9766  |        92.1068         |
|           convit_base           | 64  | 162.8674 | 162.8313  | 99.9584  |        100.6947        |
|           res2next50            | 128 | 126.0297 | 152.4268  | 91.1732  |        92.2402         |
|        gluon_xception65         | 32  |  99.109  | 116.3899  | 90.8689  |        91.0597         |
|  swin_base_patch4_window7_224   | 64  | 145.979  | 151.4237  |  87.878  |        89.4436         |
|            fbnetv3_b            | 128 | 114.6623 | 142.1953  | 86.1802  |        82.3131         |
|          mixer_b16_224          | 128 | 116.3486 | 114.0246  | 85.6602  |        85.2601         |
|           dm_nfnet_f0           | 128 | 126.7389 | 127.3359  |  83.199  |        87.1769         |
|            pit_b_224            | 64  | 118.2235 | 118.2921  | 82.1984  |        82.1123         |
|          convnext_base          | 64  | 122.2913 | 122.2082  | 79.9174  |        81.1658         |
|         visformer_small         | 128 | 91.0175  |  95.8438  | 76.8439  |        77.6257         |
|          gmlp_s16_224           | 128 | 136.7741 | 125.5279  | 74.0958  |        73.7838         |
|       eca_botnext26ts_256       | 128 | 108.2744 | 146.8328  | 74.0322  |         73.922         |
|      beit_base_patch16_224      | 64  | 101.1772 | 104.3448  | 73.7726  |         74.769         |
|          cspdarknet53           | 64  | 93.7705  | 111.6817  | 73.5626  |        69.1408         |
|            nfnet_l0             | 128 | 111.9619 | 136.0309  | 73.5505  |        76.9365         |
|            gernet_l             | 128 | 76.8398  |  90.9989  | 73.2081  |        67.5065         |
|          botnet26t_256          | 128 | 101.4133 | 116.0307  | 71.4824  |        69.3663         |
|          jx_nest_base           | 32  | 100.4444 | 100.6582  | 71.0349  |        72.5013         |
|           volo_d1_224           | 64  | 120.3801 | 122.7489  | 70.0183  |        71.6218         |
|      vit_base_patch16_224       | 64  |  86.597  |  86.7661  | 69.1285  |        69.8512         |
|            repvgg_a2            | 128 | 76.9864  |  95.4728  | 68.4374  |        64.2614         |
|          gmixer_24_224          | 128 | 117.5111 | 131.3559  | 66.7004  |        66.5603         |
| deit_base_distilled_patch16_224 | 64  | 84.5992  |  84.8196  | 66.2777  |        67.0761         |
|      xcit_large_24_p8_224       |  5  | 123.5369 | 146.7671  | 62.9266  |        77.4628         |
|       tf_efficientnet_b0        | 128 | 84.2803  | 119.4918  | 61.4252  |        58.6508         |
|           fbnetc_100            | 128 | 82.7908  | 106.3595  | 60.3872  |        55.9483         |
|           rexnet_100            | 128 | 79.3938  | 107.4653  | 59.9899  |         56.434         |
|        twins_pcpvt_base         | 64  | 117.4766 |  142.912  | 59.3831  |        70.2403         |
|            tinynet_a            | 128 |  73.264  | 102.3351  | 58.6559  |        56.4783         |
|         coat_lite_mini          | 128 | 112.5968 | 113.0606  | 57.8257  |         58.416         |
|           mobilevit_s           | 64  | 83.8381  | 110.3249  | 57.3224  |        55.5602         |
|        sebotnet33ts_256         | 64  | 79.8445  |  99.9478  | 51.7013  |        49.5774         |
|          spnasnet_100           | 128 | 70.1205  |  89.669   | 50.9799  |        46.9306         |
|          ghostnet_100           | 128 | 89.8777  | 119.4718  | 49.7896  |        55.1183         |
|         crossvit_9_240          | 128 | 81.9628  | 103.4958  | 49.0968  |        50.0126         |
|         mobilenetv2_100         | 128 | 65.5536  |   84.18   | 46.5884  |        42.9846         |
|        ese_vovnet19b_dw         | 128 | 64.0828  |  73.874   | 46.1596  |        44.5709         |
|           mnasnet_100           | 128 | 64.1407  |  81.9898  | 43.9151  |        40.5639         |
|           selecsls42b           | 128 | 59.9209  |  73.7861  | 42.0555  |        42.3563         |
|          resmlp_12_224          | 128 | 53.0904  |  59.3219  | 41.9486  |        41.8602         |
|      mobilenetv3_large_100      | 128 | 61.0747  |  76.3524  | 41.7824  |        40.4607         |
|           regnety_002           | 128 | 39.1622  |  53.0716  | 28.7128  |        29.2262         |
|            lcnet_050            | 128 |  31.577  |  40.4737  |  19.684  |        20.6709         |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Performance graphs

bench_logs/torchbench_amp.png :

bench_logs/huggingface_amp.png :

bench_logs/timm_models_amp.png :

Build Summary

Run name

day_098_08_04_23_performance_amp_214

Commit hashes

pytorch commit: 54b1684
pytorch commit date: 2023-04-09 02:13:10+00:00
torchbench commit: 137c3f0e68280ab41c94403464058621a7c7fae1
torchbench commit date: 2023-04-08 04:29:31-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git54b1684

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency, and memory footprint reduction, we exclude the models that fail accuracy checks.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 83%, 50/60 | 93%, 42/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 96%, 43/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
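
The cells in the pass rate table above follow a "percent, passed/total" pattern. A minimal sketch of how such a cell could be produced from raw counts is shown here; `format_passrate` is a hypothetical helper written for illustration, not the dashboard's actual code, and the rounding rule is an assumption that happens to reproduce the values in the table.

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a pass rate as 'NN%, passed/total' (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Assumption: the percentage is rounded to the nearest integer.
    return f"{100 * passed / total:.0f}%, {passed}/{total}"


# Counts taken from the table above.
print(format_passrate(53, 60))  # eager, torchbench -> 88%, 53/60
print(format_passrate(43, 45))  # inductor_no_cudagraphs, huggingface -> 96%, 43/45
```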

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.50x    |    1.62x    |    1.39x    |
| inductor_no_cudagraphs |   1.29x    |    1.54x    |    1.41x    |
+------------------------+------------+-------------+-------------+
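
As a rough illustration of the metric above, the sketch below computes a geometric mean speedup from per-model latencies, normalizing each backend latency against a baseline and skipping models that failed accuracy. The function name, the input dictionaries, and the numbers are all hypothetical; the real benchmark scripts may handle failures and zero speedups differently.

```python
import math


def geomean_speedup(baseline_ms, backend_ms, passed):
    """Geometric mean of baseline/backend latency over models that passed accuracy."""
    logs = []
    for name, base in baseline_ms.items():
        if not passed.get(name, False):
            continue  # models failing accuracy are excluded from the summary
        speedup = base / backend_ms[name]  # > 1.0 means the backend is faster
        logs.append(math.log(speedup))
    if not logs:
        raise ValueError("no passing models to average")
    return math.exp(sum(logs) / len(logs))


# Illustrative latencies in milliseconds (not taken from the report).
baseline = {"model_a": 100.0, "model_b": 60.0, "model_c": 40.0}
backend = {"model_a": 25.0, "model_b": 60.0, "model_c": 10.0}
accuracy_ok = {"model_a": True, "model_b": True, "model_c": False}

# Speedups of 4.0 and 1.0 survive the filter, so the result is 2.00x.
print(f"{geomean_speedup(baseline, backend, accuracy_ok):.2f}x")
```

The geometric mean is used rather than the arithmetic mean because speedups are ratios: a 4x speedup and a 0.25x slowdown average out to 1x.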

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.78    |    7.25     |    5.89     |
|       aot_eager        |    9.26    |    15.97    |    13.17    |
|        inductor        |   59.27    |    68.68    |    99.27    |
| inductor_no_cudagraphs |   64.21    |    60.44    |   111.82    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.90x    |    0.88x    |
|        inductor        |   0.95x    |    1.02x    |    1.02x    |
| inductor_no_cudagraphs |   0.94x    |    1.05x    |    1.01x    |
+------------------------+------------+-------------+-------------+
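
Since the table is labelled "higher is better", one plausible reading is that the ratio divides the baseline peak memory by the backend peak memory, so values above 1.0 mean the backend uses less memory. The sketch below assumes that definition; `compression_ratio` and the sample numbers are hypothetical and not taken from the report.

```python
def compression_ratio(baseline_peak_gb: float, backend_peak_gb: float) -> float:
    """Assumed definition: baseline peak memory divided by backend peak memory."""
    if backend_peak_gb <= 0:
        raise ValueError("backend peak memory must be positive")
    return baseline_peak_gb / backend_peak_gb


# Illustrative peak memory values in GB.
print(compression_ratio(10.0, 8.0))   # backend uses less memory -> 1.25
print(compression_ratio(10.0, 12.5))  # backend uses more memory -> 0.8
```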

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Previous report name: /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 83%, 50/60  | 83%, 50/60  |
|        inductor        | huggingface | 93%, 42/45  | 93%, 42/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60  | 87%, 52/60  |
| inductor_no_cudagraphs | huggingface | 96%, 43/45  | 96%, 43/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.51x    |   1.50x   |
|        inductor        | huggingface |   1.62x    |   1.62x   |
|        inductor        | timm_models |   1.39x    |   1.39x   |
| inductor_no_cudagraphs | torchbench  |   1.29x    |   1.29x   |
| inductor_no_cudagraphs | huggingface |   1.53x    |   1.54x   |
| inductor_no_cudagraphs | timm_models |   1.40x    |   1.41x   |
+------------------------+-------------+------------+-----------+
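
To read the diff table at a glance, it can help to turn the `prev_value` and `cur_value` cells into signed deltas. The sketch below does this for a few rows copied from the table above; `parse_speedup` and `speedup_deltas` are hypothetical helpers, not part of the dashboard tooling.

```python
def parse_speedup(cell: str) -> float:
    """Convert a cell such as '1.51x' into a float."""
    return float(cell.strip().rstrip("x"))


def speedup_deltas(rows):
    """Return (compiler, suite, cur - prev) for each row, rounded to 2 decimals."""
    return [
        (compiler, suite, round(parse_speedup(cur) - parse_speedup(prev), 2))
        for compiler, suite, prev, cur in rows
    ]


# Rows copied from the table above: (compiler, suite, prev_value, cur_value).
rows = [
    ("inductor", "torchbench", "1.51x", "1.50x"),
    ("inductor", "timm_models", "1.39x", "1.39x"),
    ("inductor_no_cudagraphs", "huggingface", "1.53x", "1.54x"),
]

for compiler, suite, delta in speedup_deltas(rows):
    print(f"{compiler:24s} {suite:12s} {delta:+.2f}x")
```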

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
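
The four criteria above can be expressed as a small check. This is a minimal sketch using the thresholds stated in the list; the function name, the argument layout, and the sample values are hypothetical, and the actual dashboard script may structure this differently.

```python
SPEEDUP_MIN = 0.95              # flag if speedup is below this
COMPILE_LATENCY_MAX_SEC = 120.0  # flag if compilation takes longer than this
COMPRESSION_RATIO_MIN = 0.9      # flag if compression ratio is below this


def warnings_for(accuracy: str, speedup: float, compile_sec: float, compression: float):
    """Return the list of warning categories a model would be flagged for."""
    flags = []
    if accuracy != "pass":
        flags.append("accuracy")
    if speedup < SPEEDUP_MIN:  # a speedup of 0.0 usually means the perf run failed
        flags.append("speedup")
    if compile_sec > COMPILE_LATENCY_MAX_SEC:
        flags.append("compilation_latency")
    if compression < COMPRESSION_RATIO_MIN:
        flags.append("compression_ratio")
    return flags


# Illustrative inputs (not taken from the report).
print(warnings_for("pass", 1.40, 60.0, 1.02))          # []
print(warnings_for("fail_accuracy", 0.0, 150.0, 0.85))  # all four categories
```

Note that a `nan` metric never satisfies either comparison, so a missing value is not flagged by this sketch; a production check would likely want to treat it explicitly.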

Accuracy warnings

+-------------+----------------------------+-----------------+------------------------+
|    suite    |            name            |    inductor     | inductor_no_cudagraphs |
+-------------+----------------------------+-----------------+------------------------+
| torchbench  |       hf_Longformer        |   fail_to_run   |      fail_to_run       |
| torchbench  |            moco            |   fail_to_run   |      fail_to_run       |
| torchbench  |     Background_Matting     | eager_variation |    eager_variation     |
| torchbench  |         tacotron2          |     0.0000      |         0.0000         |
| torchbench  |            gat             |     0.0000      |         0.0000         |
| torchbench  |            gcn             |     0.0000      |         0.0000         |
| torchbench  |           llama            |     0.0000      |         0.0000         |
| torchbench  |            sage            |     0.0000      |         0.0000         |
| torchbench  |       torchrec_dlrm        |     0.0000      |         0.0000         |
| huggingface | AlbertForQuestionAnswering |  fail_accuracy  |     fail_accuracy      |
+-------------+----------------------------+-----------------+------------------------+

Performance speedup warnings

+-------------+-------------------------------+----------+------------------------+
|    suite    |             name              | inductor | inductor_no_cudagraphs |
+-------------+-------------------------------+----------+------------------------+
| torchbench  |         lennard_jones         |  1.2545  |         0.8686         |
| torchbench  |             dcgan             |  1.2348  |         0.8145         |
| torchbench  |          tts_angular          |  0.9262  |         0.9626         |
| torchbench  |          timm_vovnet          |  0.9221  |         0.9549         |
| torchbench  |              drq              |  0.0074  |         1.0441         |
| torchbench  |       soft_actor_critic       |  0.0056  |         0.7946         |
| torchbench  |              gat              |   0.0    |          0.0           |
| torchbench  |           tacotron2           |   0.0    |          0.0           |
| torchbench  |             sage              |   0.0    |          0.0           |
| torchbench  |              gcn              |   0.0    |          0.0           |
| torchbench  |             dlrm              |   0.0    |         1.1927         |
| torchbench  |             moco              |   0.0    |          0.0           |
| torchbench  |         hf_Longformer         |   0.0    |          0.0           |
| torchbench  | timm_vision_transformer_large |   0.0    |         1.0844         |
| torchbench  |         torchrec_dlrm         |   0.0    |          0.0           |
| huggingface |      DebertaForMaskedLM       |  1.1794  |         0.9072         |
| huggingface |     DebertaV2ForMaskedLM      |  0.9546  |         0.7259         |
| huggingface | DebertaV2ForQuestionAnswering |  0.928   |         0.7561         |
| huggingface |      LayoutLMForMaskedLM      |   0.0    |         1.6266         |
| huggingface |     AllenaiLongformerBase     |   0.0    |          0.0           |
+-------------+-------------------------------+----------+------------------------+

Compilation latency (sec) warnings

+-------------+--------------------------------+----------+------------------------+
|    suite    |              name              | inductor | inductor_no_cudagraphs |
+-------------+--------------------------------+----------+------------------------+
| torchbench  |          hf_T5_large           | 177.5974 |        174.6414        |
| torchbench  |           hf_BigBird           | 164.8081 |        132.7544        |
| torchbench  |        phlippe_densenet        | 132.6475 |        159.0835        |
| torchbench  |          densenet121           | 123.9776 |        138.0088        |
| torchbench  |       timm_efficientnet        | 119.6911 |        146.0414        |
| torchbench  |       mobilenet_v3_large       | 110.5632 |        135.3489        |
| torchbench  |          mobilenet_v2          | 105.2505 |        129.1839        |
| torchbench  |             yolov3             | 102.8644 |        123.5811        |
| torchbench  | timm_vision_transformer_large  |   nan    |        128.9705        |
| huggingface | DebertaV2ForQuestionAnswering  | 200.5111 |        73.4598         |
| huggingface |      DebertaV2ForMaskedLM      | 197.0868 |        71.5388         |
| huggingface |     MobileBertForMaskedLM      | 148.197  |        146.5383        |
| huggingface | MobileBertForQuestionAnswering | 145.4271 |        152.9658        |
| huggingface | M2M100ForConditionalGeneration | 135.3686 |        136.1422        |
| huggingface |        XGLMForCausalLM         | 129.5955 |        134.1363        |
| huggingface |  MT5ForConditionalGeneration   | 126.856  |        132.8181        |
| timm_models |           hrnet_w18            | 235.7112 |        255.7989        |
| timm_models |           rexnet_100           | 231.6876 |        301.4045        |
| timm_models |          ghostnet_100          | 197.3989 |        246.9963        |
| timm_models |         pnasnet5large          | 169.3075 |        166.2563        |
| timm_models |          resnest101e           | 152.5912 |        170.1393        |
| timm_models |           fbnetv3_b            | 148.6613 |        176.5584        |
| timm_models |          mobilevit_s           | 143.6318 |        166.7125        |
| timm_models |       res2net101_26w_4s        | 141.2268 |        155.1759        |
| timm_models |          tf_mixnet_l           | 140.1769 |        163.8294        |
| timm_models |        twins_pcpvt_base        | 139.9087 |        149.4916        |
| timm_models |            mixnet_l            | 136.9195 |        161.6925        |
| timm_models |      xcit_large_24_p8_224      | 135.9468 |        136.3422        |
| timm_models |     mobilenetv3_large_100      | 134.4972 |        156.6946        |
| timm_models |           tinynet_a            | 133.8497 |        162.1221        |
| timm_models |        adv_inception_v3        | 132.2084 |        158.3606        |
| timm_models |          inception_v3          | 129.6631 |        164.1013        |
| timm_models |       gluon_inception_v3       | 128.4666 |        166.2538        |
| timm_models |       tf_efficientnet_b0       | 125.691  |        160.362         |
| timm_models |        res2net50_14w_8s        | 119.0318 |        124.7547        |
| timm_models |           fbnetc_100           | 115.5414 |        144.4452        |
| timm_models |          spnasnet_100          | 112.8555 |        141.9405        |
| timm_models |        mobilenetv2_100         | 106.2269 |        130.0432        |
| timm_models |          mnasnet_100           | 104.4557 |        120.0916        |
+-------------+--------------------------------+----------+------------------------+
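
The report does not state the cutoff used to build this warnings table. The torchbench rows above are consistent with flagging any model whose compilation latency exceeds roughly 120 seconds on either compiler, so the sketch below assumes that value. The function name `flag_slow_compiles` and the `threshold_sec` parameter are hypothetical and are not taken from the dashboard scripts.

```python
import math

def flag_slow_compiles(rows, threshold_sec=120.0):
    """Return names of models whose latency exceeds the cutoff on any compiler.

    rows: iterable of (name, (inductor_sec, inductor_no_cudagraphs_sec)).
    threshold_sec: assumed cutoff; the real value is not given in the report.
    """
    flagged = []
    for name, latencies in rows:
        # nan marks a compiler that did not run; ignore it rather than flag it.
        finite = [t for t in latencies if not math.isnan(t)]
        if any(t > threshold_sec for t in finite):
            flagged.append(name)
    return flagged

# A few rows copied from the torchbench compilation latency table.
rows = [
    ("yolov3", (102.8644, 123.5811)),
    ("hf_GPT2_large", (106.6556, 108.0094)),
    ("timm_vision_transformer_large", (float("nan"), 128.9705)),
]
print(flag_slow_compiles(rows))
```

With the assumed cutoff, `yolov3` and `timm_vision_transformer_large` are flagged while `hf_GPT2_large` is not, which matches the table above.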

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+----------+------------------------+
|    suite    |                  name                   | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------------+----------+------------------------+
| torchbench  |         nvidia_deeprecommender          |  0.9195  |         0.8931         |
| torchbench  |             pytorch_stargan             |  0.8935  |         0.8893         |
| torchbench  |                resnet50                 |  0.8909  |         0.8869         |
| torchbench  |               timm_vovnet               |  0.889   |         0.8869         |
| torchbench  |         timm_vision_transformer         |  0.8873  |         0.8835         |
| torchbench  |            phlippe_densenet             |  0.8834  |         0.8659         |
| torchbench  |           mobilenet_v3_large            |  0.8796  |         0.7757         |
| torchbench  |           speech_transformer            |  0.8694  |         0.869          |
| torchbench  |               densenet121               |  0.824   |         0.8017         |
| torchbench  |               hf_Reformer               |  0.8132  |         0.8022         |
| torchbench  |               mnasnet1_0                |  0.7862  |         0.8085         |
| torchbench  |             resnext50_32x4d             |  0.7773  |         0.772          |
| torchbench  |             LearningToPaint             |  0.7552  |         0.7463         |
| torchbench  |             pytorch_struct              |  0.7428  |         0.7362         |
| torchbench  |                resnet18                 |  0.619   |         0.6097         |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |  0.6035  |         0.6004         |
| torchbench  |          functorch_dp_cifar10           |  0.451   |         0.4424         |
| torchbench  |             phlippe_resnet              |  0.3554  |         0.3395         |
| huggingface |          DistilBertForMaskedLM          |  0.8872  |         0.9624         |
| huggingface |            TrOCRForCausalLM             |  0.8855  |         0.9583         |
| huggingface | BlenderbotSmallForConditionalGeneration |  0.8749  |         0.9803         |
| huggingface |     MobileBertForQuestionAnswering      |  0.8399  |         0.8392         |
| huggingface |       BlenderbotSmallForCausalLM        |  0.8215  |         0.9119         |
| huggingface |         Speech2Text2ForCausalLM         |  0.7921  |         0.8779         |
| timm_models |               regnety_002               |  0.9009  |         0.8966         |
| timm_models |                lcnet_050                |  0.8898  |         0.884          |
+-------------+-----------------------------------------+----------+------------------------+

Metrics over time

bench_logs/comp_time_over_time.png :

bench_logs/memory_over_time.png :

bench_logs/geomean_over_time.png :

bench_logs/passrate_over_time.png :

Recent Regressions

For each relevant compiler, we compare the two most recent reports that actually ran that compiler and list the models that were previously unflagged but are now flagged as problematic (according to the 'Warnings' section).
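
The comparison described here amounts to a set difference per compiler. A minimal sketch, assuming each report has already been reduced to a set of flagged model names; `find_regressions` and the sample data are hypothetical and do not come from the dashboard scripts:

```python
def find_regressions(previous_flagged, current_flagged):
    """Return models flagged in the current report but not in the previous one."""
    return sorted(set(current_flagged) - set(previous_flagged))

# Hypothetical flagged sets for two consecutive reports, keyed by compiler.
previous = {"inductor": {"hf_T5_large", "hf_BigBird"}}
current = {"inductor": {"hf_T5_large", "hf_BigBird", "yolov3"}}

for compiler, flagged_now in current.items():
    new = find_regressions(previous.get(compiler, set()), flagged_now)
    print(compiler, new if new else "No regressions found.")
```

A model that stays flagged in both reports is not reported again, which is why a suite can have a non-empty warnings table and still show "No regressions found."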

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

No regressions found.

Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

No regressions found.

Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

No regressions found.

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|           BERT_pytorch            |  16  | 1.0023 |  0.8111   |  3.328   |         2.1319         |
|       functorch_dp_cifar10        |  64  | 0.9665 |  0.9146   |  3.2412  |         1.366          |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9698 |  0.9222   |  2.8797  |         1.7986         |
|            hf_BigBird             |  2   | 0.9586 |  0.7787   |  2.5417  |         1.6647         |
|            hf_T5_large            |  2   | 1.0087 |  0.8303   |  2.5238  |         1.9808         |
|             hf_Albert             |  8   | 0.9957 |  0.9565   |  2.351   |         2.3021         |
|               hf_T5               |  8   | 0.9947 |   0.857   |  1.9606  |         2.0297         |
|           squeezenet1_1           |  32  | 0.986  |  0.9271   |  1.9425  |         1.2295         |
|              hf_GPT2              |  4   | 1.0222 |  0.9812   |  1.9247  |         1.9215         |
|              hf_Bert              |  4   | 1.0311 |  0.8661   |  1.9161  |         1.6846         |
|            densenet121            |  4   | 0.995  |  0.7179   |  1.9046  |         1.0695         |
|           hf_GPT2_large           |  4   | 0.9999 |  0.9888   |  1.8028  |         1.7924         |
|              hf_Bart              |  4   | 0.9966 |  0.8395   |  1.749   |         1.6007         |
|        mobilenet_v3_large         |  32  | 1.0014 |  0.7907   |  1.6477  |         1.1866         |
|           hf_Bert_large           |  4   | 1.0287 |  0.8872   |  1.6471  |         1.6271         |
|         phlippe_densenet          | 128  | 0.9973 |  0.7737   |  1.6267  |         1.0022         |
|           timm_resnest            |  32  | 0.9971 |  0.8544   |  1.5963  |         1.538          |
|            timm_nfnet             | 128  | 0.9995 |  0.9976   |  1.5914  |         1.5033         |
|      timm_vision_transformer      |  32  | 0.9909 |  0.8668   |  1.5866  |         1.3873         |
| attention_is_all_you_need_pytorch | 256  |  1.0   |  0.9252   |  1.5565  |         1.4988         |
|           fastNLP_Bert            |  6   | 1.0101 |  0.8664   |  1.5499  |         1.5458         |
|           mobilenet_v2            |  96  | 0.9991 |   0.779   |  1.5434  |         1.4997         |
|           hf_DistilBert           |  8   | 0.9932 |  0.9675   |  1.5173  |         1.4988         |
|          phlippe_resnet           | 128  | 0.9901 |  0.7603   |  1.5034  |         1.0132         |
|        speech_transformer         |  32  | 0.9845 |   0.794   |  1.489   |         1.6368         |
|        shufflenet_v2_x1_0         | 128  | 0.9966 |  0.7622   |  1.4406  |         1.2095         |
|          pytorch_struct           | 200  | 0.9079 |  0.7748   |  1.4196  |         1.1141         |
|           pytorch_unet            |  1   | 0.9991 |  0.2042   |  1.3722  |         1.3573         |
|             resnet18              |  16  | 0.9911 |  0.7466   |  1.3644  |         0.982          |
|          resnext50_32x4d          |  8   | 0.9864 |  0.7163   |  1.3486  |         1.0006         |
|            mnasnet1_0             |  32  | 0.9939 |  0.7382   |  1.3107  |         1.0811         |
|          pytorch_stargan          |  16  | 0.989  |  0.8054   |  1.2879  |         1.2738         |
|          LearningToPaint          |  96  | 0.9935 |  0.7779   |  1.2647  |         1.0672         |
|               vgg16               |  64  | 0.9994 |  0.9982   |  1.2628  |         1.2565         |
|           lennard_jones           | 1000 | 0.8598 |  0.7504   |  1.2545  |         0.8686         |
|               dcgan               |  32  | 0.8725 |  0.6931   |  1.2348  |         0.8145         |
|            Super_SloMo            |  6   | 0.9992 |  0.1778   |  1.2347  |         1.2338         |
|        Background_Matting         |  4   | 0.9996 |  0.1357   |  1.2211  |         1.2107         |
|              yolov3               |  16  | 0.9994 |  0.8082   |  1.2175  |         1.206          |
|         timm_efficientnet         |  32  | 0.9474 |  0.6296   |  1.1851  |          1.1           |
|             resnet50              |  32  | 0.9958 |  0.7661   |  1.1807  |         1.0646         |
|              alexnet              | 128  | 0.9988 |  0.9966   |  1.1401  |         1.1378         |
|            hf_Reformer            |  4   | 0.9861 |  0.9684   |  1.1297  |         1.0722         |
|              demucs               |  4   | 1.0014 |  1.0028   |  1.0623  |         1.0403         |
|             resnet152             |  32  | 0.9989 |  0.7536   |  1.0349  |         1.0258         |
|      nvidia_deeprecommender       | 256  | 0.9991 |  0.9991   |  0.9785  |         1.0192         |
|            timm_regnet            |  32  | 0.9325 |  0.7806   |  0.9681  |         1.0026         |
|            tts_angular            |  64  | 0.9216 |  0.8814   |  0.9262  |         0.9626         |
|            timm_vovnet            |  32  | 0.8763 |   0.726   |  0.9221  |         0.9549         |
|                drq                |  1   | 0.9333 |  0.7354   |  0.0074  |         1.0441         |
|         soft_actor_critic         | 256  | 0.8586 |  0.6199   |  0.0056  |         0.7946         |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               dlrm                | 1024 | 0.9321 |  0.8514   |   0.0    |         1.1927         |
|               moco                |  32  | 0.9794 |    0.0    |   0.0    |          0.0           |
|           hf_Longformer           |  2   | 1.0179 |  0.6914   |   0.0    |          0.0           |
|   timm_vision_transformer_large   |  32  | 0.9999 |    0.0    |   0.0    |         1.0844         |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|          vision_maskrcnn          |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 27.0268 |  55.5024  | 177.5974 |        174.6414        |
|            hf_BigBird             |  2   | 12.8958 |  36.9317  | 164.8081 |        132.7544        |
|         phlippe_densenet          | 128  | 3.2404  |   7.04    | 132.6475 |        159.0835        |
|            densenet121            |  4   | 7.7114  |  18.3603  | 123.9776 |        138.0088        |
|         timm_efficientnet         |  32  | 4.9989  |  10.0804  | 119.6911 |        146.0414        |
|        mobilenet_v3_large         |  32  | 3.4694  |  7.6782   | 110.5632 |        135.3489        |
|           hf_GPT2_large           |  4   | 14.7904 |  30.2397  | 106.6556 |        108.0094        |
|           mobilenet_v2            |  96  | 3.1706  |  7.0716   | 105.2505 |        129.1839        |
|              yolov3               |  16  | 4.8561  |  10.5846  | 102.8644 |        123.5811        |
|             resnet152             |  32  | 9.1354  |  20.2185  | 100.1738 |        108.2349        |
|            mnasnet1_0             |  32  | 3.1001  |  6.8021   | 85.6949  |        109.1465        |
|           timm_resnest            |  32  | 1.8127  |  3.8886   | 83.9061  |        102.8926        |
|        speech_transformer         |  32  | 6.0422  |  13.3869  | 79.1479  |        77.6458         |
| attention_is_all_you_need_pytorch | 256  | 4.4243  |  10.7883  | 74.1333  |        73.8128         |
|        shufflenet_v2_x1_0         | 128  | 3.4857  |  7.6546   |  70.522  |        83.4245         |
|            timm_regnet            |  32  | 6.6842  |  12.2139  |  69.787  |        74.9668         |
|           BERT_pytorch            |  16  | 4.9701  |  11.6605  | 69.1268  |        70.2596         |
|            timm_nfnet             | 128  |  5.735  |  11.2256  | 66.8149  |        73.8357         |
|           hf_Bert_large           |  4   | 10.3405 |  21.2561  | 66.0489  |        66.7393         |
|        Background_Matting         |  4   | 3.1304  |  11.1722  | 63.3441  |        71.4468         |
|             resnet50              |  32  | 3.2462  |  6.9553   | 58.0564  |        66.4771         |
|            timm_vovnet            |  32  | 3.5999  |   6.343   | 56.2209  |        63.9292         |
|           pytorch_unet            |  1   | 1.5396  |  4.3829   | 52.3654  |        61.9247         |
|               hf_T5               |  8   | 5.5144  |  12.5582  | 52.2717  |        49.6557         |
|           fastNLP_Bert            |  6   | 5.1129  |  11.279   | 52.1198  |         51.153         |
|              hf_Bart              |  4   | 6.0815  |  13.6901  |  50.745  |        51.1686         |
|      timm_vision_transformer      |  32  | 3.3025  |  7.1909   | 49.8976  |        52.7709         |
|          resnext50_32x4d          |  8   |  3.227  |  6.9724   | 49.0981  |        54.7811         |
|       functorch_dp_cifar10        |  64  | 1.2252  |  2.4148   | 45.8914  |        57.1723         |
|            hf_Reformer            |  4   | 4.1258  |  6.0039   | 45.2972  |         39.801         |
|            Super_SloMo            |  6   | 2.7518  |  9.8869   | 44.6637  |        43.4709         |
|              hf_GPT2              |  4   | 4.6514  |  9.7827   | 42.5301  |        42.1871         |
|          pytorch_stargan          |  16  | 1.2278  |  3.1939   |  40.256  |        48.3093         |
|             hf_Albert             |  8   | 2.5079  |  8.0005   | 40.2089  |        40.5126         |
|          LearningToPaint          |  96  | 1.4099  |  2.8768   |  40.132  |        44.2606         |
|              hf_Bert              |  4   | 5.0232  |  10.4376  | 39.7518  |        41.1105         |
|             resnet18              |  16  | 1.3543  |  2.9111   | 38.4886  |        44.5848         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2229  |  3.1044   | 33.1091  |        36.4206         |
|           hf_DistilBert           |  8   | 2.3886  |   5.233   | 31.9098  |         31.389         |
|          phlippe_resnet           | 128  | 1.3597  |  2.8253   | 29.9325  |        34.8487         |
|              demucs               |  4   | 1.4426  |   2.199   | 29.3385  |         28.564         |
|           squeezenet1_1           |  32  | 1.0346  |  1.8475   | 23.8104  |        25.7065         |
|          pytorch_struct           | 200  | 0.7471  |   1.326   |  22.033  |        22.0501         |
|               vgg16               |  64  | 0.6294  |  1.1123   | 15.6949  |        16.5308         |
|              alexnet              | 128  | 0.4989  |   0.778   | 15.6821  |        14.2378         |
|                drq                |  1   | 0.6641  |  1.0101   | 11.3741  |         9.8532         |
|      nvidia_deeprecommender       | 256  |  0.485  |  0.7604   | 11.1052  |        10.9343         |
|               dcgan               |  32  | 0.4443  |   0.706   |  9.2843  |         8.2871         |
|         soft_actor_critic         | 256  |  0.427  |  0.5986   |  9.1706  |         8.1791         |
|            tts_angular            |  64  | 0.4492  |  0.5143   |  7.1166  |         6.8446         |
|           lennard_jones           | 1000 | 0.4038  |  0.5973   |  6.3299  |         7.5403         |
|   timm_vision_transformer_large   |  32  | 9.4684  |    nan    |   nan    |        128.9705        |
|               dlrm                | 1024 | 0.3909  |  0.7989   |   nan    |         7.9094         |
|           hf_Longformer           |  2   | 9.3528  |  30.5735  |   nan    |          nan           |
|               moco                |  32  | 33.056  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.265   |         1.2557         |
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.208   |         1.208          |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  1.193   |         1.1717         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.1751  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.1727  |         1.1719         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  1.1687  |         1.168          |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |  1.1425  |         1.1191         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  1.1361  |         1.1266         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  1.1334  |         1.128          |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  1.1141  |         1.0713         |
|           mobilenet_v2            |  96  | 0.9869 |  0.7651   |  1.1078  |         1.1025         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  1.1053  |         0.9973         |
|            timm_nfnet             | 128  | 0.9071 |  0.8749   |  1.0754  |         1.0734         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  1.0737  |         1.0725         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  1.0687  |         0.9997         |
|                drq                |  1   | 0.9877 |  0.8852   |  1.0607  |         0.9573         |
|        Background_Matting         |  4   | 1.0125 |  0.6489   |  1.0427  |         1.0406         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  1.0344  |         1.026          |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  1.0292  |         0.9945         |
|              yolov3               |  16  | 0.9832 |  0.8253   |  1.0161  |         1.016          |
|            tts_angular            |  64  | 0.9983 |  0.9983   |  0.9952  |         0.9983         |
|              demucs               |  4   | 0.9661 |  0.9659   |  0.9866  |         0.9656         |
|               vgg16               |  64  | 0.9923 |  0.7245   |  0.9823  |         0.9808         |
|              hf_Bart              |  4   | 0.9087 |  0.7521   |  0.978   |         0.9173         |
|        shufflenet_v2_x1_0         | 128  | 0.955  |  0.8396   |  0.9736  |         0.9666         |
|           timm_resnest            |  32  | 0.9887 |  0.8969   |  0.9713  |         0.9665         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.9644  |         0.9645         |
|            timm_regnet            |  32  | 0.9953 |  0.8503   |  0.9552  |         0.9523         |
|         timm_efficientnet         |  32  | 0.9877 |  0.7664   |  0.9474  |         0.9404         |
|             resnet152             |  32  | 0.9951 |  0.8948   |  0.9444  |         0.942          |
|              alexnet              | 128  | 0.9454 |  0.7939   |  0.9434  |         0.939          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.9306  |         0.9308         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.9195  |         0.8931         |
|           squeezenet1_1           |  32  | 0.9666 |  0.9312   |  0.909   |         0.9087         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.8935  |         0.8893         |
|             resnet50              |  32  | 0.9907 |  0.8603   |  0.8909  |         0.8869         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.889   |         0.8869         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8873  |         0.8835         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8834  |         0.8659         |
|        mobilenet_v3_large         |  32  | 0.9783 |  0.8392   |  0.8796  |         0.7757         |
|        speech_transformer         |  32  | 0.9915 |    0.9    |  0.8694  |         0.869          |
|            densenet121            |  4   | 0.994  |  0.9823   |  0.824   |         0.8017         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.8132  |         0.8022         |
|            mnasnet1_0             |  32  | 0.9801 |   0.897   |  0.7862  |         0.8085         |
|          resnext50_32x4d          |  8   | 0.9922 |  0.8413   |  0.7773  |         0.772          |
|          LearningToPaint          |  96  | 0.9192 |  0.7116   |  0.7552  |         0.7463         |
|          pytorch_struct           | 200  | 0.9992 |  0.5106   |  0.7428  |         0.7362         |
|             resnet18              |  16  | 0.9753 |  0.7978   |  0.619   |         0.6097         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9966 |  0.8796   |  0.6035  |         0.6004         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.451   |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3554  |         0.3395         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |   nan    |         1.0009         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.9511 |  0.8932   |   nan    |          nan           |
|               moco                |  32  | 1.0057 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|                drq                |  1   |  3.7295  |  4.4661   | 754.5937 |         3.4603         |
|         soft_actor_critic         | 256  |  1.759   |  2.7807   | 570.873  |         2.0022         |
|           hf_GPT2_large           |  4   | 209.2932 | 211.7072  | 115.9947 |        116.887         |
|        Background_Matting         |  4   | 126.0241 | 927.1566  | 103.1491 |        103.9893        |
|               hf_T5               |  8   | 180.297  | 209.1244  | 91.7555  |        88.6523         |
|            hf_T5_large            |  2   | 221.8159 | 268.9271  | 89.1595  |        113.6654        |
|            hf_BigBird             |  2   | 204.3783 | 248.4027  | 77.3469  |        114.3067        |
|            timm_nfnet             | 128  | 118.7066 | 118.5197  | 74.7735  |        78.9282         |
|            hf_Reformer            |  4   |  82.202  |  83.7474  | 71.7213  |        75.6973         |
|            Super_SloMo            |  6   | 79.4759  | 446.5178  | 64.4046  |        64.3634         |
|             resnet152             |  32  | 65.0551  |  87.2987  | 61.7734  |        62.9161         |
|            timm_regnet            |  32  | 59.9154  |  72.0014  | 57.2852  |        56.6375         |
|              yolov3               |  16  | 68.6586  |  84.8152  | 56.2388  |        57.0259         |
|               vgg16               |  64  | 66.3969  |  66.4582  | 52.5833  |        52.8719         |
|              demucs               |  4   | 53.9073  |  53.4377  | 50.8452  |        51.6394         |
|           hf_Bert_large           |  4   | 80.3488  |  92.5779  | 49.8639  |        50.5507         |
|        speech_transformer         |  32  | 65.0664  |  81.6432  | 40.1757  |         38.443         |
| attention_is_all_you_need_pytorch | 256  | 55.3254  |  58.577   |  34.746  |        35.7445         |
|           fastNLP_Bert            |  6   | 52.3758  |  60.3867  | 34.6013  |        34.6647         |
|              hf_Bart              |  4   | 63.6117  |  68.558   | 33.2562  |         35.342         |
|           mobilenet_v2            |  96  | 47.1786  |  60.434   | 30.4891  |        31.4041         |
|             hf_Albert             |  8   | 68.7355  |  71.5118  | 29.4974  |        29.6545         |
|           pytorch_unet            |  1   | 39.9336  | 195.2982  | 29.0831  |        29.4446         |
|            densenet121            |  4   | 55.2042  |  74.3841  | 28.9123  |        49.8791         |
|         timm_efficientnet         |  32  | 34.0058  |  50.4857  | 27.2326  |         29.321         |
|            timm_vovnet            |  32  | 28.2232  |  34.3356  | 26.7302  |        26.0322         |
|              hf_GPT2              |  4   | 48.3394  |  50.078   | 25.3211  |        27.5927         |
|             resnet50              |  32  | 26.7422  |  34.5009  | 22.3187  |        25.1084         |
|              hf_Bert              |  4   | 39.1926  |  45.6724  | 21.5114  |         24.545         |
|        shufflenet_v2_x1_0         | 128  | 31.0005  |  39.3778  | 21.2005  |        25.5999         |
|           hf_DistilBert           |  8   | 31.7335  |  32.4824  | 21.0229  |        20.9131         |
|            mnasnet1_0             |  32  | 22.3084  |  30.2188  | 17.8919  |        20.6553         |
|      timm_vision_transformer      |  32  | 32.2152  |  32.036   |  17.645  |        20.1655         |
|        mobilenet_v3_large         |  32  |  27.09   |  33.9679  | 16.3394  |        22.8934         |
|           BERT_pytorch            |  16  |  54.899  |  67.5585  | 16.1645  |        24.3938         |
|           timm_resnest            |  32  | 24.1654  |  28.235   | 15.0416  |        15.7093         |
|          resnext50_32x4d          |  8   | 20.7931  |  30.121   | 14.9558  |        19.6451         |
|         phlippe_densenet          | 128  | 23.3275  |  30.121   | 14.4263  |        24.3172         |
|          pytorch_stargan          |  16  | 15.3967  |  18.0499  | 11.7609  |         12.005         |
|      nvidia_deeprecommender       | 256  | 10.2201  |  10.2281  | 10.4406  |        10.0257         |
|          LearningToPaint          |  96  |  11.38   |  14.5679  |  9.0243  |        10.3973         |
|              alexnet              | 128  |  9.8618  |  9.8711   |  8.6138  |         8.6556         |
|             resnet18              |  16  |  9.3503  |  13.3359  |  6.7366  |         9.6058         |
|            tts_angular            |  64  |  6.7639  |  7.0803   |  6.7249  |         6.3427         |
|          phlippe_resnet           | 128  |  9.1077  |  11.7344  |  6.0114  |         9.0039         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  14.404  |  17.9262  |  5.9577  |         7.9615         |
|           squeezenet1_1           |  32  | 10.5523  |  12.4585  |  5.4777  |         8.7391         |
|          pytorch_struct           | 200  |  5.0934  |  6.0127   |  3.2796  |         4.2347         |
|       functorch_dp_cifar10        |  64  | 10.5252  |  11.1012  |  3.109   |         7.4414         |
|               dcgan               |  32  |  2.4319  |  3.0153   |  1.7117  |         2.6452         |
|           lennard_jones           | 1000 |  1.8329  |  2.1209   |  1.2337  |         1.7759         |
|   timm_vision_transformer_large   |  32  | 463.9218 |    nan    |   nan    |        427.9375        |
|               dlrm                | 1024 |  4.4592  |  4.7849   |   nan    |         3.9251         |
|           hf_Longformer           |  2   | 110.8316 | 164.0137  |   nan    |          nan           |
|               moco                |  32  | 52.4318  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with amp precision

Performance speedup (relative to eager PyTorch; higher is better)

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 1.0149 |  0.8601   |  3.0205  |         1.181          |
|             OPTForCausalLM              |  2  | 0.9891 |   0.927   |  2.5088  |          2.53          |
|     MobileBertForQuestionAnswering      | 128 | 1.0164 |  0.8829   |  2.4906  |         1.1707         |
|      GPT2ForSequenceClassification      |  4  | 0.9891 |  0.9638   |  2.376   |         2.3612         |
|       MT5ForConditionalGeneration       | 16  | 1.0163 |   0.854   |  2.3154  |         1.9434         |
|       ElectraForQuestionAnswering       | 64  | 0.9981 |  0.9875   |  2.153   |         2.1412         |
|             XGLMForCausalLM             |  8  | 0.9931 |  0.8553   |  2.0503  |         1.5931         |
|           ElectraForCausalLM            | 32  | 0.9961 |  0.9505   |  1.8808  |         1.8708         |
|    LayoutLMForSequenceClassification    | 16  | 0.997  |  0.9838   |  1.8668  |         1.8339         |
|            XLNetLMHeadModel             |  8  | 0.9977 |  0.9706   |  1.8457  |         1.8473         |
|       RobertaForQuestionAnswering       | 16  | 0.9975 |  0.9832   |  1.8042  |         1.8063         |
|        BertForQuestionAnswering         | 16  | 0.9977 |  0.9829   |  1.804   |         1.8052         |
|         MegatronBertForCausalLM         |  4  | 1.0178 |  0.9466   |  1.7498  |         1.5773         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9978 |  0.9783   |  1.7034  |         1.6786         |
|           RobertaForCausalLM            | 16  | 0.9978 |  0.9728   |  1.6901  |         1.6974         |
|               DistillGPT2               | 16  | 0.9924 |  0.9595   |  1.6849  |         1.7178         |
|                 T5Small                 |  4  | 0.992  |  0.8711   |  1.6811  |         1.7563         |
|       T5ForConditionalGeneration        |  4  | 0.995  |  0.8619   |  1.6796  |         1.7547         |
|     PLBartForConditionalGeneration      |  4  | 0.9872 |  0.9424   |  1.6656  |         1.6715         |
|            PLBartForCausalLM            |  8  | 0.9891 |   0.955   |  1.6643  |         1.7037         |
|       AlbertForQuestionAnswering        |  4  | 1.0002 |  0.8859   |  1.6518  |          1.65          |
|            AlbertForMaskedLM            |  4  | 1.0003 |  0.8851   |  1.6457  |         1.6424         |
|             BertForMaskedLM             | 16  | 0.9973 |  0.9727   |  1.6191  |         1.6145         |
|     M2M100ForConditionalGeneration      | 16  | 0.9949 |   0.847   |  1.5724  |         1.4278         |
|                CamemBert                | 16  | 0.9979 |  0.9737   |  1.5642  |         1.5629         |
|             BartForCausalLM             |  4  | 0.9895 |  0.9496   |  1.5563  |         1.5642         |
|            MBartForCausalLM             |  4  | 0.9835 |   0.957   |  1.5512  |         1.5553         |
|         Speech2Text2ForCausalLM         | 256 | 0.9857 |  0.9178   |  1.5467  |         1.5681         |
|            YituTechConvBert             | 16  | 0.997  |  0.9699   |  1.5291  |         1.5214         |
|      BartForConditionalGeneration       |  2  | 1.0117 |  0.9687   |  1.5212  |         1.5497         |
|      MBartForConditionalGeneration      |  2  | 1.0135 |  0.9724   |  1.5139  |         1.4868         |
|     DistilBertForQuestionAnswering      | 256 | 0.9964 |   0.991   |  1.4674  |         1.461          |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0074 |  0.8981   |  1.4262  |         1.4467         |
|     PegasusForConditionalGeneration     | 32  | 1.0084 |  0.9583   |  1.3361  |         1.3526         |
|          BlenderbotForCausalLM          |  4  | 0.9988 |  0.8479   |  1.3145  |         1.3003         |
|            TrOCRForCausalLM             | 32  | 0.9877 |  0.9577   |  1.2791  |         1.2959         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9878 |  0.9149   |  1.2506  |         1.2759         |
|           PegasusForCausalLM            | 32  | 0.9906 |  0.9148   |  1.2399  |         1.3042         |
|          DistilBertForMaskedLM          | 128 | 0.9958 |  0.9555   |  1.2248  |         1.2461         |
|       DebertaForQuestionAnswering       |  8  | 0.8322 |  0.7409   |  1.1802  |         1.0611         |
|           DebertaForMaskedLM            |  4  | 0.751  |  0.5857   |  1.1794  |         0.9072         |
|          DebertaV2ForMaskedLM           |  1  | 0.7417 |  0.5495   |  0.9546  |         0.7259         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.7315 |  0.5544   |  0.928   |         0.7561         |
|           LayoutLMForMaskedLM           | 16  | 0.9978 |   0.973   |   0.0    |         1.6266         |
|          AllenaiLongformerBase          |  4  | 1.0074 |  0.6654   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|      DebertaV2ForQuestionAnswering      |  2  | 15.2328 |  26.9332  | 200.5111 |        73.4598         |
|          DebertaV2ForMaskedLM           |  1  | 15.456  |  27.0173  | 197.0868 |        71.5388         |
|          MobileBertForMaskedLM          | 64  | 16.6173 |  40.4768  | 148.197  |        146.5383        |
|     MobileBertForQuestionAnswering      | 128 | 16.6222 |  42.0017  | 145.4271 |        152.9658        |
|     M2M100ForConditionalGeneration      | 16  | 12.0636 |  26.4923  | 135.3686 |        136.1422        |
|             XGLMForCausalLM             |  8  | 9.5662  |  21.9348  | 129.5955 |        134.1363        |
|       MT5ForConditionalGeneration       | 16  | 7.9354  |  19.2573  | 126.856  |        132.8181        |
|       DebertaForQuestionAnswering       |  8  | 7.4218  |  13.2862  | 102.5591 |         60.593         |
|           DebertaForMaskedLM            |  4  | 7.3066  |  13.5854  | 101.324  |        53.7615         |
|            XLNetLMHeadModel             |  8  | 10.5415 |  27.358   | 95.9337  |        95.7044         |
|      MBartForConditionalGeneration      |  2  | 11.7965 |  25.8296  | 80.3097  |        80.0781         |
|      BartForConditionalGeneration       |  2  | 11.8999 |  25.6788  | 76.6055  |        74.7709         |
|     PegasusForConditionalGeneration     | 32  | 5.3773  |  19.3588  | 70.3092  |        68.5677         |
|    MegatronBertForQuestionAnswering     |  8  | 10.1217 |  21.3294  |  70.104  |        67.0222         |
|          BlenderbotForCausalLM          |  4  | 11.0113 |  21.961   | 69.6987  |        69.1879         |
|            YituTechConvBert             | 16  | 7.0665  |  16.6344  | 67.6769  |        68.2969         |
|         MegatronBertForCausalLM         |  4  | 10.2898 |  21.7328  | 66.5009  |         67.219         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.7361  |  18.0995  | 56.3927  |        55.6623         |
|           ElectraForCausalLM            | 32  | 5.5417  |  11.4746  | 51.9979  |        54.5221         |
|                 T5Small                 |  4  | 5.8247  |  13.3672  |  50.765  |        51.6251         |
|       T5ForConditionalGeneration        |  4  | 5.8514  |  12.5361  | 50.7646  |        51.3513         |
|     PLBartForConditionalGeneration      |  4  | 6.1971  |  14.3139  | 49.7839  |        49.8944         |
|    LayoutLMForSequenceClassification    | 16  |  5.783  |  11.808   | 46.7276  |        47.7222         |
|       ElectraForQuestionAnswering       | 64  | 5.5227  |  11.5041  | 45.0444  |        47.6472         |
|            MBartForCausalLM             |  4  | 5.7018  |  11.4566  | 40.3202  |        41.2869         |
|             BertForMaskedLM             | 16  | 5.1963  |  10.7273  | 40.2318  |        39.0234         |
|             BartForCausalLM             |  4  | 5.7977  |  11.0451  | 39.3062  |        37.7247         |
|        BertForQuestionAnswering         | 16  | 5.1814  |  10.8087  | 39.2725  |        40.3562         |
|           PegasusForCausalLM            | 32  | 6.0198  |  11.0641  | 39.0111  |        38.6251         |
|                CamemBert                | 16  | 5.4925  |  11.3628  | 38.7138  |        40.0282         |
|             OPTForCausalLM              |  2  | 4.7173  |  10.2713  | 38.5246  |        38.1173         |
|            TrOCRForCausalLM             | 32  | 5.7248  |  11.2357  | 38.4675  |        37.7691         |
|           RobertaForCausalLM            | 16  | 5.3359  |  11.022   | 38.3811  |        38.1738         |
|            AlbertForMaskedLM            |  4  | 2.3889  |  8.7007   | 38.2892  |        37.7894         |
|       RobertaForQuestionAnswering       | 16  | 5.2357  |  10.836   |  37.267  |        37.1307         |
|      GPT2ForSequenceClassification      |  4  | 4.6721  |  9.6189   | 36.7126  |        34.8531         |
|     DistilBertForQuestionAnswering      | 256 |  2.644  |  5.6187   | 36.1408  |        38.1649         |
|          DistilBertForMaskedLM          | 128 | 2.6499  |  5.3107   | 35.3706  |         36.783         |
|       AlbertForQuestionAnswering        |  4  | 2.3544  |  8.1243   | 34.6004  |        34.4201         |
|       BlenderbotSmallForCausalLM        | 64  | 4.0104  |   7.537   | 30.2533  |        29.7006         |
|               DistillGPT2               | 16  | 2.5294  |  5.0249   | 29.0925  |        27.6454         |
|            PLBartForCausalLM            |  8  |  3.007  |  5.9299   | 27.5149  |        26.3731         |
|         Speech2Text2ForCausalLM         | 256 | 3.1341  |  6.1327   | 26.0432  |        25.8613         |
|           LayoutLMForMaskedLM           | 16  | 5.8708  |  11.8274  |   nan    |        42.1962         |
|          AllenaiLongformerBase          |  4  | 9.7249  |  30.9592  |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  1.3156  |         1.3147         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  1.2697  |         1.268          |
|       DebertaForQuestionAnswering       |  8  | 0.9524 |  1.0537   |  1.2169  |         1.1525         |
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1962  |         1.195          |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1958  |         1.2307         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.1782  |         1.1729         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.1778  |         1.1724         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.1509  |         1.1479         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.1426  |         1.1368         |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.1261  |         1.1813         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.1261  |         1.1813         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  1.1159  |         1.1152         |
|             OPTForCausalLM              |  2  | 0.9999 |  0.9165   |  1.0965  |         1.1346         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  1.0827  |         1.0962         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0562  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.056   |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0532  |         1.0491         |
|            YituTechConvBert             | 16  |  1.0   |  0.9143   |  1.043   |         1.0411         |
|           DebertaForMaskedLM            |  4  | 0.9316 |  0.9154   |  1.0326  |         0.9988         |
|     PegasusForConditionalGeneration     | 32  |  1.0   |   0.91    |  1.0074  |         1.0689         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  1.004   |         1.0307         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9763 |  0.9764   |  1.0023  |         0.9799         |
|          BlenderbotForCausalLM          |  4  | 0.9978 |  0.9099   |  1.0003  |         0.999          |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.9899  |         0.9664         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.988   |         1.0139         |
|     PLBartForConditionalGeneration      |  4  | 1.0001 |  0.9301   |  0.9772  |         1.052          |
|           ElectraForCausalLM            | 32  |  1.0   |  0.8475   |  0.9753  |         0.9739         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.971   |         1.0642         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.9505  |         1.016          |
|            MBartForCausalLM             |  4  |  1.0   |  0.8937   |  0.9444  |         0.9912         |
|     M2M100ForConditionalGeneration      | 16  |  1.0   |  0.8084   |  0.9321  |         0.9908         |
|             BartForCausalLM             |  4  |  1.0   |  0.8936   |  0.9294  |         0.9749         |
|             XGLMForCausalLM             |  8  |  1.0   |  0.7834   |  0.9264  |         0.9792         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8677   |  0.9162  |         0.9886         |
|           PegasusForCausalLM            | 32  |  1.0   |  0.8822   |  0.9161  |         0.9864         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9127  |         1.0018         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8872  |         0.9624         |
|            TrOCRForCausalLM             | 32  |  1.0   |   0.873   |  0.8855  |         0.9583         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8749  |         0.9803         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.8399  |         0.8392         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8137   |  0.8215  |         0.9119         |
|         Speech2Text2ForCausalLM         | 256 |  1.0   |  0.8183   |  0.7921  |         0.8779         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |   nan    |         1.0518         |
|          AllenaiLongformerBase          |  4  | 0.9508 |  0.8694   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 266.0782 | 300.8723  | 161.7351 |        162.1478        |
|       AlbertForQuestionAnswering        |  4  | 264.0647 | 297.9412  | 160.0362 |        160.1409        |
|            XLNetLMHeadModel             |  8  | 282.2979 | 291.9116  | 153.2819 |        153.2088        |
|      DebertaV2ForQuestionAnswering      |  2  | 146.5433 | 193.0075  | 114.5103 |        154.6574        |
|            TrOCRForCausalLM             | 32  | 139.7479 | 143.5714  | 107.8305 |        106.1696        |
|          DebertaV2ForMaskedLM           |  1  | 156.7695 | 186.8581  | 106.8919 |        141.7828        |
|     PegasusForConditionalGeneration     | 32  | 157.4399 | 166.2203  | 106.6753 |        108.1404        |
|      MBartForConditionalGeneration      |  2  | 147.8144 | 142.9872  | 90.9465  |         92.422         |
|      BartForConditionalGeneration       |  2  | 147.3662 | 142.9512  |  90.254  |        97.2555         |
|    MegatronBertForQuestionAnswering     |  8  | 142.3203 | 144.8027  | 83.2982  |        84.7337         |
|            YituTechConvBert             | 16  | 126.0447 | 129.6644  | 82.2934  |        82.5193         |
|          BlenderbotForCausalLM          |  4  | 105.1727 |  137.811  | 81.3603  |        90.6786         |
| BlenderbotSmallForConditionalGeneration | 64  | 123.943  | 137.9504  | 79.1183  |        78.3681         |
|                CamemBert                | 16  | 118.9696 |  121.81   | 75.8185  |         76.012         |
|            MBartForCausalLM             |  4  | 116.3053 | 119.1267  | 73.9495  |        72.9854         |
|             BartForCausalLM             |  4  | 114.9226 | 121.3875  | 72.9586  |        72.8539         |
|     M2M100ForConditionalGeneration      | 16  | 150.1594 | 145.0018  |  71.465  |        79.5686         |
|     PLBartForConditionalGeneration      |  4  | 121.527  | 124.7287  | 70.7396  |        70.4498         |
|     DistilBertForQuestionAnswering      | 256 | 104.363  | 104.3906  | 70.6712  |        71.4067         |
|            PLBartForCausalLM            |  8  | 117.5281 | 116.5624  |  69.898  |        68.7578         |
|          DistilBertForMaskedLM          | 128 | 85.1347  |  88.7229  | 69.1865  |        68.5107         |
|           RobertaForCausalLM            | 16  | 115.3317 | 118.2753  | 68.1082  |        67.8409         |
|             BertForMaskedLM             | 16  | 110.3404 | 112.9851  | 67.9604  |        68.2037         |
|     MobileBertForQuestionAnswering      | 128 | 163.8783 | 226.1059  | 67.6892  |        147.9499        |
|             OPTForCausalLM              |  2  | 173.0789 | 181.5499  | 67.6819  |         68.052         |
|       DebertaForQuestionAnswering       |  8  | 91.1429  | 102.0729  | 64.3301  |        71.3538         |
|               DistillGPT2               | 16  | 106.5667 | 110.1716  | 62.7983  |        61.5609         |
|                 T5Small                 |  4  | 107.7411 | 122.8622  |  62.373  |         59.462         |
|       T5ForConditionalGeneration        |  4  | 107.3653 |  121.285  | 62.3496  |        59.5812         |
|           DebertaForMaskedLM            |  4  | 81.9803  | 102.6773  | 59.2747  |        66.7424         |
|          MobileBertForMaskedLM          | 64  | 166.7326 | 206.5881  | 58.9678  |        150.3903        |
|           PegasusForCausalLM            | 32  | 71.9861  |  81.3632  |  57.464  |        57.3865         |
|         MegatronBertForCausalLM         |  4  | 85.9419  |  91.4339  | 54.1223  |        55.6773         |
|       ElectraForQuestionAnswering       | 64  | 116.7151 | 116.0968  | 53.2863  |        53.7401         |
|       RobertaForQuestionAnswering       | 16  | 96.0027  |  97.2536  | 53.0573  |        52.9738         |
|        BertForQuestionAnswering         | 16  | 95.6038  |  96.792   | 52.8752  |        52.7786         |
|    LayoutLMForSequenceClassification    | 16  | 98.3141  |  99.6142  | 52.3433  |        53.3117         |
|             XGLMForCausalLM             |  8  | 118.2588 | 126.0043  | 51.9066  |        63.9935         |
|           ElectraForCausalLM            | 32  | 88.8116  |  92.8769  |  46.868  |        47.1715         |
|       BlenderbotSmallForCausalLM        | 64  | 62.7772  |  63.9623  |  46.256  |         45.48          |
|       MT5ForConditionalGeneration       | 16  | 90.9246  | 118.4618  | 39.6904  |        47.7843         |
|      GPT2ForSequenceClassification      |  4  | 92.6563  |  94.7816  | 38.5186  |        38.7433         |
|         Speech2Text2ForCausalLM         | 256 | 54.1175  |  61.0554  | 34.8678  |        33.9134         |
|           LayoutLMForMaskedLM           | 16  | 112.994  | 115.7632  |   nan    |        69.2637         |
|          AllenaiLongformerBase          |  4  | 179.4987 | 271.1081  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
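A minimal sketch for reading a table like the one above and deriving a per-model latency ratio (eager latency divided by backend latency). The helper names `parse_ascii_table` and `latency_ratio` are hypothetical and not part of the benchmark scripts. The ratio computed this way is an assumption about how the columns relate, and it need not match the "Performance speedup" tables exactly, since those may come from separate measurements.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")  # skip the '+----+' border lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def latency_ratio(row, backend, baseline="eager"):
    """Return baseline latency / backend latency, or nan if either is missing."""
    base, value = float(row[baseline]), float(row[backend])
    if math.isnan(base) or math.isnan(value) or value == 0:
        return math.nan
    return base / value


# Two rows copied from the table above (column widths shortened).
sample = """
+---------------------+----+----------+-----------+----------+------------------------+
|        name         | bs |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------+----+----------+-----------+----------+------------------------+
|  AlbertForMaskedLM  | 4  | 266.0782 | 300.8723  | 161.7351 |        162.1478        |
| LayoutLMForMaskedLM | 16 | 112.994  | 115.7632  |   nan    |        69.2637         |
+---------------------+----+----------+-----------+----------+------------------------+
"""

rows = parse_ascii_table(sample)
for row in rows:
    # Rows whose backend entry is 'nan' yield a nan ratio.
    print(row["name"], round(latency_ratio(row, "inductor"), 4))
```

Rows with a `nan` entry (for example, LayoutLMForMaskedLM under inductor) propagate as `nan` instead of raising, so failed runs stay visible in the output.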

timm_models suite with amp precision


Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9992 |  0.9981   |  3.0512  |         2.9912         |
|         coat_lite_mini          | 128 | 0.9996 |  0.9982   |  1.9594  |         1.9302         |
|      xcit_large_24_p8_224       |  5  | 0.9972 |  0.8746   |  1.9588  |         1.6137         |
|        twins_pcpvt_base         | 64  | 1.0033 |   0.92    |  1.9526  |         1.7062         |
|          gmlp_s16_224           | 128 | 0.9995 |  1.0907   |  1.8542  |         1.8555         |
|          ghostnet_100           | 128 | 0.9986 |  0.7553   |  1.8244  |         1.6649         |
|          gmixer_24_224          | 128 | 0.9997 |  0.8928   |  1.7647  |         1.768          |
|           volo_d1_224           | 64  | 0.9996 |  0.9782   |  1.7182  |         1.6851         |
|         crossvit_9_240          | 128 | 1.0004 |   0.789   |  1.6826  |         1.642          |
|  swin_base_patch4_window7_224   | 64  | 0.9995 |  0.9638   |  1.6599  |         1.6318         |
|           convit_base           | 64  | 0.9999 |  0.9995   |  1.6301  |         1.6161         |
|        adv_inception_v3         | 128 | 0.9996 |  0.8624   |  1.5477  |         1.5291         |
|             dla102              | 128 | 0.9994 |  0.8177   |  1.5475  |         1.534          |
|          inception_v3           | 128 | 0.9997 |  0.8673   |  1.5462  |         1.5305         |
|       gluon_inception_v3        | 128 | 0.9997 |  0.8675   |  1.5385  |         1.5308         |
|          convnext_base          | 64  | 0.9994 |  1.0012   |  1.5297  |         1.5089         |
|            nfnet_l0             | 128 | 0.9991 |  0.8204   |  1.5208  |         1.4551         |
|           dm_nfnet_f0           | 128 | 0.9994 |  0.9979   |  1.519   |         1.4556         |
|            lcnet_050            | 128 | 0.9454 |  0.7382   |  1.5186  |         1.4493         |
|        sebotnet33ts_256         | 64  | 0.9654 |  0.7692   |  1.4882  |         1.5546         |
|            pit_b_224            | 64  | 0.9995 |  0.9975   |  1.4382  |         1.4389         |
|           resnest101e           | 64  | 0.9995 |  0.8693   |  1.4352  |         1.3709         |
|       eca_botnext26ts_256       | 128 | 0.9784 |  0.7217   |  1.433   |         1.4346         |
|           selecsls42b           | 128 | 0.999  |  0.8126   |  1.4268  |         1.4143         |
|           mobilevit_s           | 64  | 0.9713 |  0.7314   |  1.4227  |         1.466          |
|          jx_nest_base           | 32  |  1.0   |  0.9974   |  1.4084  |         1.3809         |
|          botnet26t_256          | 128 | 0.9773 |   0.854   |  1.3956  |         1.4347         |
|          cait_m36_384           |  4  | 1.0011 |  0.9987   |  1.3847  |         1.3551         |
|      mobilenetv3_large_100      | 128 | 0.9528 |  0.7612   |  1.3835  |         1.4301         |
|           res2next50            | 128 | 0.9998 |  0.8268   |  1.3808  |         1.3658         |
|           mnasnet_100           | 128 | 0.9504 |  0.7414   |  1.3779  |         1.5034         |
|      beit_base_patch16_224      | 64  | 0.9993 |  0.9694   |  1.3713  |         1.358          |
|          mixer_b16_224          | 128 | 0.9996 |  1.0209   |  1.3655  |         1.3681         |
|         poolformer_m36          | 64  | 0.9993 |  0.9962   |  1.3608  |         1.3428         |
|        ese_vovnet19b_dw         | 128 | 0.9663 |  0.8377   |  1.3421  |         1.3852         |
|        res2net50_14w_8s         | 128 | 0.9996 |  0.7908   |  1.3345  |         1.3623         |
|         mobilenetv2_100         | 128 | 0.9508 |  0.7385   |  1.3304  |         1.4544         |
|       tf_efficientnet_b0        | 128 | 0.9639 |  0.6827   |  1.3161  |         1.393          |
|           regnety_002           | 128 | 0.9625 |  0.7169   |  1.3014  |         1.2616         |
|           fbnetc_100            | 128 | 0.9514 |  0.7392   |  1.2995  |         1.4058         |
|          spnasnet_100           | 128 | 0.9439 |  0.7395   |  1.2898  |         1.4259         |
|           rexnet_100            | 128 | 0.9606 |  0.7079   |  1.2795  |         1.3531         |
| deit_base_distilled_patch16_224 | 64  | 0.9996 |  0.9971   |  1.2751  |         1.2622         |
|          resmlp_12_224          | 128 |  1.0   |  0.8954   |  1.2698  |         1.2703         |
|            fbnetv3_b            | 128 | 0.9529 |  0.7712   |  1.2695  |         1.3434         |
|      vit_base_patch16_224       | 64  | 0.9992 |  0.9968   |  1.2526  |         1.2409         |
|          cspdarknet53           | 64  | 0.9431 |  0.7926   |  1.2036  |         1.2818         |
|            tinynet_a            | 128 | 0.9508 |   0.681   |  1.186   |         1.2688         |
|         visformer_small         | 128 | 0.9992 |  0.9479   |  1.1849  |         1.1711         |
|           tf_mixnet_l           | 128 | 0.9809 |  0.8296   |  1.1776  |         1.1993         |
|            mixnet_l             | 128 | 0.9799 |  0.8231   |  1.1654  |         1.189          |
|            hrnet_w18            | 128 | 0.9976 |  0.6471   |  1.1413  |         1.3688         |
|        gluon_xception65         | 32  | 0.9994 |  0.8483   |  1.0867  |         1.0875         |
|     swsl_resnext101_32x16d      | 32  | 0.9994 |  0.8436   |  1.0773  |         1.0233         |
|            repvgg_a2            | 128 | 0.9436 |  0.7604   |  1.0662  |         1.1321         |
|             dpn107              | 32  | 0.9416 |  0.8144   |  1.0649  |         1.1479         |
|            gernet_l             | 128 | 0.9443 |  0.7996   |  1.0217  |         1.0798         |
|        convmixer_768_32         | 32  | 0.9996 |  0.9643   |  1.0026  |         1.0041         |
|          pnasnet5large          | 16  | 0.9982 |  0.9212   |  0.994   |         1.1452         |
|        res2net101_26w_4s        | 64  | 1.0011 |  0.7971   |  0.9857  |         1.0974         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
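One common way to summarize a column of speedups like this is the geometric mean. The sketch below is illustrative only: `geomean` is a hypothetical helper, and skipping `nan` entries is an assumption about how failed runs should be treated, not a description of how the dashboard aggregates results.

```python
import math


def geomean(values):
    """Geometric mean of the positive, non-nan entries; nan if none remain."""
    usable = [v for v in values if not math.isnan(v) and v > 0]
    if not usable:
        return math.nan
    return math.exp(sum(math.log(v) for v in usable) / len(usable))


# First five inductor speedups from the table above, plus a nan to show skipping.
inductor_speedups = [3.0512, 1.9594, 1.9588, 1.9526, 1.8542, math.nan]
print(round(geomean(inductor_speedups), 4))
```

The geometric mean is used here because speedups are ratios: a 2x gain and a 0.5x regression average to 1.0, which an arithmetic mean would not give.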

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|            hrnet_w18            | 128 | 9.6483  |  36.218   | 235.7112 |        255.7989        |
|           rexnet_100            | 128 | 5.6028  |  11.2676  | 231.6876 |        301.4045        |
|          ghostnet_100           | 128 | 7.5065  |  14.9703  | 197.3989 |        246.9963        |
|          pnasnet5large          | 16  | 8.2778  |  26.2982  | 169.3075 |        166.2563        |
|           resnest101e           | 64  | 11.0747 |  24.3857  | 152.5912 |        170.1393        |
|            fbnetv3_b            | 128 | 8.4857  |  17.0989  | 148.6613 |        176.5584        |
|           mobilevit_s           | 64  | 5.4029  |  11.5378  | 143.6318 |        166.7125        |
|        res2net101_26w_4s        | 64  | 10.7493 |  25.0753  | 141.2268 |        155.1759        |
|           tf_mixnet_l           | 128 | 8.9038  |  17.1027  | 140.1769 |        163.8294        |
|        twins_pcpvt_base         | 64  | 10.5623 |  23.6294  | 139.9087 |        149.4916        |
|            mixnet_l             | 128 | 8.4751  |  16.3832  | 136.9195 |        161.6925        |
|      xcit_large_24_p8_224       |  5  | 12.6746 |  28.191   | 135.9468 |        136.3422        |
|      mobilenetv3_large_100      | 128 |  4.24   |  8.4237   | 134.4972 |        156.6946        |
|            tinynet_a            | 128 | 6.0254  |  12.3497  | 133.8497 |        162.1221        |
|        adv_inception_v3         | 128 | 5.6083  |  12.5827  | 132.2084 |        158.3606        |
|          inception_v3           | 128 | 5.6531  |  12.7173  | 129.6631 |        164.1013        |
|       gluon_inception_v3        | 128 | 5.6968  |  12.685   | 128.4666 |        166.2538        |
|       tf_efficientnet_b0        | 128 | 5.1051  |  10.508   | 125.691  |        160.362         |
|        res2net50_14w_8s         | 128 | 8.9698  |  22.6686  | 119.0318 |        124.7547        |
|          cait_m36_384           |  4  | 14.5156 |  30.5109  | 115.6975 |        117.3447        |
|           fbnetc_100            | 128 | 4.9016  |  9.5052   | 115.5414 |        144.4452        |
|          spnasnet_100           | 128 | 5.0166  |  9.3821   | 112.8555 |        141.9405        |
|  swin_base_patch4_window7_224   | 64  | 8.3288  |  19.308   | 109.954  |        109.5663        |
|         mobilenetv2_100         | 128 |  4.038  |  7.8817   | 106.2269 |        130.0432        |
|           mnasnet_100           | 128 | 3.9947  |  7.6597   | 104.4557 |        120.0916        |
|         poolformer_m36          | 64  | 7.6527  |  13.818   | 98.2776  |        104.1984        |
|        sebotnet33ts_256         | 64  |  4.177  |  9.0216   | 97.6094  |        111.8223        |
|             dpn107              | 32  | 9.6537  |  20.6581  | 94.5859  |        98.4992         |
|           regnety_002           | 128 | 4.8363  |  8.8369   | 93.5764  |        105.1349        |
|             dla102              | 128 | 6.2429  |  14.103   | 88.6453  |        97.4249         |
|        gluon_xception65         | 32  | 7.8143  |  16.712   | 88.5282  |        97.2486         |
|          cspdarknet53           | 64  | 5.7605  |  10.9952  | 86.5536  |         99.951         |
|         coat_lite_mini          | 128 | 3.2652  |  7.8624   | 86.3051  |         89.376         |
|       eca_botnext26ts_256       | 128 | 3.0786  |  6.8707   | 84.7318  |        99.0896         |
|         crossvit_9_240          | 128 | 5.8478  |  13.3432  | 84.0164  |        88.2013         |
|          jx_nest_base           | 32  | 6.5941  |  14.9499  | 82.8483  |         85.415         |
|            lcnet_050            | 128 | 2.5238  |  5.0104   | 80.9058  |        97.8475         |
|           res2next50            | 128 | 5.1141  |  12.0081  | 80.7782  |        89.1207         |
|          botnet26t_256          | 128 | 2.9037  |  5.8744   | 80.3098  |        94.7225         |
|           selecsls42b           | 128 | 2.4744  |  5.4448   |  76.28   |        91.2333         |
|           volo_d1_224           | 64  | 5.0343  |  11.8314  | 73.9246  |        75.7124         |
|            nfnet_l0             | 128 |  5.175  |  10.9618  | 71.8915  |        79.3345         |
|            gernet_l             | 128 | 4.9834  |   8.906   | 71.2782  |         83.696         |
|        tnt_s_patch16_224        | 128 | 6.5198  |  15.9523  | 71.2131  |         70.363         |
|           dm_nfnet_f0           | 128 | 6.0279  |  11.475   | 67.5497  |        72.8735         |
|        ese_vovnet19b_dw         | 128 | 2.5345  |  4.6376   | 67.3339  |        78.0503         |
|         visformer_small         | 128 | 2.6613  |  6.0653   | 63.2856  |        68.6746         |
|     swsl_resnext101_32x16d      | 32  | 6.0744  |  13.7432  | 62.9458  |         64.117         |
|          gmlp_s16_224           | 128 | 5.6199  |  12.0394  | 59.9245  |        61.1832         |
|          convnext_base          | 64  | 6.6243  |  12.5915  | 58.5951  |        57.5501         |
|            repvgg_a2            | 128 | 4.7332  |   8.802   | 57.3375  |        62.1074         |
|          gmixer_24_224          | 128 | 5.6766  |  12.8393  | 52.7501  |        52.2961         |
|           convit_base           | 64  | 3.5126  |  8.6451   | 49.3916  |        48.1918         |
|            pit_b_224            | 64  | 3.3812  |  7.9797   | 47.5132  |        47.3676         |
| deit_base_distilled_patch16_224 | 64  | 3.1288  |  7.1423   | 42.5146  |        42.7039         |
|          resmlp_12_224          | 128 | 2.8336  |  5.2466   | 41.9506  |        41.8507         |
|      vit_base_patch16_224       | 64  | 3.0835  |  7.0436   | 41.5835  |        40.4003         |
|        convmixer_768_32         | 32  | 1.6862  |  6.9101   | 37.8643  |        35.8127         |
|      beit_base_patch16_224      | 64  | 3.9134  |  8.6349   | 37.6453  |         37.056         |
|          mixer_b16_224          | 128 | 2.6945  |  5.9256   | 34.4062  |        34.2138         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak memory compression ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.2872  |         1.2836         |
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.2057  |         1.2049         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  1.1899  |         1.1871         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1607  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.1583  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.1215  |         1.1179         |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  1.1129  |         1.1115         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  1.089   |         1.0876         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.0875  |         1.0845         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  1.0758  |         1.0721         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  1.0757  |         1.0728         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  1.0696  |         1.0675         |
|        twins_pcpvt_base         | 64  | 0.996  |  0.9232   |  1.0556  |         1.0539         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  1.0512  |         1.0506         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  1.0494  |         1.0457         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0377  |         1.0351         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  1.0361  |         1.0328         |
|          convnext_base          | 64  | 1.001  |   0.924   |  1.0346  |         1.0338         |
|             dla102              | 128 | 0.9635 |  0.9155   |  1.0323  |         1.0326         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  1.0251  |         1.0242         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  1.021   |         1.0202         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  1.0203  |         1.0194         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  1.0193  |         1.0171         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  1.0082  |         1.0072         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  1.0071  |         1.0057         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9976  |         0.9952         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.9957  |         0.9948         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.9925  |          0.99          |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.9923  |         0.9902         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9917  |         0.9903         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.9912  |         0.9898         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9905  |         0.989          |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.9885  |         0.989          |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9864  |         0.9854         |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9821  |         0.9793         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.9793  |         0.9786         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.9793  |         0.977          |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.979   |         0.9779         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.9776  |         0.9732         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.9738  |         0.9706         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9732  |         0.9727         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.9714  |         0.9705         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.9702  |         0.9664         |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.966   |         0.9611         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.9646  |         0.9642         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.9637  |         0.9607         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.9611  |         0.9604         |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.9582  |         0.9535         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.9568  |         0.9547         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9562  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9537  |         0.9528         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.9509  |         0.9483         |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.9497  |         0.9451         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.9448  |         0.9403         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.9376  |         0.9361         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.9046  |         0.9045         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.9009  |         0.8966         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.8898  |         0.884          |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.2057 | 311.5098  | 299.5147 |        298.951         |
|            hrnet_w18            | 128 | 281.2618 | 433.6146  | 244.7834 |        204.3659        |
|          pnasnet5large          | 16  | 196.9029 | 213.0557  | 197.3134 |        171.775         |
|           tf_mixnet_l           | 128 | 193.0619 | 228.5642  | 160.7712 |        158.0825        |
|            mixnet_l             | 128 | 185.1797 | 220.3961  | 155.3913 |        152.2922        |
|          cait_m36_384           |  4  | 172.2784 | 167.1274  | 120.3581 |        123.6539        |
|           resnest101e           | 64  | 164.5945 | 189.1554  | 114.2972 |        119.4303        |
|             dla102              | 128 | 172.425  | 210.7345  | 111.3539 |         112.34         |
|     swsl_resnext101_32x16d      | 32  | 118.4815 | 140.3454  | 109.9714 |        116.0543        |
|         poolformer_m36          | 64  | 145.5441 | 145.6046  | 106.4205 |        108.0641        |
|        tnt_s_patch16_224        | 128 | 324.6003 | 324.8079  | 106.204  |        108.3226        |
|        res2net50_14w_8s         | 128 | 140.9692 | 178.5624  | 105.4175 |        103.2289        |
|       gluon_inception_v3        | 128 | 160.7346 | 185.3073  | 104.4396 |        104.9896        |
|          inception_v3           | 128 | 160.4437 | 184.9205  | 103.7148 |        104.8557        |
|        adv_inception_v3         | 128 | 160.5939 | 186.3097  | 103.6977 |        104.9615        |
|        res2net101_26w_4s        | 64  | 100.2628 |  125.96   | 100.4395 |         89.702         |
|           convit_base           | 64  | 163.1847 | 163.1645  | 100.0348 |        100.8871        |
|             dpn107              | 32  | 112.7473 | 130.2638  |  99.755  |        92.6423         |
|           res2next50            | 128 | 126.1378 |  152.483  | 91.2519  |        92.3747         |
|        gluon_xception65         | 32  | 99.2852  | 116.7245  | 91.1175  |        91.2211         |
|  swin_base_patch4_window7_224   | 64  | 146.4853 | 151.7706  | 88.0995  |        89.7293         |
|            fbnetv3_b            | 128 | 115.2423 | 142.2562  | 86.3215  |        81.6254         |
|          mixer_b16_224          | 128 | 116.7642 | 114.2732  |  85.701  |        85.2705         |
|           dm_nfnet_f0           | 128 | 127.261  | 127.4768  | 83.3806  |        87.3375         |
|            pit_b_224            | 64  | 118.3422 | 118.4928  | 82.2604  |        82.1459         |
|          convnext_base          | 64  | 122.4502 |  122.286  | 79.9757  |        81.1102         |
|         visformer_small         | 128 | 91.0501  |  96.0674  | 76.9245  |        77.7919         |
|          gmlp_s16_224           | 128 | 137.9937 | 126.1744  | 74.1996  |        74.1208         |
|       eca_botnext26ts_256       | 128 | 108.4551 | 146.8868  | 74.1488  |         73.906         |
|      beit_base_patch16_224      | 64  | 101.3318 | 104.4735  |  73.879  |        74.5494         |
|          cspdarknet53           | 64  | 94.0075  | 112.0631  | 73.7063  |        69.2328         |
|            nfnet_l0             | 128 | 112.0439 | 136.7268  | 73.6575  |        76.9705         |
|          jx_nest_base           | 32  |  100.82  | 100.7255  | 71.3251  |        72.7712         |
|            gernet_l             | 128 | 77.0563  |  90.9975  | 71.2971  |        67.3974         |
|          botnet26t_256          | 128 | 101.5616 | 116.4635  |  71.286  |        69.2309         |
|           volo_d1_224           | 64  | 120.8655 | 123.2799  | 70.0879  |        71.5968         |
|      vit_base_patch16_224       | 64  | 86.8725  |  86.9419  |  69.22   |        69.8804         |
|            repvgg_a2            | 128 | 77.1222  |  95.756   | 68.1488  |        64.2391         |
|          gmixer_24_224          | 128 | 118.0402 | 132.0607  | 67.0084  |        66.6136         |
| deit_base_distilled_patch16_224 | 64  | 84.7561  |  84.8431  | 66.3612  |         67.129         |
|      xcit_large_24_p8_224       |  5  | 125.3616 | 140.1537  | 62.8021  |        76.4511         |
|       tf_efficientnet_b0        | 128 | 84.6126  | 119.3478  |  61.908  |        58.5843         |
|           fbnetc_100            | 128 | 82.7837  | 106.8648  | 60.5759  |        56.0905         |
|           rexnet_100            | 128 | 79.6315  | 108.1219  | 59.5236  |        56.4518         |
|        twins_pcpvt_base         | 64  | 119.379  | 128.8684  | 59.4449  |        68.2577         |
|            tinynet_a            | 128 | 73.5976  | 102.2242  | 58.6474  |        54.9239         |
|         coat_lite_mini          | 128 | 112.837  | 113.1294  | 57.5346  |        58.4786         |
|           mobilevit_s           | 64  | 84.0801  |  111.659  | 57.2363  |        55.6318         |
|        sebotnet33ts_256         | 64  | 80.0104  | 100.3667  | 51.8248  |        49.6282         |
|          spnasnet_100           | 128 | 70.3203  |  89.8446  | 51.4029  |        46.6046         |
|          ghostnet_100           | 128 |  90.385  | 119.5644  | 49.3772  |        54.2214         |
|         crossvit_9_240          | 128 | 81.9877  | 104.0409  | 48.6429  |        49.8672         |
|         mobilenetv2_100         | 128 | 65.6427  |  84.3872  | 46.7598  |        42.8267         |
|        ese_vovnet19b_dw         | 128 | 64.1537  |  74.0992  | 46.2267  |        44.8419         |
|           mnasnet_100           | 128 | 64.3192  |  82.4117  | 44.2831  |        40.6108         |
|           selecsls42b           | 128 | 60.1316  |  73.9077  | 42.1099  |        42.5318         |
|      mobilenetv3_large_100      | 128 | 61.3626  |  76.6671  |  42.072  |        40.8052         |
|          resmlp_12_224          | 128 | 53.3004  |  59.4257  | 41.9834  |         41.863         |
|           regnety_002           | 128 | 38.3973  |  52.846   |  28.348  |        30.0116         |
|            lcnet_050            | 128 | 31.6616  |  40.4925  |  19.669  |        20.6265         |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Performance graphs

bench_logs/huggingface_amp.png :

bench_logs/timm_models_amp.png :

bench_logs/torchbench_amp.png :

Build Summary

Run name

day_099_09_04_23_performance_amp_108

Commit hashes

pytorch commit: 5842444
pytorch commit date: 2023-04-10 01:48:31+00:00
torchbench commit: 137c3f0e68280ab41c94403464058621a7c7fae1
torchbench commit date: 2023-04-08 04:29:31-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git5842444

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch sizes have been reduced to work around out-of-memory (OOM) errors. Work is in progress to reduce the peak memory footprint.
  2. The experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 85%, 51/60 | 91%, 41/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 96%, 43/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
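
Each passrate cell pairs a rounded percentage with the raw counts. A minimal sketch of how such a cell could be produced is shown below. The function name `format_passrate` is hypothetical, and rounding to the nearest whole percent is an assumption inferred from the table entries, not taken from the dashboard's own script.

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a pass rate as 'PCT%, passed/total' (hypothetical helper).

    Assumption: the percentage is rounded to the nearest whole number,
    which matches entries such as '88%, 53/60' in the table above.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{100 * passed / total:.0f}%, {passed}/{total}"


if __name__ == "__main__":
    # Counts taken from the torchbench column of the passrate table.
    for compiler, passed in [("eager", 53), ("aot_eager", 52), ("inductor", 51)]:
        print(compiler, format_passrate(passed, 60))
```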

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.62x    |    1.65x    |    1.42x    |
| inductor_no_cudagraphs |   1.30x    |    1.54x    |    1.40x    |
+------------------------+------------+-------------+-------------+
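
The geometric mean speedup can be reproduced from per-model speedups roughly as follows. This is a minimal sketch, not the dashboard's actual script: the name `geomean_speedup` is hypothetical, and dropping non-positive entries is an assumption based on the statements that models failing accuracy checks are removed and that a 0.0 speedup typically signifies a failed performance run.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of per-model speedups (hypothetical helper).

    Assumption: entries <= 0 mark failed runs and are excluded, since
    the logarithm is undefined for them.
    """
    valid = [s for s in speedups if s > 0]
    if not valid:
        return float("nan")
    # exp(mean(log(x))) is numerically safer than multiplying many ratios.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


if __name__ == "__main__":
    # Illustrative values only, not taken from a real report.
    print(f"{geomean_speedup([2.0, 8.0, 0.0]):.2f}x")
```

The geometric mean is the usual choice for averaging ratios because a 2x speedup and a 0.5x slowdown cancel out to 1.0x, which an arithmetic mean would not show.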

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.77    |    7.23     |    5.81     |
|       aot_eager        |    9.12    |    15.45    |    13.03    |
|        inductor        |   62.42    |    62.34    |   109.03    |
| inductor_no_cudagraphs |   62.26    |    59.78    |   108.48    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.90x    |    0.88x    |
|        inductor        |   0.78x    |    0.91x    |    0.91x    |
| inductor_no_cudagraphs |   0.94x    |    1.05x    |    1.01x    |
+------------------------+------------+-------------+-------------+

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics of the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531

Previous report name: /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 83%, 50/60  | 85%, 51/60  |
|        inductor        | huggingface | 93%, 42/45  | 91%, 41/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60  | 87%, 52/60  |
| inductor_no_cudagraphs | huggingface | 96%, 43/45  | 96%, 43/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.50x    |   1.62x   |
|        inductor        | huggingface |   1.62x    |   1.65x   |
|        inductor        | timm_models |   1.39x    |   1.42x   |
| inductor_no_cudagraphs | torchbench  |   1.29x    |   1.30x   |
| inductor_no_cudagraphs | huggingface |   1.54x    |   1.54x   |
| inductor_no_cudagraphs | timm_models |   1.41x    |   1.40x   |
+------------------------+-------------+------------+-----------+

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
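
The four rules above can be sketched as a small predicate. This is an illustrative sketch under stated assumptions: the `ModelResult` record and `warnings_for` function are hypothetical names, and treating any accuracy status other than "pass" as a failure is an assumption based on the statuses that appear in the accuracy table.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    """One model's metrics for one compiler (hypothetical record type)."""
    name: str
    accuracy: str        # e.g. "pass", "fail_to_run", "fail_accuracy"
    speedup: float       # relative to native PyTorch
    compile_sec: float   # compilation latency in seconds
    compression: float   # peak memory compression ratio


def warnings_for(result: ModelResult) -> list:
    """Return the names of the warning rules that a result triggers."""
    flags = []
    if result.accuracy != "pass":      # assumption: anything but "pass" fails
        flags.append("accuracy")
    if result.speedup < 0.95:
        flags.append("speedup")
    if result.compile_sec > 120:       # a NaN latency compares False here
        flags.append("compilation_latency")
    if result.compression < 0.9:
        flags.append("compression_ratio")
    return flags


if __name__ == "__main__":
    # Speedup and compression are the inductor values reported for
    # nvidia_deeprecommender; compile_sec is a placeholder, not reported data.
    r = ModelResult("nvidia_deeprecommender", "pass", 0.8724, 60.0, 0.6585)
    print(warnings_for(r))  # ['speedup', 'compression_ratio']
```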

Accuracy warnings

+-------------+-------------------------------+------------------------+-----------------+
|    suite    |             name              | inductor_no_cudagraphs |    inductor     |
+-------------+-------------------------------+------------------------+-----------------+
| torchbench  |         hf_Longformer         |      fail_to_run       |   fail_to_run   |
| torchbench  |             moco              |      fail_to_run       |   fail_to_run   |
| torchbench  |      Background_Matting       |    eager_variation     | eager_variation |
| torchbench  |           tacotron2           |         0.0000         |     0.0000      |
| torchbench  |              gat              |         0.0000         |     0.0000      |
| torchbench  |              gcn              |         0.0000         |     0.0000      |
| torchbench  |             llama             |         0.0000         |     0.0000      |
| torchbench  |             sage              |         0.0000         |     0.0000      |
| torchbench  |         torchrec_dlrm         |         0.0000         |     0.0000      |
| huggingface | DebertaV2ForQuestionAnswering |          pass          |   fail_to_run   |
| huggingface |  AlbertForQuestionAnswering   |     fail_accuracy      |  fail_accuracy  |
+-------------+-------------------------------+------------------------+-----------------+

Performance speedup warnings

+-------------+-------------------------------+------------------------+----------+
|    suite    |             name              | inductor_no_cudagraphs | inductor |
+-------------+-------------------------------+------------------------+----------+
| torchbench  |             dcgan             |         0.8345         |  1.4663  |
| torchbench  |         lennard_jones         |         0.9158         |  1.3892  |
| torchbench  |       soft_actor_critic       |         0.7376         |  1.0378  |
| torchbench  |          tts_angular          |         0.9515         |  0.9442  |
| torchbench  |    nvidia_deeprecommender     |         1.0187         |  0.8724  |
| torchbench  | timm_vision_transformer_large |         1.0834         |   0.0    |
| torchbench  |         hf_Longformer         |          0.0           |   0.0    |
| torchbench  |             moco              |          0.0           |   0.0    |
| torchbench  |              gat              |          0.0           |   0.0    |
| torchbench  |              gcn              |          0.0           |   0.0    |
| torchbench  |             sage              |          0.0           |   0.0    |
| torchbench  |           tacotron2           |          0.0           |   0.0    |
| torchbench  |         torchrec_dlrm         |          0.0           |   0.0    |
| huggingface |     DebertaV2ForMaskedLM      |         0.7427         |  1.0163  |
| huggingface | DebertaV2ForQuestionAnswering |          0.77          |  0.9877  |
| huggingface |     BlenderbotForCausalLM     |         1.4055         |   0.0    |
| huggingface |     AllenaiLongformerBase     |          0.0           |   0.0    |
+-------------+-------------------------------+------------------------+----------+

Compilation latency (sec) warnings

+-------------+--------------------------------+------------------------+----------+
|    suite    |              name              | inductor_no_cudagraphs | inductor |
+-------------+--------------------------------+------------------------+----------+
| torchbench  |          hf_T5_large           |        173.3623        | 173.1003 |
| torchbench  |        phlippe_densenet        |        162.5231        | 170.3835 |
| torchbench  |           hf_BigBird           |        128.9725        | 148.276  |
| torchbench  |       timm_efficientnet        |        140.1202        | 140.5327 |
| torchbench  |          densenet121           |        136.2243        | 136.6661 |
| torchbench  |       mobilenet_v3_large       |        138.3727        | 136.0634 |
| torchbench  |          mobilenet_v2          |        130.3093        | 129.1956 |
| torchbench  | timm_vision_transformer_large  |        123.614         |   nan    |
| huggingface | DebertaV2ForQuestionAnswering  |        72.3895         | 733.502  |
| huggingface |     MobileBertForMaskedLM      |        148.2348        | 145.9296 |
| huggingface |      DebertaV2ForMaskedLM      |        70.7588         | 140.7713 |
| huggingface | MobileBertForQuestionAnswering |        141.5091        | 140.1224 |
| huggingface | M2M100ForConditionalGeneration |        135.0586        | 134.7268 |
| huggingface |  MT5ForConditionalGeneration   |        134.0228        | 132.282  |
| huggingface |        XGLMForCausalLM         |        132.5058        | 131.1217 |
| timm_models |           rexnet_100           |        284.5444        | 281.5342 |
| timm_models |           hrnet_w18            |        246.356         | 254.0037 |
| timm_models |          ghostnet_100          |        235.0013        | 237.587  |
| timm_models |           fbnetv3_b            |        170.2323        | 170.3873 |
| timm_models |         pnasnet5large          |        163.9132        | 165.0816 |
| timm_models |          resnest101e           |        165.2112        | 164.3654 |
| timm_models |          mobilevit_s           |        157.6558        | 160.6709 |
| timm_models |          inception_v3          |        161.5501        | 160.4699 |
| timm_models |        adv_inception_v3        |        162.0153        | 157.7377 |
| timm_models |           tinynet_a            |        157.8782        | 157.5875 |
| timm_models |     mobilenetv3_large_100      |        157.4246        | 157.1059 |
| timm_models |       gluon_inception_v3       |        158.796         | 155.9479 |
| timm_models |          tf_mixnet_l           |        156.1956        | 152.3374 |
| timm_models |       res2net101_26w_4s        |        145.8825        | 152.0736 |
| timm_models |       tf_efficientnet_b0       |        146.9729        | 150.6486 |
| timm_models |            mixnet_l            |        155.4344        | 149.9561 |
| timm_models |        twins_pcpvt_base        |        144.3253        | 149.3229 |
| timm_models |          spnasnet_100          |        137.0462        | 142.9941 |
| timm_models |           fbnetc_100           |        136.2885        | 140.8511 |
| timm_models |      xcit_large_24_p8_224      |        130.8808        | 133.6231 |
| timm_models |        mobilenetv2_100         |        125.8879        | 129.0996 |
| timm_models |        res2net50_14w_8s        |        122.4182        | 123.8116 |
| timm_models |          mnasnet_100           |        123.6937        | 123.3497 |
+-------------+--------------------------------+------------------------+----------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+------------------------+----------+
|    suite    |                  name                   | inductor_no_cudagraphs | inductor |
+-------------+-----------------------------------------+------------------------+----------+
| torchbench  |                 yolov3                  |         1.036          |  0.8919  |
| torchbench  |              hf_GPT2_large              |         1.128          |  0.8904  |
| torchbench  |           speech_transformer            |         0.869          |  0.8651  |
| torchbench  |              timm_resnest               |         0.9665         |  0.8635  |
| torchbench  |               Super_SloMo               |         1.208          |  0.8614  |
| torchbench  |           shufflenet_v2_x1_0            |         0.9658         |  0.8613  |
| torchbench  |         timm_vision_transformer         |         0.8835         |  0.8593  |
| torchbench  |               timm_regnet               |         0.953          |  0.8506  |
| torchbench  |                resnet152                |         0.9405         |  0.8499  |
| torchbench  |           Background_Matting            |         1.0403         |  0.8485  |
| torchbench  |              hf_DistilBert              |         0.9945         |  0.8476  |
| torchbench  |                 hf_Bert                 |         1.0258         |  0.8411  |
| torchbench  |              hf_Bert_large              |         1.0725         |  0.8302  |
| torchbench  |               hf_T5_large               |         1.168          |  0.8201  |
| torchbench  |              pytorch_unet               |         0.9308         |  0.8134  |
| torchbench  |            phlippe_densenet             |         0.8659         |  0.8058  |
| torchbench  |                 hf_Bart                 |         0.9173         |  0.7933  |
| torchbench  |           mobilenet_v3_large            |         0.7757         |  0.7842  |
| torchbench  |                resnet50                 |         0.8851         |  0.7831  |
| torchbench  |                  dcgan                  |         0.9645         |  0.7821  |
| torchbench  |                 demucs                  |         0.9655         |  0.773   |
| torchbench  |              squeezenet1_1              |         0.908          |  0.773   |
| torchbench  |             pytorch_stargan             |         0.8893         |  0.7715  |
| torchbench  |               timm_vovnet               |         0.8869         |  0.7529  |
| torchbench  |                  vgg16                  |         0.9808         |  0.7227  |
| torchbench  |               mnasnet1_0                |         0.8062         |  0.7159  |
| torchbench  |               densenet121               |         0.7998         |  0.7096  |
| torchbench  |                 alexnet                 |         0.939          |  0.7091  |
| torchbench  |             pytorch_struct              |         0.7362         |  0.697   |
| torchbench  |               hf_BigBird                |         1.1191         |  0.6949  |
| torchbench  |             resnext50_32x4d             |         0.7699         |  0.6677  |
| torchbench  |         nvidia_deeprecommender          |         0.8931         |  0.6585  |
| torchbench  |                   drq                   |         0.9573         |  0.6379  |
| torchbench  |            soft_actor_critic            |         0.9973         |  0.6066  |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |         0.6172         |  0.6065  |
| torchbench  |             LearningToPaint             |         0.7463         |  0.5925  |
| torchbench  |                resnet18                 |         0.6097         |  0.5395  |
| torchbench  |              lennard_jones              |         0.9997         |  0.5317  |
| torchbench  |               hf_Reformer               |         0.8022         |  0.4538  |
| torchbench  |          functorch_dp_cifar10           |         0.4424         |  0.3991  |
| torchbench  |             phlippe_resnet              |         0.3395         |  0.3169  |
| huggingface |           ElectraForCausalLM            |         0.9739         |  0.8941  |
| huggingface |           PegasusForCausalLM            |         0.9864         |  0.893   |
| huggingface |          DistilBertForMaskedLM          |         0.9624         |  0.8849  |
| huggingface |            TrOCRForCausalLM             |         0.9583         |  0.8836  |
| huggingface | BlenderbotSmallForConditionalGeneration |         0.9803         |  0.8729  |
| huggingface |     PegasusForConditionalGeneration     |         1.0689         |  0.8689  |
| huggingface |      MBartForConditionalGeneration      |         1.0307         |  0.8672  |
| huggingface |      BartForConditionalGeneration       |         1.0139         |  0.8456  |
| huggingface |         MegatronBertForCausalLM         |         1.0962         |  0.845   |
| huggingface |       BlenderbotSmallForCausalLM        |         0.9119         |  0.8184  |
| huggingface |         Speech2Text2ForCausalLM         |         0.8779         |  0.789   |
| huggingface |     M2M100ForConditionalGeneration      |         0.9908         |  0.7651  |
| huggingface |          MobileBertForMaskedLM          |         1.016          |  0.7473  |
| huggingface |             XGLMForCausalLM             |         0.9792         |  0.7117  |
| huggingface |     MobileBertForQuestionAnswering      |         0.8392         |  0.6569  |
| huggingface |           DebertaForMaskedLM            |         0.9988         |  0.5646  |
| huggingface |          DebertaV2ForMaskedLM           |         0.9664         |  0.5187  |
| huggingface |       DebertaForQuestionAnswering       |         1.1525         |  0.4867  |
| huggingface |      DebertaV2ForQuestionAnswering      |         0.9798         |  0.4855  |
| timm_models |                hrnet_w18                |          0.99          |  0.8918  |
| timm_models |            sebotnet33ts_256             |         1.1115         |  0.891   |
| timm_models |            adv_inception_v3             |         1.0171         |  0.8904  |
| timm_models |           gluon_inception_v3            |         1.0171         |  0.8904  |
| timm_models |              inception_v3               |         1.0171         |  0.8904  |
| timm_models |                 dpn107                  |         0.9642         |  0.8833  |
| timm_models |            gluon_xception65             |         0.9705         |  0.8831  |
| timm_models |              ghostnet_100               |         0.977          |  0.8807  |
| timm_models |              spnasnet_100               |         0.9451         |  0.8786  |
| timm_models |          mobilenetv3_large_100          |         0.9361         |  0.877   |
| timm_models |             poolformer_m36              |         1.1871         |  0.8768  |
| timm_models |           eca_botnext26ts_256           |         1.0072         |  0.8738  |
| timm_models |            res2net50_14w_8s             |         0.9607         |  0.8712  |
| timm_models |            res2net101_26w_4s            |         0.9483         |  0.871   |
| timm_models |                mixnet_l                 |         0.9902         |  0.8687  |
| timm_models |               mnasnet_100               |         0.9403         |  0.8683  |
| timm_models |               res2next50                |         0.9547         |  0.866   |
| timm_models |              cait_m36_384               |         0.989          |  0.8632  |
| timm_models |               fbnetc_100                |         0.9535         |  0.8596  |
| timm_models |                pit_b_224                |         1.0242         |  0.8578  |
| timm_models |               selecsls42b               |         0.9664         |  0.8576  |
| timm_models |              convnext_base              |         1.0338         |  0.8505  |
| timm_models |                gernet_l                 |         0.9706         |  0.8499  |
| timm_models |         swsl_resnext101_32x16d          |         0.9786         |  0.8461  |
| timm_models |             coat_lite_mini              |         1.0202         |  0.8402  |
| timm_models |              botnet26t_256              |         0.9779         |  0.8239  |
| timm_models |          xcit_large_24_p8_224           |         0.9732         |  0.8225  |
| timm_models |                lcnet_050                |         0.884          |  0.805   |
| timm_models |                repvgg_a2                |         0.9611         |  0.7738  |
| timm_models |               regnety_002               |         0.8966         |  0.7602  |
| timm_models |             crossvit_9_240              |         0.9898         |  0.7526  |
| timm_models |      swin_base_patch4_window7_224       |         0.9045         |  0.7214  |
| timm_models |              jx_nest_base               |         0.9604         |  0.6693  |
+-------------+-----------------------------------------+------------------------+----------+

Metrics over time

bench_logs/geomean_over_time.png :

bench_logs/memory_over_time.png :

bench_logs/passrate_over_time.png :

bench_logs/comp_time_over_time.png :

Recent Regressions

For each relevant compiler, we compare the two most recent reports (that actually ran the compiler) to find models that were not flagged previously but are now flagged as problematic (according to the 'Warnings' section).
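
The comparison described here can be sketched as a diff over two per-model metric maps. The function `new_regressions` and its argument names are hypothetical, and the actual dashboard script may handle missing models or NaN values differently. The example uses the compilation latency rule (> 120 sec) from the 'Warnings' section.

```python
def new_regressions(prev, cur, is_flagged):
    """Return (name, prev_value, cur_value) for newly flagged models.

    prev and cur map model name -> metric value for one compiler and suite.
    is_flagged decides whether a single value violates a warning threshold.
    Assumption: models missing from the previous report are skipped.
    """
    rows = []
    for name, cur_value in cur.items():
        if name not in prev:
            continue
        prev_value = prev[name]
        if is_flagged(cur_value) and not is_flagged(prev_value):
            rows.append((name, prev_value, cur_value))
    return rows


if __name__ == "__main__":
    # The first two entries are inductor compilation latencies from the
    # torchbench regression table; "already_slow_model" is a hypothetical
    # entry added to show that previously flagged models are not reported.
    prev = {"timm_efficientnet": 119.6911, "mobilenet_v2": 105.2505,
            "already_slow_model": 150.0}
    cur = {"timm_efficientnet": 140.5327, "mobilenet_v2": 129.1956,
           "already_slow_model": 155.0}
    for row in new_regressions(prev, cur, lambda sec: sec > 120):
        print(row)
```

Passing the threshold as a function keeps one diff routine usable for all four metrics, for example `lambda ratio: ratio < 0.9` for the compression ratio.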

Regressions for torchbench

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Performance speedup regressions

+----------+------------------------+-------------+------------+
| compiler |          name          | prev_status | cur_status |
+----------+------------------------+-------------+------------+
| inductor | nvidia_deeprecommender |   0.9785    |   0.8724   |
+----------+------------------------+-------------+------------+

Compilation latency (sec) regressions

+----------+--------------------+-------------+------------+
| compiler |        name        | prev_status | cur_status |
+----------+--------------------+-------------+------------+
| inductor | timm_efficientnet  |  119.6911   |  140.5327  |
| inductor | mobilenet_v3_large |  110.5632   |  136.0634  |
| inductor |    mobilenet_v2    |  105.2505   |  129.1956  |
+----------+--------------------+-------------+------------+

Peak Memory Compression Ratio regressions

+----------+------------------------+-------------+------------+
| compiler |          name          | prev_status | cur_status |
+----------+------------------------+-------------+------------+
| inductor |         yolov3         |   1.0161    |   0.8919   |
| inductor |     hf_GPT2_large      |   1.1334    |   0.8904   |
| inductor |      timm_resnest      |   0.9713    |   0.8635   |
| inductor |      Super_SloMo       |    1.208    |   0.8614   |
| inductor |   shufflenet_v2_x1_0   |   0.9736    |   0.8613   |
| inductor |      timm_regnet       |   0.9552    |   0.8506   |
| inductor |       resnet152        |   0.9444    |   0.8499   |
| inductor |   Background_Matting   |   1.0427    |   0.8485   |
| inductor |     hf_DistilBert      |   1.0292    |   0.8476   |
| inductor |        hf_Bert         |   1.0344    |   0.8411   |
| inductor |     hf_Bert_large      |   1.0737    |   0.8302   |
| inductor |      hf_T5_large       |   1.1687    |   0.8201   |
| inductor |      pytorch_unet      |   0.9306    |   0.8134   |
| inductor |        hf_Bart         |    0.978    |   0.7933   |
| inductor |         dcgan          |   0.9644    |   0.7821   |
| inductor |     squeezenet1_1      |    0.909    |   0.773    |
| inductor |         demucs         |   0.9866    |   0.773    |
| inductor |         vgg16          |   0.9823    |   0.7227   |
| inductor |        alexnet         |   0.9434    |   0.7091   |
| inductor |       hf_BigBird       |   1.1425    |   0.6949   |
| inductor | nvidia_deeprecommender |   0.9195    |   0.6585   |
| inductor |          drq           |   1.0607    |   0.6379   |
| inductor |   soft_actor_critic    |   1.1053    |   0.6066   |
| inductor |     lennard_jones      |   1.0687    |   0.5317   |
+----------+------------------------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Accuracy regressions

+----------+-------------------------------+-------------+-------------+
| compiler |             name              | prev_status | cur_status  |
+----------+-------------------------------+-------------+-------------+
| inductor | DebertaV2ForQuestionAnswering |    pass     | fail_to_run |
+----------+-------------------------------+-------------+-------------+

Performance speedup regressions

+----------+-----------------------+-------------+------------+
| compiler |         name          | prev_status | cur_status |
+----------+-----------------------+-------------+------------+
| inductor | BlenderbotForCausalLM |   1.3145    |    0.0     |
+----------+-----------------------+-------------+------------+

Peak Memory Compression Ratio regressions

+----------+---------------------------------+-------------+------------+
| compiler |              name               | prev_status | cur_status |
+----------+---------------------------------+-------------+------------+
| inductor |       ElectraForCausalLM        |   0.9753    |   0.8941   |
| inductor |       PegasusForCausalLM        |   0.9161    |   0.893    |
| inductor | PegasusForConditionalGeneration |   1.0074    |   0.8689   |
| inductor |  MBartForConditionalGeneration  |    1.004    |   0.8672   |
| inductor |  BartForConditionalGeneration   |    0.988    |   0.8456   |
| inductor |     MegatronBertForCausalLM     |   1.0827    |   0.845    |
| inductor | M2M100ForConditionalGeneration  |   0.9321    |   0.7651   |
| inductor |      MobileBertForMaskedLM      |   0.9505    |   0.7473   |
| inductor |         XGLMForCausalLM         |   0.9264    |   0.7117   |
| inductor |       DebertaForMaskedLM        |   1.0326    |   0.5646   |
| inductor |      DebertaV2ForMaskedLM       |   0.9899    |   0.5187   |
| inductor |   DebertaForQuestionAnswering   |   1.2169    |   0.4867   |
| inductor |  DebertaV2ForQuestionAnswering  |   1.0023    |   0.4855   |
+----------+---------------------------------+-------------+------------+
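
To rank the regressions above by size, the prev/cur columns can be converted into a relative change. The following is a minimal sketch, assuming the table is available as plain text in the layout shown here. `parse_ascii_table` and `relative_change` are hypothetical helper names used for illustration only; they are not part of the benchmark scripts that produce this report. The two sample rows are copied from the table above.

```python
# Hypothetical helpers for illustration; not part of the dashboard tooling.

SAMPLE = """
+----------+---------------------------------+-------------+------------+
| compiler |              name               | prev_status | cur_status |
+----------+---------------------------------+-------------+------------+
| inductor |       ElectraForCausalLM        |   0.9753    |   0.8941   |
| inductor |  DebertaV2ForQuestionAnswering  |   1.0023    |   0.4855   |
+----------+---------------------------------+-------------+------------+
"""


def parse_ascii_table(text):
    """Parse a '+---+' bordered table into a list of dicts keyed by header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '+----+' border lines.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def relative_change(prev, cur):
    """Fractional change from prev to cur; negative means the value dropped."""
    return (cur - prev) / prev


rows = parse_ascii_table(SAMPLE)
# Sort so the largest drop comes first.
ranked = sorted(
    rows,
    key=lambda r: relative_change(float(r["prev_status"]), float(r["cur_status"])),
)
for r in ranked:
    change = relative_change(float(r["prev_status"]), float(r["cur_status"]))
    print(f'{r["name"]}: {change:+.1%}')
```

A negative value means the compression ratio fell relative to the previous report. The same calculation applies to the speedup table, whereas for compilation latency a positive value is the regression.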

Regressions for timm_models

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Compilation latency (sec) regressions

+----------+------------------+-------------+------------+
| compiler |       name       | prev_status | cur_status |
+----------+------------------+-------------+------------+
| inductor |   spnasnet_100   |  112.8555   |  142.9941  |
| inductor |    fbnetc_100    |  115.5414   |  140.8511  |
| inductor | mobilenetv2_100  |  106.2269   |  129.0996  |
| inductor | res2net50_14w_8s |  119.0318   |  123.8116  |
| inductor |   mnasnet_100    |  104.4557   |  123.3497  |
+----------+------------------+-------------+------------+

Peak Memory Compression Ratio regressions

+----------+------------------------------+-------------+------------+
| compiler |             name             | prev_status | cur_status |
+----------+------------------------------+-------------+------------+
| inductor |          hrnet_w18           |   0.9925    |   0.8918   |
| inductor |       sebotnet33ts_256       |   1.1129    |   0.891    |
| inductor |       adv_inception_v3       |   1.0193    |   0.8904   |
| inductor |      gluon_inception_v3      |   1.0193    |   0.8904   |
| inductor |         inception_v3         |   1.0193    |   0.8904   |
| inductor |            dpn107            |   0.9646    |   0.8833   |
| inductor |       gluon_xception65       |   0.9714    |   0.8831   |
| inductor |         ghostnet_100         |   0.9793    |   0.8807   |
| inductor |         spnasnet_100         |   0.9497    |   0.8786   |
| inductor |    mobilenetv3_large_100     |   0.9376    |   0.877    |
| inductor |        poolformer_m36        |   1.1899    |   0.8768   |
| inductor |     eca_botnext26ts_256      |   1.0082    |   0.8738   |
| inductor |       res2net50_14w_8s       |   0.9637    |   0.8712   |
| inductor |      res2net101_26w_4s       |   0.9509    |   0.871    |
| inductor |           mixnet_l           |   0.9923    |   0.8687   |
| inductor |         mnasnet_100          |   0.9448    |   0.8683   |
| inductor |          res2next50          |   0.9568    |   0.866    |
| inductor |         cait_m36_384         |   0.9885    |   0.8632   |
| inductor |          fbnetc_100          |   0.9582    |   0.8596   |
| inductor |          pit_b_224           |   1.0251    |   0.8578   |
| inductor |         selecsls42b          |   0.9702    |   0.8576   |
| inductor |        convnext_base         |   1.0346    |   0.8505   |
| inductor |           gernet_l           |   0.9738    |   0.8499   |
| inductor |    swsl_resnext101_32x16d    |   0.9793    |   0.8461   |
| inductor |        coat_lite_mini        |    1.021    |   0.8402   |
| inductor |        botnet26t_256         |    0.979    |   0.8239   |
| inductor |     xcit_large_24_p8_224     |   0.9776    |   0.8225   |
| inductor |          repvgg_a2           |    0.966    |   0.7738   |
| inductor |         regnety_002          |   0.9009    |   0.7602   |
| inductor |        crossvit_9_240        |   0.9912    |   0.7526   |
| inductor | swin_base_patch4_window7_224 |   0.9046    |   0.7214   |
| inductor |         jx_nest_base         |   0.9611    |   0.6693   |
+----------+------------------------------+-------------+------------+

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9672 |  0.9258   |  3.7347  |         1.399          |
|           BERT_pytorch            |  16  | 1.0055 |  0.8063   |  3.1384  |         2.2119         |
|            densenet121            |  4   | 0.9932 |  0.7214   |  2.7395  |         1.0908         |
|            hf_BigBird             |  2   | 0.9597 |  0.7733   |  2.6233  |         1.6975         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9772 |  0.8864   |  2.5016  |         1.8428         |
|            hf_T5_large            |  2   | 1.0098 |  0.8258   |  2.4177  |         2.0127         |
|             hf_Albert             |  8   | 0.9936 |  0.9607   |  2.3941  |         2.3022         |
|              hf_Bart              |  4   | 0.9959 |   0.84    |  2.3057  |         1.5948         |
|         phlippe_densenet          | 128  |  1.0   |  0.7797   |  2.1182  |         1.0379         |
|        mobilenet_v3_large         |  32  | 1.0008 |  0.7829   |  2.0832  |         1.2378         |
|           squeezenet1_1           |  32  | 0.9821 |  0.9365   |  2.0466  |         1.358          |
|               dlrm                | 1024 | 0.9365 |  0.8444   |  2.0262  |         1.2226         |
|               hf_T5               |  8   | 0.9947 |   0.861   |  1.9658  |         2.0442         |
|          pytorch_struct           | 200  | 0.9267 |  0.7822   |  1.946   |         1.1337         |
|              hf_GPT2              |  4   | 1.0219 |  0.9827   |  1.9042  |         1.9054         |
|              hf_Bert              |  4   | 1.0267 |  0.8644   |  1.8927  |         1.7196         |
|          phlippe_resnet           | 128  | 0.9949 |  0.7618   |  1.8334  |         1.0146         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.8488   |  1.8054  |         1.4157         |
|          resnext50_32x4d          |  8   | 0.9872 |  0.7169   |  1.7296  |         1.0136         |
|           hf_GPT2_large           |  4   | 1.0002 |  0.9892   |  1.7272  |         1.799          |
|            mnasnet1_0             |  32  | 0.9981 |  0.7382   |  1.7013  |         1.0451         |
|        speech_transformer         |  32  | 0.983  |  0.7903   |  1.6902  |         1.6686         |
|        shufflenet_v2_x1_0         | 128  | 0.9971 |  0.7413   |  1.6901  |         1.2191         |
|           hf_Bert_large           |  4   | 1.0295 |  0.8908   |  1.6233  |         1.6546         |
| attention_is_all_you_need_pytorch | 256  | 1.003  |  0.9017   |  1.6141  |         1.5237         |
|             resnet18              |  16  | 0.9903 |  0.7709   |  1.6118  |         1.0032         |
|           timm_resnest            |  32  | 0.998  |  0.8524   |  1.583   |         1.5297         |
|            timm_nfnet             | 128  | 1.0005 |   0.998   |  1.5627  |         1.4999         |
|           fastNLP_Bert            |  6   | 1.0118 |  0.8682   |  1.5495  |         1.5375         |
|           mobilenet_v2            |  96  | 0.9989 |  0.7791   |  1.5288  |         1.5316         |
|                drq                |  1   | 0.9483 |  0.7485   |  1.5257  |         1.015          |
|           hf_DistilBert           |  8   | 1.0119 |  0.9427   |  1.5055  |         1.5041         |
|               dcgan               |  32  | 0.8486 |  0.6842   |  1.4663  |         0.8345         |
|         timm_efficientnet         |  32  | 0.946  |  0.6293   |  1.4515  |         1.1063         |
|           lennard_jones           | 1000 | 0.8549 |  0.7393   |  1.3892  |         0.9158         |
|           pytorch_unet            |  1   | 0.9983 |  0.2037   |  1.3629  |         1.3569         |
|          LearningToPaint          |  96  | 0.9875 |  0.7874   |  1.3069  |         1.0864         |
|          pytorch_stargan          |  16  | 0.9969 |  0.8023   |  1.274   |         1.2502         |
|            Super_SloMo            |  6   | 0.9987 |  0.1778   |  1.2551  |         1.2347         |
|               vgg16               |  64  | 0.9994 |  0.9985   |  1.2406  |         1.2544         |
|             resnet152             |  32  | 1.0005 |  0.7717   |  1.2289  |         1.0282         |
|        Background_Matting         |  4   | 0.9993 |  0.1369   |  1.2126  |         1.209          |
|              yolov3               |  16  | 0.9996 |  0.8085   |  1.2018  |         1.2045         |
|             resnet50              |  32  | 1.0001 |  0.7692   |  1.1973  |         1.0921         |
|            hf_Reformer            |  4   | 0.9842 |  0.9686   |  1.1469  |         1.0697         |
|              alexnet              | 128  | 0.9982 |  0.9971   |  1.0898  |         1.1369         |
|            timm_regnet            |  32  | 0.932  |  0.7908   |  1.0592  |         1.0205         |
|         soft_actor_critic         | 256  | 0.8455 |  0.6234   |  1.0378  |         0.7376         |
|              demucs               |  4   | 0.9998 |  1.0018   |  1.0339  |         1.0382         |
|            timm_vovnet            |  32  |  0.88  |  0.7076   |  0.9818  |         0.964          |
|            tts_angular            |  64  | 0.9132 |  0.8842   |  0.9442  |         0.9515         |
|      nvidia_deeprecommender       | 256  | 0.999  |  0.9983   |  0.8724  |         1.0187         |
|   timm_vision_transformer_large   |  32  |  1.0   |    0.0    |   0.0    |         1.0834         |
|           hf_Longformer           |  2   | 1.0193 |  0.6907   |   0.0    |          0.0           |
|               moco                |  32  | 0.9788 |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+
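
A single summary number per backend can be derived from the speedup column by taking a geometric mean. The sketch below assumes that a speedup of 0.0 marks a model that did not run for that backend (the same models show `fail_to_run` or `nan` in the other tables of this report) and therefore excludes those entries. `geomean` is a hypothetical helper name, and the sample values are the first five `inductor` speedups from the table above.

```python
import math


def geomean(values):
    """Geometric mean of the positive, non-NaN entries; NaN if none remain."""
    # Assumption: 0.0 (or NaN) marks a model that failed to run, so skip it.
    valid = [v for v in values if not math.isnan(v) and v > 0]
    if not valid:
        return float("nan")
    return math.exp(sum(math.log(v) for v in valid) / len(valid))


# First five inductor speedups from the table above, plus one failed entry.
inductor_speedups = [3.7347, 3.1384, 2.7395, 2.6233, 2.5016, 0.0]
print(f"geomean speedup: {geomean(inductor_speedups):.4f}")
```

Dropping failed models makes the mean reflect only the models that ran, so it should be read alongside the pass rate in the Accuracy table rather than on its own.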

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|          vision_maskrcnn          |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 26.1245 |  53.9811  | 173.1003 |        173.3623        |
|         phlippe_densenet          | 128  | 3.2164  |  6.8775   | 170.3835 |        162.5231        |
|            hf_BigBird             |  2   | 12.6012 |  35.9683  | 148.276  |        128.9725        |
|         timm_efficientnet         |  32  | 5.0359  |  10.2458  | 140.5327 |        140.1202        |
|            densenet121            |  4   | 7.6426  |  17.7452  | 136.6661 |        136.2243        |
|        mobilenet_v3_large         |  32  | 3.4099  |   8.005   | 136.0634 |        138.3727        |
|           mobilenet_v2            |  96  | 3.2555  |   6.878   | 129.1956 |        130.3093        |
|              yolov3               |  16  | 4.8853  |  10.3432  | 112.4297 |        117.0968        |
|            mnasnet1_0             |  32  |  3.111  |  6.6823   | 108.1394 |        104.5844        |
|           hf_GPT2_large           |  4   | 14.4885 |  29.0416  | 107.4939 |        106.2721        |
|             resnet152             |  32  | 9.0628  |  19.6735  | 106.7256 |        102.4121        |
|           timm_resnest            |  32  | 1.7965  |  3.8708   | 99.1151  |        97.8106         |
|        shufflenet_v2_x1_0         | 128  | 3.4472  |  8.0979   | 82.7103  |        79.5358         |
|        speech_transformer         |  32  | 6.1844  |  13.3804  | 81.1039  |        73.0304         |
| attention_is_all_you_need_pytorch | 256  | 4.3526  |  11.217   | 73.5676  |        74.0699         |
|            timm_regnet            |  32  | 6.5926  |  12.0796  |  72.787  |        68.8765         |
|            timm_nfnet             | 128  | 5.7748  |  11.1198  | 71.8669  |        70.5609         |
|        Background_Matting         |  4   | 3.0297  |  11.1787  |  70.074  |        69.5871         |
|           BERT_pytorch            |  16  | 4.8739  |  11.2628  | 68.4275  |        68.4474         |
|             resnet50              |  32  | 3.1703  |  6.9208   | 65.4792  |        64.7451         |
|           hf_Bert_large           |  4   | 10.116  |  21.0342  |  63.449  |        65.2141         |
|           pytorch_unet            |  1   | 1.6075  |  4.3723   | 60.0125  |        58.6938         |
|            timm_vovnet            |  32  | 3.5783  |  6.8664   | 59.6853  |        60.6804         |
|       functorch_dp_cifar10        |  64  | 1.1906  |   2.376   | 54.9086  |        56.4881         |
|          resnext50_32x4d          |  8   | 3.1923  |  6.8206   | 53.3682  |        52.2457         |
|               hf_T5               |  8   | 5.7899  |  13.0343  | 52.7536  |        51.6756         |
|              hf_Bart              |  4   | 6.1541  |  13.447   | 49.8579  |        49.8337         |
|      timm_vision_transformer      |  32  | 3.2303  |  7.0294   |  49.235  |        48.7842         |
|           fastNLP_Bert            |  6   | 5.3689  |  10.8553  | 48.6733  |        47.4123         |
|          pytorch_stargan          |  16  | 1.1718  |  3.1596   | 45.3654  |        43.7005         |
|             resnet18              |  16  | 1.3291  |  2.7046   | 43.9085  |        42.0335         |
|          LearningToPaint          |  96  | 1.3917  |  2.8442   | 43.5845  |        42.6513         |
|            Super_SloMo            |  6   | 2.6848  |  9.5731   | 43.3348  |        42.3454         |
|             hf_Albert             |  8   | 2.5352  |  8.2475   | 43.1152  |        40.0965         |
|            hf_Reformer            |  4   | 4.0531  |  5.8202   |  42.704  |        38.4386         |
|              hf_GPT2              |  4   | 4.7151  |  9.3354   | 42.6354  |        41.7473         |
|              hf_Bert              |  4   |  5.139  |  10.4528  | 38.7416  |         39.273         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2041  |  2.8718   | 37.7932  |        34.2613         |
|           hf_DistilBert           |  8   | 2.4803  |  5.4835   | 33.2903  |        31.5343         |
|          phlippe_resnet           | 128  | 1.3378  |  2.7798   | 32.8507  |        31.4213         |
|              demucs               |  4   | 1.4058  |  2.1293   | 28.7701  |        27.8432         |
|          pytorch_struct           | 200  |  0.774  |  1.3081   | 21.8636  |        20.0918         |
|           squeezenet1_1           |  32  | 1.0739  |  1.7075   | 21.6433  |        23.4722         |
|               vgg16               |  64  | 0.6223  |  1.1069   | 14.7465  |        14.5314         |
|              alexnet              | 128  | 0.4753  |  0.7593   | 13.9442  |         14.135         |
|      nvidia_deeprecommender       | 256  | 0.4737  |  0.7388   | 11.0924  |         9.3919         |
|                drq                |  1   | 0.6442  |  0.9843   |  9.4236  |        11.7242         |
|         soft_actor_critic         | 256  | 0.4123  |  0.5863   |  9.3746  |         9.2331         |
|               dcgan               |  32  |  0.425  |  0.6996   |  7.9805  |         7.8476         |
|               dlrm                | 1024 | 0.3583  |  0.7666   |  7.8496  |         8.326          |
|           lennard_jones           | 1000 | 0.3852  |  0.5863   |  7.3257  |         5.8636         |
|            tts_angular            |  64  | 0.4507  |  0.5203   |  5.9332  |         5.7538         |
|   timm_vision_transformer_large   |  32  | 9.3292  |    nan    |   nan    |        123.614         |
|           hf_Longformer           |  2   | 9.4724  |  29.8724  |   nan    |          nan           |
|               moco                |  32  | 34.0724 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0378  |         1.2557         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.0163  |         1.1719         |
|           mobilenet_v2            |  96  | 0.9871 |  0.7653   |   1.01   |         1.1018         |
|            tts_angular            |  64  | 0.9983 |  0.9983   |  0.9895  |         0.9983         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9689  |         1.1266         |
|            timm_nfnet             | 128  | 0.907  |  0.8749   |  0.968   |         1.0734         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  0.9575  |         1.1593         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |  0.952   |         1.0009         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         1.1717         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9321  |         1.0713         |
|         timm_efficientnet         |  32  | 0.9847 |  0.8179   |  0.9291  |          0.94          |
|              yolov3               |  16  | 0.9921 |  0.8258   |  0.8919  |         1.036          |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  0.8904  |         1.128          |
|        speech_transformer         |  32  | 0.9915 |   0.901   |  0.8651  |         0.869          |
|           timm_resnest            |  32  | 0.9888 |   0.897   |  0.8635  |         0.9665         |
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  0.8614  |         1.208          |
|        shufflenet_v2_x1_0         | 128  | 0.9539 |  0.8374   |  0.8613  |         0.9658         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8593  |         0.8835         |
|            timm_regnet            |  32  | 0.9955 |  0.8496   |  0.8506  |         0.953          |
|             resnet152             |  32  | 0.9948 |  0.8928   |  0.8499  |         0.9405         |
|        Background_Matting         |  4   | 1.0127 |  0.6487   |  0.8485  |         1.0403         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8476  |         0.9945         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  0.8411  |         1.0258         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.8302  |         1.0725         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         1.168          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.8134  |         0.9308         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8058  |         0.8659         |
|              hf_Bart              |  4   | 0.9087 |  0.7521   |  0.7933  |         0.9173         |
|        mobilenet_v3_large         |  32  | 0.9793 |  0.8395   |  0.7842  |         0.7757         |
|             resnet50              |  32  | 0.9904 |  0.8621   |  0.7831  |         0.8851         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.7821  |         0.9645         |
|              demucs               |  4   | 0.9658 |  0.9657   |  0.773   |         0.9655         |
|           squeezenet1_1           |  32  | 0.9666 |  0.9312   |  0.773   |         0.908          |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.8893         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7529  |         0.8869         |
|               vgg16               |  64  | 0.9923 |  0.7245   |  0.7227  |         0.9808         |
|            mnasnet1_0             |  32  | 0.9778 |  0.8649   |  0.7159  |         0.8062         |
|            densenet121            |  4   | 0.9944 |  0.9783   |  0.7096  |         0.7998         |
|              alexnet              | 128  | 0.9454 |  0.7939   |  0.7091  |         0.939          |
|          pytorch_struct           | 200  | 0.9992 |  0.5106   |  0.697   |         0.7362         |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |  0.6949  |         1.1191         |
|          resnext50_32x4d          |  8   | 0.9942 |  0.8399   |  0.6677  |         0.7699         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.8931         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.6379  |         0.9573         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  0.6066  |         0.9973         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9966 |  0.8796   |  0.6065  |         0.6172         |
|          LearningToPaint          |  96  | 0.9192 |  0.7116   |  0.5925  |         0.7463         |
|             resnet18              |  16  | 0.9753 |  0.7978   |  0.5395  |         0.6097         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.5317  |         0.9997         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.4538  |         0.8022         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3169  |         0.3395         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.9511 |  0.8932   |   nan    |          nan           |
|               moco                |  32  | 0.9978 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_GPT2_large           |  4   | 209.0763 | 211.1086  | 120.9541 |        116.6355        |
|        Background_Matting         |  4   | 126.0655 | 917.1922  | 103.8089 |        104.4095        |
|            hf_T5_large            |  2   | 221.6512 |  265.834  | 93.8624  |        110.0374        |
|               hf_T5               |  8   | 182.3898 | 210.5929  | 92.0696  |        89.0031         |
|            timm_nfnet             | 128  | 118.7039 | 118.4323  | 75.2946  |         78.692         |
|            hf_BigBird             |  2   | 227.9876 |  241.22   | 73.9134  |        115.191         |
|            hf_Reformer            |  4   | 82.2474  |  83.642   | 70.6101  |        75.7284         |
|            Super_SloMo            |  6   | 79.5223  | 447.2153  | 63.3194  |         64.242         |
|              yolov3               |  16  | 68.5574  |  84.8885  | 57.2539  |        56.9848         |
|            timm_regnet            |  32  | 60.6222  |  70.4431  | 55.2055  |        56.9593         |
|               vgg16               |  64  | 66.3459  |  66.2524  | 53.4808  |        52.9063         |
|             resnet152             |  32  | 66.7107  |  80.6042  | 52.5821  |        69.9267         |
|              demucs               |  4   | 53.8324  |  53.5108  | 51.6962  |        51.7976         |
|           hf_Bert_large           |  4   | 80.1176  |  92.8767  |  50.831  |        49.9842         |
|        speech_transformer         |  32  | 64.8485  |  71.9602  | 37.1659  |         38.04          |
| attention_is_all_you_need_pytorch | 256  | 54.2011  |  65.9854  | 35.4803  |        35.5895         |
|              hf_Bart              |  4   | 91.5889  |  90.0955  | 34.1224  |        34.8251         |
|           fastNLP_Bert            |  6   |  56.657  |  59.5867  | 33.4045  |        33.8149         |
|           mobilenet_v2            |  96  | 47.1307  |  60.2485  | 30.7055  |        30.7454         |
|           pytorch_unet            |  1   |  39.975  | 195.2073  | 29.2257  |        29.4041         |
|             hf_Albert             |  8   | 70.1647  |  72.3532  | 29.0937  |        29.6664         |
|            timm_vovnet            |  32  | 28.2122  |  37.6704  | 26.4514  |        25.8908         |
|              hf_GPT2              |  4   | 52.4251  |  49.4373  | 26.1051  |        25.6177         |
|         timm_efficientnet         |  32  | 34.4039  |  51.5491  | 21.9522  |        28.8831         |
|             resnet50              |  32  | 26.8936  |  34.7429  | 21.9128  |        24.0021         |
|              hf_Bert              |  4   | 39.7972  |  46.508   | 21.7661  |        23.6713         |
|           hf_DistilBert           |  8   | 33.1161  |  35.2607  | 21.2305  |        20.8156         |
|            densenet121            |  4   | 52.1297  |  70.9852  | 20.1047  |        48.3217         |
|        shufflenet_v2_x1_0         | 128  | 30.2249  |  43.2208  | 18.6876  |        26.2744         |
|      timm_vision_transformer      |  32  | 28.5596  |  32.5712  | 18.0041  |        19.7055         |
|           BERT_pytorch            |  16  | 52.6311  |  66.2116  | 17.0294  |        24.0565         |
|           timm_resnest            |  32  | 24.2731  |  28.3522  | 15.1941  |        15.7463         |
|        mobilenet_v3_large         |  32  | 28.4272  |  35.914   | 13.5337  |        21.4523         |
|            mnasnet1_0             |  32  | 23.3589  |  31.3474  | 13.0871  |        22.2895         |
|          resnext50_32x4d          |  8   | 22.0113  |  26.8479  | 11.7081  |        19.5645         |
|      nvidia_deeprecommender       | 256  | 10.2349  |  10.2373  | 11.7003  |         10.046         |
|          pytorch_stargan          |  16  | 14.6148  |  18.2068  | 11.5519  |        11.7966         |
|         phlippe_densenet          | 128  | 23.4013  |  29.4832  | 11.0404  |        22.9362         |
|              alexnet              | 128  |  9.8311  |  9.8663   |  9.0253  |          8.63          |
|          LearningToPaint          |  96  | 11.2545  |  14.1794  |  8.5591  |        10.3222         |
|            tts_angular            |  64  |  7.4971  |  7.7946   |  6.6944  |         6.5889         |
|             resnet18              |  16  |  9.724   |  11.5184  |  5.7441  |         9.1183         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 15.4166  |  15.0591  |  5.6869  |         8.1987         |
|           squeezenet1_1           |  32  | 11.0553  |  11.6343  |  5.3568  |         7.3111         |
|          phlippe_resnet           | 128  |  9.063   |  11.6692  |  4.877   |         9.0068         |
|       functorch_dp_cifar10        |  64  | 10.6098  |  11.0961  |  2.7882  |         7.326          |
|          pytorch_struct           | 200  |  5.9021  |  6.1071   |  2.4126  |         4.1431         |
|               dlrm                | 1024 |  4.919   |  4.9534   |  2.2469  |         3.473          |
|         soft_actor_critic         | 256  |  1.8387  |  2.4115   |  2.1562  |         2.8296         |
|                drq                |  1   |  3.4386  |  4.2332   |  2.1341  |         4.3139         |
|               dcgan               |  32  |  2.4259  |  3.0979   |  1.6281  |         2.5412         |
|           lennard_jones           | 1000 |  1.8396  |  2.0676   |  1.1199  |         1.7541         |
|   timm_vision_transformer_large   |  32  | 464.0141 |    nan    |   nan    |        427.8519        |
|           hf_Longformer           |  2   | 110.8596 | 162.0472  |   nan    |          nan           |
|               moco                |  32  | 52.4198  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 1.0141 |  0.8531   |  3.0472  |         1.1932         |
|             OPTForCausalLM              |  2  | 0.9894 |   0.91    |  2.473   |         2.5198         |
|     MobileBertForQuestionAnswering      | 128 | 1.0148 |  0.8469   |  2.4211  |         1.1756         |
|      GPT2ForSequenceClassification      |  4  | 0.9888 |  0.9638   |  2.3281  |         2.3552         |
|       MT5ForConditionalGeneration       | 16  | 1.0144 |  0.8461   |  2.2329  |         1.9705         |
|       ElectraForQuestionAnswering       | 64  | 0.998  |  0.9871   |  2.1825  |         2.1662         |
|             XGLMForCausalLM             |  8  | 0.9936 |  0.8449   |  1.9724  |         1.5221         |
|     M2M100ForConditionalGeneration      | 16  | 0.996  |  0.8272   |  1.9374  |         1.5903         |
|           ElectraForCausalLM            | 32  | 0.9971 |  0.9499   |  1.8473  |         1.8703         |
|    LayoutLMForSequenceClassification    | 16  | 0.997  |  0.9834   |  1.847   |         1.8202         |
|            XLNetLMHeadModel             |  8  | 0.999  |  0.9705   |  1.8289  |         1.8189         |
|        BertForQuestionAnswering         | 16  | 0.9977 |   0.983   |  1.8019  |         1.8042         |
|       RobertaForQuestionAnswering       | 16  | 0.9969 |  0.9824   |  1.8016  |         1.8084         |
|       T5ForConditionalGeneration        |  4  | 0.9935 |   0.861   |  1.6779  |         1.7733         |
|               DistillGPT2               | 16  | 0.9928 |  0.9605   |  1.6751  |         1.719          |
|           RobertaForCausalLM            | 16  | 0.9977 |  0.9728   |  1.6731  |         1.6989         |
|                 T5Small                 |  4  | 0.9935 |  0.8556   |  1.6669  |         1.7747         |
|         MegatronBertForCausalLM         |  4  | 1.0209 |  0.9323   |  1.6627  |         1.5722         |
|       AlbertForQuestionAnswering        |  4  | 1.0003 |   0.886   |  1.6536  |         1.6544         |
|            PLBartForCausalLM            |  8  | 0.9876 |  0.9611   |  1.6534  |         1.6869         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9977 |  0.9772   |  1.6526  |         1.6783         |
|     PLBartForConditionalGeneration      |  4  | 0.9923 |  0.9506   |  1.6487  |         1.6561         |
|      MBartForConditionalGeneration      |  2  | 1.0312 |  0.9659   |  1.6454  |         1.4885         |
|            AlbertForMaskedLM            |  4  | 1.0002 |  0.8852   |  1.6418  |         1.6377         |
|           LayoutLMForMaskedLM           | 16  | 0.9969 |  0.9733   |  1.6074  |         1.6255         |
|             BertForMaskedLM             | 16  | 0.9972 |  0.9722   |  1.6029  |         1.6151         |
|                CamemBert                | 16  | 0.9977 |   0.974   |  1.5481  |         1.5617         |
|         Speech2Text2ForCausalLM         | 256 | 0.9795 |  0.9215   |  1.5415  |         1.5733         |
|             BartForCausalLM             |  4  | 0.9897 |  0.9609   |  1.5282  |         1.5604         |
|            MBartForCausalLM             |  4  | 0.9885 |  0.9501   |  1.5241  |         1.5546         |
|            YituTechConvBert             | 16  | 0.9983 |  0.9696   |  1.5229  |         1.5196         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0089 |  0.9231   |  1.4974  |         1.4835         |
|      BartForConditionalGeneration       |  2  | 1.0026 |  0.9763   |  1.4783  |         1.5164         |
|     DistilBertForQuestionAnswering      | 256 | 0.9967 |   0.991   |  1.4619  |         1.4621         |
|     PegasusForConditionalGeneration     | 32  | 1.0082 |   0.938   |  1.3606  |         1.3012         |
|           PegasusForCausalLM            | 32  | 0.9887 |  0.9191   |  1.2686  |         1.2205         |
|            TrOCRForCausalLM             | 32  | 0.9886 |  0.9556   |  1.2672  |         1.2973         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9892 |  0.9177   |  1.2461  |         1.3324         |
|          DistilBertForMaskedLM          | 128 | 0.9962 |  0.9544   |  1.2218  |         1.2411         |
|       DebertaForQuestionAnswering       |  8  | 0.8313 |  0.7231   |  1.1904  |         1.0841         |
|           DebertaForMaskedLM            |  4  | 0.7519 |  0.5867   |  1.0831  |         0.9726         |
|          DebertaV2ForMaskedLM           |  1  | 0.7262 |   0.542   |  1.0163  |         0.7427         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.7418 |  0.5496   |  0.9877  |          0.77          |
|          BlenderbotForCausalLM          |  4  | 0.9894 |  0.8389   |   0.0    |         1.4055         |
|          AllenaiLongformerBase          |  4  | 1.0097 |  0.6702   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
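A speedup table like the one above can be summarized per backend with a geometric mean. The sketch below is only an illustration and is not the dashboard's own tooling: `parse_ascii_table` and `geomean_speedup` are hypothetical helper names, and `SAMPLE` is a trimmed copy of four rows from the table above. It assumes that a speedup of 0.0 (or nan) marks a run that did not complete, so those entries are skipped rather than averaged in.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' bordered ASCII table into a list of dict rows."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")  # border lines start with '+'
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean_speedup(rows, backend):
    """Geometric mean of one backend's speedups, skipping failed runs.

    Assumption (not stated in the source): 0.0 and nan both mean the
    run did not complete for that backend.
    """
    values = []
    for row in rows:
        value = float(row[backend])
        if math.isnan(value) or value <= 0.0:
            continue
        values.append(value)
    if not values:
        return float("nan")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Four rows copied from the speedup table above (name column narrowed).
SAMPLE = """
+-----------------------+-----+--------+-----------+----------+------------------------+
|         name          | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------+-----+--------+-----------+----------+------------------------+
| MobileBertForMaskedLM | 64  | 1.0141 |  0.8531   |  3.0472  |         1.1932         |
|    OPTForCausalLM     |  2  | 0.9894 |   0.91    |  2.473   |         2.5198         |
| BlenderbotForCausalLM |  4  | 0.9894 |  0.8389   |   0.0    |         1.4055         |
| AllenaiLongformerBase |  4  | 1.0097 |  0.6702   |   0.0    |          0.0           |
+-----------------------+-----+--------+-----------+----------+------------------------+
"""

rows = parse_ascii_table(SAMPLE)
inductor_geomean = geomean_speedup(rows, "inductor")
print(f"inductor geomean over completed runs: {inductor_geomean:.4f}")
```

Skipping failed entries keeps one broken model from dragging the mean to zero, but it also hides failures, so the number of skipped rows is worth reporting next to the mean.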

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|      DebertaV2ForQuestionAnswering      |  2  | 15.7584 |  26.3999  | 733.502  |        72.3895         |
|          MobileBertForMaskedLM          | 64  | 16.8697 |  39.6594  | 145.9296 |        148.2348        |
|          DebertaV2ForMaskedLM           |  1  | 15.4789 |  26.916   | 140.7713 |        70.7588         |
|     MobileBertForQuestionAnswering      | 128 | 16.8866 |  39.5807  | 140.1224 |        141.5091        |
|     M2M100ForConditionalGeneration      | 16  | 11.9118 |  25.9997  | 134.7268 |        135.0586        |
|       MT5ForConditionalGeneration       | 16  | 7.7113  |  18.1096  | 132.282  |        134.0228        |
|             XGLMForCausalLM             |  8  | 9.3261  |  20.4702  | 131.1217 |        132.5058        |
|            XLNetLMHeadModel             |  8  | 10.5122 |  27.2326  |  93.39   |        93.0203         |
|       DebertaForQuestionAnswering       |  8  | 7.4687  |  13.0689  |  92.43   |        60.2406         |
|           DebertaForMaskedLM            |  4  | 7.3016  |  13.289   | 82.4738  |        52.4777         |
|      MBartForConditionalGeneration      |  2  | 11.8073 |  25.635   | 80.6467  |        79.9223         |
|      BartForConditionalGeneration       |  2  | 12.0099 |  25.7015  | 76.5724  |        75.7217         |
|     PegasusForConditionalGeneration     | 32  | 5.2292  |  18.9849  |  68.556  |        67.7581         |
|         MegatronBertForCausalLM         |  4  | 10.1687 |  21.0495  | 67.3733  |        68.3756         |
|            YituTechConvBert             | 16  | 7.5168  |  15.7438  | 66.9853  |        67.1768         |
|    MegatronBertForQuestionAnswering     |  8  | 10.235  |  21.0203  | 66.9319  |        68.4783         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.7649  |  16.7078  | 55.0815  |        53.5161         |
|           ElectraForCausalLM            | 32  | 5.4836  |  10.5992  | 54.9015  |         54.156         |
|                 T5Small                 |  4  | 5.6693  |  12.3966  | 50.6781  |        49.7085         |
|       T5ForConditionalGeneration        |  4  | 5.6629  |  12.7421  | 50.5625  |         50.14          |
|     PLBartForConditionalGeneration      |  4  | 6.1327  |  13.2235  | 48.8266  |         48.972         |
|       ElectraForQuestionAnswering       | 64  | 5.4323  |  10.5311  | 47.3593  |        47.5116         |
|    LayoutLMForSequenceClassification    | 16  |  5.738  |  11.0538  | 46.5252  |        47.5747         |
|        BertForQuestionAnswering         | 16  | 5.2058  |  10.4381  | 40.4133  |        39.5834         |
|           LayoutLMForMaskedLM           | 16  |  5.772  |  10.9556  | 40.4046  |        42.3943         |
|             BartForCausalLM             |  4  |  5.531  |  10.757   | 40.0786  |         39.256         |
|            MBartForCausalLM             |  4  | 5.7581  |  11.2978  | 39.6862  |        41.4585         |
|             BertForMaskedLM             | 16  | 5.1323  |  10.5302  | 39.6812  |        40.7802         |
|            AlbertForMaskedLM            |  4  |  2.196  |  7.9102   | 39.6562  |        38.5025         |
|                CamemBert                | 16  | 5.4127  |  10.5806  | 38.9479  |        37.6209         |
|     DistilBertForQuestionAnswering      | 256 | 2.5979  |   5.282   | 38.4529  |        38.0878         |
|           PegasusForCausalLM            | 32  | 5.6286  |  11.0021  | 37.8335  |        38.0417         |
|           RobertaForCausalLM            | 16  | 5.3818  |  10.6814  | 37.3849  |        37.8009         |
|             OPTForCausalLM              |  2  | 4.7481  |  10.0844  | 36.8463  |        39.4695         |
|          DistilBertForMaskedLM          | 128 | 2.6183  |  5.3035   | 36.6849  |        35.4621         |
|            TrOCRForCausalLM             | 32  | 5.6197  |  10.9078  | 36.3844  |        36.2469         |
|       RobertaForQuestionAnswering       | 16  | 5.3656  |  10.5832  | 36.3401  |        35.0448         |
|       AlbertForQuestionAnswering        |  4  |  2.168  |  7.8008   | 35.8482  |        35.1728         |
|      GPT2ForSequenceClassification      |  4  | 4.7556  |  9.5655   | 35.3295  |        34.3797         |
|       BlenderbotSmallForCausalLM        | 64  | 3.7532  |   7.338   | 30.1407  |        30.1163         |
|               DistillGPT2               | 16  | 2.4777  |  4.9669   | 27.5752  |        27.1035         |
|            PLBartForCausalLM            |  8  |  3.232  |  5.9219   | 25.1249  |        26.5325         |
|         Speech2Text2ForCausalLM         | 256 | 3.0399  |  5.7214   | 24.8333  |         24.353         |
|          BlenderbotForCausalLM          |  4  | 11.162  |  21.3311  |   nan    |        69.1817         |
|          AllenaiLongformerBase          |  4  | 9.5503  |  30.1179  |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1376  |         1.195          |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1139  |         1.2307         |
|             OPTForCausalLM              |  2  | 0.9999 |  0.9165   |  1.094   |         1.1346         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0607  |         1.1729         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0603  |         1.1724         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0583  |         1.1368         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.1479         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0078  |         1.0518         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0077  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.0075  |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0035  |         1.0491         |
|            YituTechConvBert             | 16  |  1.0   |  0.9143   |  0.9911  |         1.0411         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9729  |         1.3147         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         1.0642         |
|     PLBartForConditionalGeneration      |  4  | 1.0001 |  0.9301   |  0.9649  |         1.052          |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         1.1152         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9501  |         1.268          |
|            MBartForCausalLM             |  4  |  1.0   |  0.8937   |  0.9281  |         0.9912         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8677   |  0.9138  |         0.9886         |
|             BartForCausalLM             |  4  |  1.0   |  0.8936   |  0.9137  |         0.9749         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9089  |         1.0018         |
|           ElectraForCausalLM            | 32  |  1.0   |  0.8475   |  0.8941  |         0.9739         |
|           PegasusForCausalLM            | 32  |  1.0   |  0.8822   |  0.893   |         0.9864         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8849  |         0.9624         |
|            TrOCRForCausalLM             | 32  |  1.0   |   0.873   |  0.8836  |         0.9583         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8729  |         0.9803         |
|     PegasusForConditionalGeneration     | 32  |  1.0   |   0.91    |  0.8689  |         1.0689         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.8672  |         1.0307         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         1.0139         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         1.0962         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8137   |  0.8184  |         0.9119         |
|         Speech2Text2ForCausalLM         | 256 |  1.0   |  0.8183   |  0.789   |         0.8779         |
|     M2M100ForConditionalGeneration      | 16  |  1.0   |  0.8084   |  0.7651  |         0.9908         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7473  |         1.016          |
|             XGLMForCausalLM             |  8  |  1.0   |  0.7834   |  0.7117  |         0.9792         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6569  |         0.8392         |
|           DebertaForMaskedLM            |  4  | 0.9316 |  0.9143   |  0.5646  |         0.9988         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.5187  |         0.9664         |
|       DebertaForQuestionAnswering       |  8  | 0.9506 |  1.0516   |  0.4867  |         1.1525         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9763 |  0.9764   |  0.4855  |         0.9798         |
|          BlenderbotForCausalLM          |  4  | 0.9978 |  0.9099   |   nan    |         0.999          |
|          AllenaiLongformerBase          |  4  | 0.9508 |  0.8694   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 266.0923 | 300.6027  | 161.9874 |        162.7453        |
|       AlbertForQuestionAnswering        |  4  | 264.0185 | 298.1106  | 159.651  |        159.6063        |
|            XLNetLMHeadModel             |  8  | 280.2409 | 287.5018  | 152.3359 |        153.1903        |
|      DebertaV2ForQuestionAnswering      |  2  | 158.7089 | 190.9119  | 121.0313 |        152.656         |
|     PegasusForConditionalGeneration     | 32  | 154.677  | 149.6302  | 111.0737 |        107.3094        |
|            TrOCRForCausalLM             | 32  | 139.4861 | 143.9292  | 108.8108 |        106.3146        |
|          DebertaV2ForMaskedLM           |  1  | 143.8359 | 190.3051  | 104.0159 |        140.1943        |
|      MBartForConditionalGeneration      |  2  | 142.4255 | 143.8836  | 95.4166  |        92.4724         |
|      BartForConditionalGeneration       |  2  | 137.9594 | 140.7691  | 93.3472  |         92.397         |
|    MegatronBertForQuestionAnswering     |  8  | 142.2117 |  144.862  | 85.9108  |         84.763         |
|            YituTechConvBert             | 16  | 126.2536 |  129.295  | 82.4556  |        82.4649         |
| BlenderbotSmallForConditionalGeneration | 64  | 115.0695 | 120.1966  | 80.4001  |        78.5344         |
|                CamemBert                | 16  | 119.065  | 121.5096  | 76.4959  |        75.7363         |
|     M2M100ForConditionalGeneration      | 16  | 151.4709 | 141.4062  | 75.1629  |        99.8207         |
|            MBartForCausalLM             |  4  | 114.9043 |  120.228  | 74.9146  |        72.9856         |
|             BartForCausalLM             |  4  | 115.3897 | 117.8809  | 74.1619  |         72.627         |
|     PLBartForConditionalGeneration      |  4  | 118.8699 | 122.5811  | 71.8153  |        70.5685         |
|     DistilBertForQuestionAnswering      | 256 | 103.4756 | 104.1797  | 70.9192  |        70.9627         |
|           LayoutLMForMaskedLM           | 16  | 113.0034 | 115.5684  |  70.145  |        69.3418         |
|            PLBartForCausalLM            |  8  | 118.5863 | 120.0647  | 69.8373  |        68.3885         |
|          DistilBertForMaskedLM          | 128 | 84.9062  |   88.65   | 69.7116  |        68.1461         |
|     MobileBertForQuestionAnswering      | 128 | 163.6966 | 201.2901  | 69.0852  |        143.1801        |
|           RobertaForCausalLM            | 16  | 115.6125 | 118.1932  | 68.7937  |        67.7378         |
|             BertForMaskedLM             | 16  | 110.277  | 112.9745  | 68.7931  |        68.0582         |
|             OPTForCausalLM              |  2  | 172.316  | 182.1494  | 68.7451  |        68.0453         |
|       DebertaForQuestionAnswering       |  8  | 91.3561  | 104.8621  | 63.7321  |        70.0413         |
|               DistillGPT2               | 16  | 106.5676 | 110.0182  | 63.1725  |        61.5521         |
|       T5ForConditionalGeneration        |  4  | 107.3747 | 121.5038  | 62.6607  |        59.0177         |
|                 T5Small                 |  4  | 105.4233 | 121.9406  | 62.5973  |        58.8877         |
|          MobileBertForMaskedLM          | 64  | 166.5683 | 203.9903  | 61.5642  |        148.836         |
|           PegasusForCausalLM            | 32  | 75.5736  |  78.0191  | 58.5348  |        56.8132         |
|           DebertaForMaskedLM            |  4  | 81.9181  | 107.0964  | 57.9203  |        71.0688         |
|         MegatronBertForCausalLM         |  4  | 86.2172  |  92.7087  | 56.8565  |        55.5696         |
|             XGLMForCausalLM             |  8  | 118.1875 |  114.793  |  53.841  |        77.6833         |
|       RobertaForQuestionAnswering       | 16  |  96.325  |  97.2229  | 53.1855  |        52.8764         |
|    LayoutLMForSequenceClassification    | 16  | 98.1324  |  99.3103  | 53.0443  |        53.8351         |
|        BertForQuestionAnswering         | 16  | 95.6125  |  96.7409  | 52.7558  |        52.6875         |
|       ElectraForQuestionAnswering       | 64  | 115.1705 | 116.0136  | 52.6636  |        53.7124         |
|           ElectraForCausalLM            | 32  | 88.7884  |  92.4321  | 47.7816  |        47.2167         |
|       BlenderbotSmallForCausalLM        | 64  | 62.3169  |  63.1138  | 46.6068  |        49.3567         |
|       MT5ForConditionalGeneration       | 16  | 90.9933  | 110.2333  |  41.645  |        47.2747         |
|      GPT2ForSequenceClassification      |  4  | 92.6677  |  94.759   | 39.4242  |        38.8624         |
|         Speech2Text2ForCausalLM         | 256 | 54.8953  |  57.247   | 34.9015  |        34.1446         |
|          BlenderbotForCausalLM          |  4  | 119.6027 | 116.8901  |   nan    |        81.8951         |
|          AllenaiLongformerBase          |  4  | 180.6768 | 270.0752  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
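For anyone post-processing these tables, here is a minimal sketch of parsing one into rows and computing a per-model latency ratio between two columns. The helper names `parse_table` and `latency_ratio` are hypothetical and not part of the benchmark scripts. The ratio is plain column arithmetic on the absolute latencies and is not claimed to reproduce the dashboard's "Performance speedup" column, which may be measured differently. The two sample rows are copied from the table above.

```python
import math

# Two rows copied from the absolute latency table above (column widths shortened).
TABLE = """\
+-----------------------+----+----------+-----------+----------+------------------------+
|         name          | bs |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------+----+----------+-----------+----------+------------------------+
|   AlbertForMaskedLM   | 4  | 266.0923 | 300.6027  | 161.9874 |        162.7453        |
| BlenderbotForCausalLM | 4  | 119.6027 | 116.8901  |   nan    |        81.8951         |
+-----------------------+----+----------+-----------+----------+------------------------+
"""


def parse_table(text):
    """Turn an ASCII table into a list of dicts keyed by the header cells."""
    # Data and header lines start with '|'; border lines start with '+'.
    lines = [line for line in text.splitlines() if line.startswith("|")]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def latency_ratio(row, backend, baseline="eager"):
    """Return baseline latency divided by backend latency (nan if either is nan)."""
    base = float(row[baseline])
    value = float(row[backend])
    if math.isnan(base) or math.isnan(value):
        return math.nan
    return base / value


rows = parse_table(TABLE)
for row in rows:
    print(row["name"], round(latency_ratio(row, "inductor"), 4))
```

A ratio above 1 means the backend column has lower latency than the baseline column for that model; `nan` entries (failed runs) propagate instead of raising.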

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 |  1.0   |  0.9985   |  3.0245  |         2.9849         |
|      xcit_large_24_p8_224       |  5  | 0.9973 |  0.8632   |  2.1005  |         1.6412         |
|        twins_pcpvt_base         | 64  | 1.0018 |  0.9032   |  1.9932  |         1.7029         |
|         coat_lite_mini          | 128 | 0.9995 |  0.9985   |  1.9545  |         1.9303         |
|          ghostnet_100           | 128 | 0.9983 |   0.766   |  1.8737  |         1.6273         |
|          gmlp_s16_224           | 128 | 0.9997 |  1.0895   |  1.8708  |         1.849          |
|          gmixer_24_224          | 128 | 0.9998 |  0.8932   |  1.782   |         1.7636         |
|           volo_d1_224           | 64  | 0.9995 |  0.9783   |  1.7074  |         1.6823         |
|         crossvit_9_240          | 128 |  1.0   |  0.7891   |  1.6704  |         1.6424         |
|            lcnet_050            | 128 | 0.9462 |  0.7384   |  1.6515  |         1.4349         |
|  swin_base_patch4_window7_224   | 64  | 0.9994 |  0.9633   |  1.638   |         1.6297         |
|           convit_base           | 64  | 0.9998 |  0.9994   |  1.6163  |         1.6159         |
|       gluon_inception_v3        | 128 | 0.9997 |  0.8669   |  1.5399  |         1.5311         |
|          inception_v3           | 128 | 0.9996 |  0.8668   |  1.5396  |         1.5277         |
|        adv_inception_v3         | 128 | 0.9997 |   0.863   |  1.5395  |         1.5288         |
|             dla102              | 128 | 0.9994 |  0.8176   |  1.5341  |         1.5317         |
|          convnext_base          | 64  | 0.9996 |  1.0011   |  1.5288  |         1.5089         |
|        sebotnet33ts_256         | 64  | 0.9655 |  0.7695   |  1.5273  |         1.5566         |
|            nfnet_l0             | 128 |  1.0   |  0.8221   |  1.5062  |         1.4567         |
|           dm_nfnet_f0           | 128 | 0.9993 |  0.9984   |  1.5037  |         1.4558         |
|           regnety_002           | 128 | 0.9595 |  0.7194   |  1.4734  |         1.2748         |
|       eca_botnext26ts_256       | 128 | 0.9773 |  0.7223   |  1.4561  |         1.4346         |
|           mobilevit_s           | 64  | 0.9716 |  0.7363   |  1.4464  |         1.4638         |
|            pit_b_224            | 64  | 0.9995 |  0.9975   |  1.4451  |         1.4389         |
|           resnest101e           | 64  | 0.9996 |  0.8702   |  1.439   |         1.3638         |
|      mobilenetv3_large_100      | 128 | 0.9514 |  0.7624   |  1.4355  |         1.4468         |
|           mnasnet_100           | 128 | 0.9502 |  0.7419   |  1.4316  |         1.5002         |
|          botnet26t_256          | 128 | 0.9758 |  0.8544   |  1.4167  |         1.431          |
|           selecsls42b           | 128 | 0.9993 |  0.8128   |  1.4146  |         1.4134         |
|          jx_nest_base           | 32  | 0.9991 |   0.998   |  1.3914  |         1.3817         |
|         mobilenetv2_100         | 128 | 0.9509 |  0.7385   |  1.3873  |         1.449          |
|        res2net50_14w_8s         | 128 | 0.9996 |  0.7908   |  1.3835  |         1.3176         |
|           res2next50            | 128 | 0.9996 |  0.8266   |  1.3737  |         1.365          |
|        ese_vovnet19b_dw         | 128 | 0.9656 |  0.8385   |  1.3679  |         1.3876         |
|          mixer_b16_224          | 128 | 0.9993 |  1.0209   |  1.3668  |         1.3663         |
|          spnasnet_100           | 128 | 0.9444 |  0.7406   |  1.3648  |         1.426          |
|          cait_m36_384           |  4  |  1.0   |  0.9397   |  1.3636  |         1.3576         |
|       tf_efficientnet_b0        | 128 | 0.9648 |  0.6826   |   1.36   |         1.3914         |
|      beit_base_patch16_224      | 64  | 0.9992 |  0.9681   |  1.3574  |         1.3567         |
|           fbnetc_100            | 128 | 0.9518 |  0.7405   |  1.3535  |         1.4047         |
|         poolformer_m36          | 64  | 1.0001 |  0.9965   |  1.352   |         1.3426         |
|            hrnet_w18            | 128 | 0.9983 |  0.6379   |  1.3204  |         1.3557         |
|            fbnetv3_b            | 128 | 0.9532 |  0.7709   |  1.319   |         1.3297         |
|           rexnet_100            | 128 | 0.9609 |  0.7078   |  1.314   |         1.3516         |
|          resmlp_12_224          | 128 | 0.9998 |  0.8953   |  1.2751  |         1.2682         |
| deit_base_distilled_patch16_224 | 64  | 0.9996 |  0.9977   |  1.2605  |         1.2611         |
|          cspdarknet53           | 64  | 0.9435 |  0.7936   |  1.2422  |         1.2788         |
|      vit_base_patch16_224       | 64  | 0.9993 |  0.9971   |  1.2407  |         1.2409         |
|            tinynet_a            | 128 | 0.9513 |   0.68    |  1.2328  |          1.27          |
|           tf_mixnet_l           | 128 | 0.981  |  0.8301   |  1.1922  |         1.1989         |
|        res2net101_26w_4s        | 64  | 1.0004 |   0.788   |  1.1864  |         1.0811         |
|         visformer_small         | 128 | 0.9988 |  0.9483   |  1.1775  |         1.1692         |
|            mixnet_l             | 128 | 0.9805 |  0.8242   |  1.1725  |         1.1878         |
|          pnasnet5large          | 16  | 0.9972 |  0.9286   |  1.1294  |         1.1442         |
|             dpn107              | 32  | 0.9403 |  0.8136   |  1.1036  |         1.1483         |
|            repvgg_a2            | 128 | 0.944  |  0.7604   |  1.0938  |         1.1312         |
|        gluon_xception65         | 32  | 0.9997 |  0.8478   |  1.0849  |         1.089          |
|     swsl_resnext101_32x16d      | 32  | 0.9996 |  0.8442   |  1.0633  |         1.0213         |
|            gernet_l             | 128 | 0.944  |  0.7989   |  1.0497  |         1.0797         |
|        convmixer_768_32         | 32  | 0.9993 |  0.9655   |  1.0033  |         1.0031         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+
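A small sketch of how the non-passing entries in an accuracy table like the one above could be pulled out programmatically. The dictionary layout and the helpers `failures` and `pass_rate` are assumptions made for illustration and are not the dashboard's own data format. The three sample entries are copied from the table.

```python
# Status per backend for three models, copied from the accuracy table above.
ACCURACY = {
    "lcnet_050": {
        "eager": "pass",
        "aot_eager": "fail_accuracy",
        "inductor": "pass",
        "inductor_no_cudagraphs": "pass",
    },
    "tinynet_a": {
        "eager": "pass",
        "aot_eager": "fail_accuracy",
        "inductor": "pass",
        "inductor_no_cudagraphs": "pass",
    },
    "mixer_b16_224": {
        "eager": "pass",
        "aot_eager": "pass",
        "inductor": "pass",
        "inductor_no_cudagraphs": "pass",
    },
}


def failures(results):
    """Return sorted (model, backend, status) triples for every non-pass entry."""
    return sorted(
        (model, backend, status)
        for model, per_backend in results.items()
        for backend, status in per_backend.items()
        if status != "pass"
    )


def pass_rate(results, backend):
    """Fraction of models whose status for the given backend is 'pass'."""
    statuses = [per_backend[backend] for per_backend in results.values()]
    return sum(status == "pass" for status in statuses) / len(statuses)


for model, backend, status in failures(ACCURACY):
    print(f"{model}: {backend} -> {status}")
print("inductor pass rate:", pass_rate(ACCURACY, "inductor"))
```

Keeping the raw status string (rather than a boolean) preserves the failure kind, such as `fail_accuracy`, for later triage.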

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 5.4597  |  10.9452  | 281.5342 |        284.5444        |
|            hrnet_w18            | 128 |  9.755  |  35.845   | 254.0037 |        246.356         |
|          ghostnet_100           | 128 | 7.5426  |  14.792   | 237.587  |        235.0013        |
|            fbnetv3_b            | 128 | 8.1206  |  16.5358  | 170.3873 |        170.2323        |
|          pnasnet5large          | 16  | 8.0792  |  25.8063  | 165.0816 |        163.9132        |
|           resnest101e           | 64  | 11.4407 |  24.209   | 164.3654 |        165.2112        |
|           mobilevit_s           | 64  | 5.0951  |  11.678   | 160.6709 |        157.6558        |
|          inception_v3           | 128 | 5.5263  |  12.3267  | 160.4699 |        161.5501        |
|        adv_inception_v3         | 128 | 5.6393  |  12.2494  | 157.7377 |        162.0153        |
|            tinynet_a            | 128 | 5.7987  |  11.9533  | 157.5875 |        157.8782        |
|      mobilenetv3_large_100      | 128 | 4.0919  |  8.2393   | 157.1059 |        157.4246        |
|       gluon_inception_v3        | 128 | 5.5944  |  12.4184  | 155.9479 |        158.796         |
|           tf_mixnet_l           | 128 |  8.753  |  17.4174  | 152.3374 |        156.1956        |
|        res2net101_26w_4s        | 64  | 10.6507 |  24.8559  | 152.0736 |        145.8825        |
|       tf_efficientnet_b0        | 128 | 4.9602  |  10.2082  | 150.6486 |        146.9729        |
|            mixnet_l             | 128 | 8.1408  |  16.0019  | 149.9561 |        155.4344        |
|        twins_pcpvt_base         | 64  | 10.4011 |  23.1907  | 149.3229 |        144.3253        |
|          spnasnet_100           | 128 | 4.8387  |  9.2926   | 142.9941 |        137.0462        |
|           fbnetc_100            | 128 | 5.1486  |  9.2973   | 140.8511 |        136.2885        |
|      xcit_large_24_p8_224       |  5  | 13.0336 |  29.0923  | 133.6231 |        130.8808        |
|         mobilenetv2_100         | 128 | 3.8704  |  7.6656   | 129.0996 |        125.8879        |
|        res2net50_14w_8s         | 128 |  8.813  |  22.2137  | 123.8116 |        122.4182        |
|           mnasnet_100           | 128 | 4.0933  |  7.4896   | 123.3497 |        123.6937        |
|          cait_m36_384           |  4  | 13.5941 |  31.7781  | 113.9649 |        117.2888        |
|  swin_base_patch4_window7_224   | 64  | 8.6487  |  18.8849  |  107.27  |        106.4154        |
|           regnety_002           | 128 | 4.6231  |  8.7246   | 106.3658 |        106.7747        |
|        sebotnet33ts_256         | 64  | 4.1812  |  9.1316   | 105.7158 |        107.0892        |
|         poolformer_m36          | 64  |  7.358  |  13.4865  | 102.5903 |         99.495         |
|       eca_botnext26ts_256       | 128 | 3.1439  |  6.6188   | 100.9783 |         96.269         |
|          cspdarknet53           | 64  | 5.6223  |  10.6499  | 100.8131 |        100.5207        |
|             dla102              | 128 | 6.3157  |  13.9761  | 99.0298  |        97.2721         |
|             dpn107              | 32  | 9.5296  |  19.0171  | 98.7883  |        97.4423         |
|            lcnet_050            | 128 | 2.4597  |   4.854   | 96.2894  |        95.5014         |
|        gluon_xception65         | 32  | 7.6997  |  16.7423  | 94.0184  |        94.7583         |
|           selecsls42b           | 128 | 2.4403  |  5.3602   | 91.0379  |        90.4291         |
|           res2next50            | 128 |   4.9   |  11.8303  | 88.7609  |        88.3755         |
|         coat_lite_mini          | 128 | 3.1219  |  7.6749   | 86.9494  |        85.9823         |
|          botnet26t_256          | 128 | 2.9508  |  5.7618   | 86.3537  |        88.0016         |
|         crossvit_9_240          | 128 | 5.6082  |  12.9775  |  86.073  |        86.6207         |
|          jx_nest_base           | 32  | 6.6046  |  14.3428  |  82.233  |         83.337         |
|            gernet_l             | 128 | 4.8258  |  8.8298   | 80.4133  |         81.306         |
|            nfnet_l0             | 128 | 5.0953  |  10.6164  | 79.0892  |        77.5563         |
|        ese_vovnet19b_dw         | 128 | 2.4773  |  4.4712   | 75.0318  |        76.7961         |
|           volo_d1_224           | 64  | 5.1962  |  12.1927  | 73.7992  |        73.5945         |
|           dm_nfnet_f0           | 128 |  5.794  |  11.1119  | 69.8755  |        72.2167         |
|        tnt_s_patch16_224        | 128 | 6.2703  |  16.4451  | 68.7472  |        68.4483         |
|         visformer_small         | 128 | 2.6637  |  5.8861   | 65.0894  |        65.6392         |
|            repvgg_a2            | 128 | 4.6394  |  8.6773   |  62.649  |        58.6434         |
|          convnext_base          | 64  | 6.6309  |  12.291   |  60.617  |        57.4555         |
|     swsl_resnext101_32x16d      | 32  | 6.1308  |  13.506   |  60.387  |        60.3305         |
|          gmlp_s16_224           | 128 | 5.4948  |  11.8465  | 58.4662  |         60.655         |
|          gmixer_24_224          | 128 | 5.5371  |  12.5732  | 50.8039  |        51.0445         |
|           convit_base           | 64  | 3.5639  |  8.3212   | 46.9969  |        46.2814         |
|            pit_b_224            | 64  | 3.3328  |  8.2989   | 46.2696  |        45.6473         |
| deit_base_distilled_patch16_224 | 64  | 3.1701  |  7.3637   | 42.3357  |        40.7449         |
|          resmlp_12_224          | 128 |  2.785  |  5.2636   |  41.161  |        39.2905         |
|      vit_base_patch16_224       | 64  |  3.157  |  6.8498   | 39.0507  |        38.8188         |
|        convmixer_768_32         | 32  | 1.6506  |   7.265   | 35.8703  |        38.0413         |
|      beit_base_patch16_224      | 64  | 3.7794  |  8.9164   | 34.8776  |        37.2185         |
|          mixer_b16_224          | 128 | 2.6506  |  6.1147   | 32.4396  |        32.0585         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.1848  |         1.2049         |
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.1712  |         1.2836         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1117  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.0948  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.0266  |         1.1179         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.013   |         1.0845         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0079  |         1.0351         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  0.9984  |         1.0721         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  0.9972  |         1.0876         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  0.9872  |         1.0728         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  0.9834  |         1.0506         |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9762  |         0.9854         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  0.9747  |         1.0457         |
|        twins_pcpvt_base         | 64  | 0.996  |  0.9232   |  0.9729  |         1.0539         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  0.9557  |         1.0057         |
|             dla102              | 128 | 0.9635 |  0.9151   |  0.9536  |         1.0325         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.9501  |         0.9948         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9362  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9353  |         0.9528         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  0.9348  |         1.0194         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  0.9346  |         1.0675         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9285  |         0.989          |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9228  |         0.9793         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9215  |         0.9952         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9131  |         0.9727         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  0.9097  |         1.0328         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9047  |         0.9903         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.8918  |          0.99          |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  0.891   |         1.1115         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.8833  |         0.9642         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.8831  |         0.9705         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.8807  |         0.977          |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.8786  |         0.9451         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.877   |         0.9361         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  0.8768  |         1.1871         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  0.8738  |         1.0072         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.8712  |         0.9607         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.871   |         0.9483         |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.8687  |         0.9902         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.8683  |         0.9403         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.866   |         0.9547         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8632  |         0.989          |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.8596  |         0.9535         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  0.8578  |         1.0242         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.8576  |         0.9664         |
|          convnext_base          | 64  | 1.001  |   0.924   |  0.8505  |         1.0338         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.8499  |         0.9706         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.8461  |         0.9786         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  0.8402  |         1.0202         |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.8239  |         0.9779         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.8225  |         0.9732         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.805   |         0.884          |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.7738  |         0.9611         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.7602  |         0.8966         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.7526  |         0.9898         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.7214  |         0.9045         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.6693  |         0.9604         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.1222 |  310.823  | 299.1582 |        299.9207        |
|            hrnet_w18            | 128 | 279.3926 | 437.0384  | 211.8834 |        206.3731        |
|          pnasnet5large          | 16  | 196.5894 | 210.6025  | 174.1133 |        171.5077        |
|           tf_mixnet_l           | 128 | 192.9626 | 228.2585  | 158.7615 |        158.2537        |
|            mixnet_l             | 128 | 184.766  | 219.5653  | 154.4821 |        152.5395        |
|          cait_m36_384           |  4  | 167.2225 | 179.1879  | 122.8218 |        123.2498        |
|           resnest101e           | 64  | 164.347  |  188.849  | 114.131  |        120.6209        |
|             dla102              | 128 | 172.072  | 210.3424  | 112.185  |        112.3147        |
|     swsl_resnext101_32x16d      | 32  | 118.7683 | 140.2309  | 111.814  |        116.288         |
|         poolformer_m36          | 64  | 144.8446 | 145.2356  | 107.168  |        107.8495        |
|        tnt_s_patch16_224        | 128 | 323.4572 | 323.5606  | 106.8641 |        108.3467        |
|       gluon_inception_v3        | 128 | 160.5433 | 185.2208  | 104.0863 |        104.6067        |
|        adv_inception_v3         | 128 | 160.1348 | 185.5197  | 104.0699 |        104.9135        |
|          inception_v3           | 128 | 160.3295 | 185.0163  | 104.0044 |        104.9002        |
|        res2net50_14w_8s         | 128 | 140.6793 | 177.7087  | 102.0174 |        107.0291        |
|           convit_base           | 64  | 163.041  |  163.057  | 100.827  |        100.8703        |
|             dpn107              | 32  | 113.1013 | 130.5166  | 96.2443  |        92.3594         |
|           res2next50            | 128 | 126.1611 | 152.2486  | 91.7349  |        92.3229         |
|        gluon_xception65         | 32  |  99.088  | 116.6616  | 91.3735  |        90.8817         |
|  swin_base_patch4_window7_224   | 64  | 146.4931 | 151.4422  | 89.2042  |        89.5425         |
|        res2net101_26w_4s        | 64  | 98.6954  | 125.7458  | 85.7448  |        92.1727         |
|          mixer_b16_224          | 128 | 116.2674 |  113.787  |  85.713  |        85.1375         |
|           dm_nfnet_f0           | 128 | 127.1246 | 126.7627  | 84.1517  |        87.0091         |
|            fbnetv3_b            | 128 | 115.0098 | 141.8848  | 83.0713  |        82.4226         |
|            pit_b_224            | 64  | 118.383  | 118.6156  | 81.6885  |        82.1061         |
|          convnext_base          | 64  | 122.7013 | 122.3377  | 80.1289  |        81.1067         |
|         visformer_small         | 128 | 91.0155  |  95.8368  | 77.1777  |        77.7505         |
|      beit_base_patch16_224      | 64  | 101.2689 | 104.4987  | 74.7033  |        74.5965         |
|            nfnet_l0             | 128 | 112.0839 | 136.0494  | 74.1224  |        76.8812         |
|          gmlp_s16_224           | 128 | 136.9685 | 125.8101  | 73.3754  |        74.1946         |
|       eca_botnext26ts_256       | 128 | 108.3588 |  146.766  | 72.8692  |        73.8322         |
|          jx_nest_base           | 32  | 100.3112 | 100.4606  | 71.9672  |        72.4245         |
|          cspdarknet53           | 64  | 93.9416  | 111.7394  | 71.3832  |        69.3465         |
|           volo_d1_224           | 64  | 120.5618 | 122.9275  |  70.628  |         71.536         |
|          botnet26t_256          | 128 | 101.6508 | 116.1019  | 70.1157  |        69.3491         |
|      vit_base_patch16_224       | 64  | 86.8333  |  86.8384  | 69.8301  |        69.7152         |
|            gernet_l             | 128 |  77.115  |  91.0817  |  69.383  |        67.4516         |
| deit_base_distilled_patch16_224 | 64  | 84.6248  |  84.7853  | 67.0643  |        67.0538         |
|            repvgg_a2            | 128 | 77.0675  |  95.5497  |  66.453  |         64.277         |
|          gmixer_24_224          | 128 | 117.461  | 131.5865  | 66.1129  |         66.593         |
|      xcit_large_24_p8_224       |  5  | 135.6531 | 143.0108  | 60.8652  |        86.5254         |
|       tf_efficientnet_b0        | 128 | 84.4996  | 119.3259  | 59.9708  |        58.5454         |
|        twins_pcpvt_base         | 64  | 115.0231 | 140.4247  | 59.1369  |        74.6125         |
|           fbnetc_100            | 128 | 82.8964  | 106.2733  | 58.1629  |         56.092         |
|           rexnet_100            | 128 | 79.3845  | 107.7726  |  57.923  |        56.3101         |
|         coat_lite_mini          | 128 | 112.7557 | 112.8527  | 57.7808  |        58.3771         |
|            tinynet_a            | 128 | 73.2738  | 102.3793  | 56.6344  |        54.8181         |
|           mobilevit_s           | 64  | 83.9069  | 110.5273  |  56.218  |        55.6812         |
|        sebotnet33ts_256         | 64  | 79.8288  | 100.0899  | 50.4605  |        49.4531         |
|         crossvit_9_240          | 128 | 81.8392  | 103.4668  |  48.935  |        49.7517         |
|          spnasnet_100           | 128 |  70.194  |  89.5011  | 48.5554  |        46.5305         |
|          ghostnet_100           | 128 | 90.0296  | 117.4111  | 47.9566  |         55.297         |
|        ese_vovnet19b_dw         | 128 | 64.1175  |  73.922   | 45.3281  |        44.6216         |
|         mobilenetv2_100         | 128 | 65.4116  |  84.206   | 44.8808  |        42.9293         |
|           mnasnet_100           | 128 | 64.2265  |  82.2894  | 42.5878  |        40.6332         |
|           selecsls42b           | 128 | 60.0321  |  73.7044  | 42.4233  |        42.4582         |
|          resmlp_12_224          | 128 | 53.2451  |  59.3178  | 41.6828  |        41.8941         |
|      mobilenetv3_large_100      | 128 | 61.1833  |  76.3156  | 40.5957  |        40.2272         |
|           regnety_002           | 128 |  37.774  |  54.5313  | 26.0584  |        28.7802         |
|            lcnet_050            | 128 | 31.4915  |  40.4563  | 18.0601  |        20.7881         |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Performance graphs

bench_logs/torchbench_amp.png :

bench_logs/timm_models_amp.png :

bench_logs/huggingface_amp.png :

Build Summary

Run name

day_100_10_04_23_performance_amp_531

Commit hashes

pytorch commit: ab385bd
pytorch commit date: 2023-04-11 02:20:26+00:00
torchbench commit: 137c3f0e68280ab41c94403464058621a7c7fae1
torchbench commit date: 2023-04-08 04:29:31-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+gitab385bd

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. For training, each experiment runs one iteration of the forward and backward pass; for inference, it runs the forward pass only. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

When measuring performance, compilation latency, and memory footprint reduction, we exclude the models that fail accuracy checks.
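The speedup aggregation described above can be sketched as follows. This is a minimal illustration, not the dashboard's actual code: it assumes speedup is the eager latency divided by the compiled latency, and the record keys (`passed_accuracy`, `eager_latency`, `compiled_latency`) and the sample latencies are hypothetical.

```python
import math


def speedup(eager_latency, compiled_latency):
    """Speedup normalized against native PyTorch (assumed: eager / compiled)."""
    return eager_latency / compiled_latency


def geomean(values):
    """Geometric mean, computed in log space."""
    values = list(values)
    if not values:
        raise ValueError("geomean requires at least one value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def suite_geomean_speedup(results):
    """Geometric mean speedup over the models that pass the accuracy check.

    `results` is an iterable of dicts with the hypothetical keys
    'passed_accuracy', 'eager_latency', and 'compiled_latency'.
    """
    kept = [
        speedup(r["eager_latency"], r["compiled_latency"])
        for r in results
        if r["passed_accuracy"]  # models that fail accuracy are excluded
    ]
    return geomean(kept)


# Illustrative latencies only; these are not measurements from the report.
example_results = [
    {"passed_accuracy": True, "eager_latency": 200.0, "compiled_latency": 100.0},
    {"passed_accuracy": True, "eager_latency": 50.0, "compiled_latency": 100.0},
    {"passed_accuracy": False, "eager_latency": 10.0, "compiled_latency": 1.0},
]
print(f"{suite_geomean_speedup(example_results):.2f}x")
```

A geometric mean is the usual choice for averaging ratios, because a 2x speedup and a 0.5x slowdown cancel out to 1x instead of averaging to 1.25x.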

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 85%, 51/60 | 91%, 41/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 96%, 43/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.61x    |    1.65x    |    1.42x    |
| inductor_no_cudagraphs |   1.30x    |    1.54x    |    1.40x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.79    |    7.16     |    5.83     |
|       aot_eager        |    9.19    |    15.28    |    13.00    |
|        inductor        |   62.85    |    62.41    |   109.56    |
| inductor_no_cudagraphs |   62.97    |    59.71    |   109.26    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.90x    |    0.88x    |
|        inductor        |   0.78x    |    0.91x    |    0.91x    |
| inductor_no_cudagraphs |   0.94x    |    1.05x    |    1.01x    |
+------------------------+------------+-------------+-------------+

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics of the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766

Previous report name: /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 85%, 51/60  | 85%, 51/60  |
|        inductor        | huggingface | 91%, 41/45  | 91%, 41/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60  | 87%, 52/60  |
| inductor_no_cudagraphs | huggingface | 96%, 43/45  | 96%, 43/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.62x    |   1.61x   |
|        inductor        | huggingface |   1.65x    |   1.65x   |
|        inductor        | timm_models |   1.42x    |   1.42x   |
| inductor_no_cudagraphs | torchbench  |   1.30x    |   1.30x   |
| inductor_no_cudagraphs | huggingface |   1.54x    |   1.54x   |
| inductor_no_cudagraphs | timm_models |   1.40x    |   1.40x   |
+------------------------+-------------+------------+-----------+

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
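The flagging rules above can be sketched as a single function. The thresholds come from the list; the function name, argument names, and the convention that any accuracy status other than `"pass"` counts as a failure are assumptions made for illustration.

```python
# Thresholds taken from the list above.
MIN_SPEEDUP = 0.95
MAX_COMPILE_LATENCY_S = 120.0
MIN_COMPRESSION_RATIO = 0.9


def flag_model(accuracy_status, speedup, compile_latency_s, compression_ratio):
    """Return the list of reasons a model is flagged (empty if none).

    Assumption: any accuracy status other than "pass" is a failure.
    A NaN metric compares as False and is therefore not flagged here.
    """
    reasons = []
    if accuracy_status != "pass":
        reasons.append("accuracy")
    if speedup < MIN_SPEEDUP:
        reasons.append("speedup")
    if compile_latency_s > MAX_COMPILE_LATENCY_S:
        reasons.append("compilation_latency")
    if compression_ratio < MIN_COMPRESSION_RATIO:
        reasons.append("compression_ratio")
    return reasons


# hf_T5_large with inductor, using the values reported in this comment
# (accuracy status assumed "pass" because it is absent from the accuracy warnings).
print(flag_model("pass", 2.4072, 174.4506, 0.8201))
```

A model can be flagged for more than one reason, which is why the same name may show up in several of the warning tables below.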

Accuracy warnings

+-------------+-------------------------------+-----------------+------------------------+
|    suite    |             name              |    inductor     | inductor_no_cudagraphs |
+-------------+-------------------------------+-----------------+------------------------+
| torchbench  |         hf_Longformer         |   fail_to_run   |      fail_to_run       |
| torchbench  |             moco              |   fail_to_run   |      fail_to_run       |
| torchbench  |      Background_Matting       | eager_variation |    eager_variation     |
| torchbench  |           tacotron2           |     0.0000      |         0.0000         |
| torchbench  |              gat              |     0.0000      |         0.0000         |
| torchbench  |              gcn              |     0.0000      |         0.0000         |
| torchbench  |             llama             |     0.0000      |         0.0000         |
| torchbench  |             sage              |     0.0000      |         0.0000         |
| torchbench  |         torchrec_dlrm         |     0.0000      |         0.0000         |
| huggingface | DebertaV2ForQuestionAnswering |   fail_to_run   |          pass          |
| huggingface |  AlbertForQuestionAnswering   |  fail_accuracy  |     fail_accuracy      |
+-------------+-------------------------------+-----------------+------------------------+

Performance speedup warnings

+-------------+-------------------------------+----------+------------------------+
|    suite    |             name              | inductor | inductor_no_cudagraphs |
+-------------+-------------------------------+----------+------------------------+
| torchbench  |              drq              |  1.5399  |         0.9372         |
| torchbench  |         lennard_jones         |  1.4268  |         0.9064         |
| torchbench  |             dcgan             |  1.3971  |         0.8366         |
| torchbench  |       soft_actor_critic       |  1.1611  |         0.7504         |
| torchbench  |          tts_angular          |  0.9474  |         0.9525         |
| torchbench  |    nvidia_deeprecommender     |  0.8722  |         1.0186         |
| torchbench  | timm_vision_transformer_large |   0.0    |         1.0839         |
| torchbench  |         hf_Longformer         |   0.0    |          0.0           |
| torchbench  |             moco              |   0.0    |          0.0           |
| torchbench  |              gat              |   0.0    |          0.0           |
| torchbench  |              gcn              |   0.0    |          0.0           |
| torchbench  |             sage              |   0.0    |          0.0           |
| torchbench  |           tacotron2           |   0.0    |          0.0           |
| torchbench  |         torchrec_dlrm         |   0.0    |          0.0           |
| huggingface |      DebertaForMaskedLM       |  1.0763  |         0.9079         |
| huggingface |     DebertaV2ForMaskedLM      |  0.9799  |         0.7579         |
| huggingface | DebertaV2ForQuestionAnswering |  0.9334  |         0.7713         |
| huggingface |     BlenderbotForCausalLM     |   0.0    |         1.3388         |
| huggingface |     AllenaiLongformerBase     |   0.0    |          0.0           |
+-------------+-------------------------------+----------+------------------------+

Compilation latency (sec) warnings

+-------------+--------------------------------+----------+------------------------+
|    suite    |              name              | inductor | inductor_no_cudagraphs |
+-------------+--------------------------------+----------+------------------------+
| torchbench  |          hf_T5_large           | 174.4506 |        172.0176        |
| torchbench  |        phlippe_densenet        | 165.9498 |        166.9515        |
| torchbench  |           hf_BigBird           | 149.8271 |        129.4352        |
| torchbench  |       timm_efficientnet        | 145.6062 |        143.0498        |
| torchbench  |          densenet121           | 140.5917 |        137.6784        |
| torchbench  |       mobilenet_v3_large       | 133.3056 |        131.8468        |
| torchbench  |          mobilenet_v2          | 127.7121 |        126.6607        |
| torchbench  | timm_vision_transformer_large  |   nan    |        126.4063        |
| huggingface |     MobileBertForMaskedLM      | 148.2787 |        147.9057        |
| huggingface | MobileBertForQuestionAnswering | 142.1346 |        141.7778        |
| huggingface | DebertaV2ForQuestionAnswering  | 141.9495 |        73.3322         |
| huggingface |      DebertaV2ForMaskedLM      | 140.8307 |        75.1923         |
| huggingface |  MT5ForConditionalGeneration   | 135.0449 |        133.0555        |
| huggingface | M2M100ForConditionalGeneration | 135.0086 |        135.0663        |
| huggingface |        XGLMForCausalLM         | 131.8644 |        132.9075        |
| timm_models |           rexnet_100           | 285.6346 |        284.8706        |
| timm_models |           hrnet_w18            | 249.2625 |        243.8991        |
| timm_models |          ghostnet_100          | 233.6336 |        239.8671        |
| timm_models |           fbnetv3_b            | 170.6264 |        170.2552        |
| timm_models |          resnest101e           | 170.0193 |        169.3523        |
| timm_models |          mobilevit_s           | 167.8332 |        170.821         |
| timm_models |         pnasnet5large          | 161.8452 |        161.0501        |
| timm_models |     mobilenetv3_large_100      | 159.5911 |        160.6049        |
| timm_models |       gluon_inception_v3       | 158.6184 |        158.6029        |
| timm_models |          tf_mixnet_l           | 158.3847 |        159.2702        |
| timm_models |          inception_v3          | 157.4124 |        156.491         |
| timm_models |        adv_inception_v3        | 154.9877 |        161.8587        |
| timm_models |           tinynet_a            | 153.1664 |        158.0087        |
| timm_models |       res2net101_26w_4s        | 151.8029 |        151.4163        |
| timm_models |       tf_efficientnet_b0       | 151.0629 |        153.9675        |
| timm_models |        twins_pcpvt_base        | 150.3631 |        145.4156        |
| timm_models |            mixnet_l            | 149.0515 |        158.095         |
| timm_models |          spnasnet_100          | 139.397  |        131.1957        |
| timm_models |           fbnetc_100           | 135.6389 |        136.664         |
| timm_models |        mobilenetv2_100         | 134.0602 |        133.0476        |
| timm_models |      xcit_large_24_p8_224      | 133.5166 |        131.956         |
| timm_models |        res2net50_14w_8s        | 123.9931 |        122.8364        |
| timm_models |          mnasnet_100           | 119.6424 |        121.8537        |
+-------------+--------------------------------+----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+----------+------------------------+
|    suite    |                  name                   | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------------+----------+------------------------+
| torchbench  |              hf_GPT2_large              |  0.8904  |         1.128          |
| torchbench  |                 yolov3                  |  0.8701  |         1.0367         |
| torchbench  |            timm_efficientnet            |  0.8701  |         1.0073         |
| torchbench  |           speech_transformer            |  0.8651  |         0.869          |
| torchbench  |           shufflenet_v2_x1_0            |  0.8635  |         0.958          |
| torchbench  |              timm_resnest               |  0.8621  |         0.9665         |
| torchbench  |               Super_SloMo               |  0.8614  |         1.208          |
| torchbench  |         timm_vision_transformer         |  0.8593  |         0.8835         |
| torchbench  |           Background_Matting            |  0.8485  |         1.0414         |
| torchbench  |               timm_regnet               |  0.8481  |         0.9539         |
| torchbench  |              hf_DistilBert              |  0.8476  |         0.9945         |
| torchbench  |                resnet152                |  0.8471  |         0.9439         |
| torchbench  |                 hf_Bert                 |  0.8411  |         1.0258         |
| torchbench  |              hf_Bert_large              |  0.8302  |         1.0725         |
| torchbench  |               hf_T5_large               |  0.8201  |         1.168          |
| torchbench  |              pytorch_unet               |  0.8134  |         0.9308         |
| torchbench  |            phlippe_densenet             |  0.8058  |         0.8659         |
| torchbench  |                 hf_Bart                 |  0.7933  |         0.9173         |
| torchbench  |                  dcgan                  |  0.7821  |         0.9645         |
| torchbench  |                resnet50                 |  0.7811  |         0.8833         |
| torchbench  |                 demucs                  |  0.773   |         0.9655         |
| torchbench  |              squeezenet1_1              |  0.7722  |         0.908          |
| torchbench  |             pytorch_stargan             |  0.7715  |         0.8893         |
| torchbench  |               timm_vovnet               |  0.7529  |         0.8869         |
| torchbench  |               mnasnet1_0                |  0.743   |         0.8074         |
| torchbench  |           mobilenet_v3_large            |  0.7279  |         0.8726         |
| torchbench  |                  vgg16                  |  0.7227  |         0.9808         |
| torchbench  |               densenet121               |  0.7096  |         0.8017         |
| torchbench  |                 alexnet                 |  0.7091  |         0.939          |
| torchbench  |             pytorch_struct              |  0.697   |         0.7362         |
| torchbench  |               hf_BigBird                |  0.6949  |         1.1191         |
| torchbench  |             resnext50_32x4d             |   0.67   |         0.7709         |
| torchbench  |         nvidia_deeprecommender          |  0.6585  |         0.8931         |
| torchbench  |                   drq                   |  0.6379  |         0.9573         |
| torchbench  |            soft_actor_critic            |  0.6066  |         0.9973         |
| torchbench  |             LearningToPaint             |  0.5925  |         0.7463         |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |  0.5904  |         0.6004         |
| torchbench  |                resnet18                 |  0.5395  |         0.6097         |
| torchbench  |              lennard_jones              |  0.5317  |         0.9997         |
| torchbench  |               hf_Reformer               |  0.4538  |         0.8022         |
| torchbench  |          functorch_dp_cifar10           |  0.3991  |         0.4424         |
| torchbench  |             phlippe_resnet              |  0.3169  |         0.3395         |
| huggingface |           ElectraForCausalLM            |  0.8941  |         0.9739         |
| huggingface |           PegasusForCausalLM            |  0.893   |         0.9864         |
| huggingface |          DistilBertForMaskedLM          |  0.8849  |         0.9624         |
| huggingface |            TrOCRForCausalLM             |  0.8836  |         0.9583         |
| huggingface | BlenderbotSmallForConditionalGeneration |  0.8729  |         0.9803         |
| huggingface |     PegasusForConditionalGeneration     |  0.8689  |         1.0689         |
| huggingface |      MBartForConditionalGeneration      |  0.8672  |         1.0307         |
| huggingface |      BartForConditionalGeneration       |  0.8456  |         1.0139         |
| huggingface |         MegatronBertForCausalLM         |  0.845   |         1.0962         |
| huggingface |       BlenderbotSmallForCausalLM        |  0.8184  |         0.9119         |
| huggingface |         Speech2Text2ForCausalLM         |  0.789   |         0.8779         |
| huggingface |     M2M100ForConditionalGeneration      |  0.7651  |         0.9908         |
| huggingface |          MobileBertForMaskedLM          |  0.7473  |         1.016          |
| huggingface |             XGLMForCausalLM             |  0.7117  |         0.9792         |
| huggingface |     MobileBertForQuestionAnswering      |  0.6569  |         0.8392         |
| huggingface |           DebertaForMaskedLM            |  0.5646  |         0.9988         |
| huggingface |          DebertaV2ForMaskedLM           |  0.5187  |         0.9664         |
| huggingface |       DebertaForQuestionAnswering       |  0.4867  |         1.1525         |
| huggingface |      DebertaV2ForQuestionAnswering      |  0.4855  |          0.98          |
| timm_models |                hrnet_w18                |  0.8918  |          0.99          |
| timm_models |            sebotnet33ts_256             |  0.891   |         1.1115         |
| timm_models |            adv_inception_v3             |  0.8904  |         1.0171         |
| timm_models |           gluon_inception_v3            |  0.8904  |         1.0171         |
| timm_models |              inception_v3               |  0.8904  |         1.0171         |
| timm_models |                 dpn107                  |  0.8833  |         0.9642         |
| timm_models |            gluon_xception65             |  0.8831  |         0.9705         |
| timm_models |              ghostnet_100               |  0.8807  |         0.977          |
| timm_models |              spnasnet_100               |  0.8786  |         0.9451         |
| timm_models |          mobilenetv3_large_100          |  0.877   |         0.9362         |
| timm_models |             poolformer_m36              |  0.8768  |         1.1871         |
| timm_models |           eca_botnext26ts_256           |  0.8738  |         1.0072         |
| timm_models |            res2net50_14w_8s             |  0.8712  |         0.9607         |
| timm_models |            res2net101_26w_4s            |  0.871   |         0.9483         |
| timm_models |                mixnet_l                 |  0.8687  |         0.9902         |
| timm_models |               mnasnet_100               |  0.8683  |         0.9403         |
| timm_models |               res2next50                |  0.866   |         0.9547         |
| timm_models |              cait_m36_384               |  0.8632  |         0.989          |
| timm_models |               fbnetc_100                |  0.8596  |         0.9535         |
| timm_models |                pit_b_224                |  0.8578  |         1.0242         |
| timm_models |               selecsls42b               |  0.8576  |         0.9664         |
| timm_models |              convnext_base              |  0.8505  |         1.0338         |
| timm_models |                gernet_l                 |  0.8499  |         0.9706         |
| timm_models |         swsl_resnext101_32x16d          |  0.8461  |         0.9786         |
| timm_models |             coat_lite_mini              |  0.8402  |         1.0202         |
| timm_models |              botnet26t_256              |  0.8239  |         0.9779         |
| timm_models |          xcit_large_24_p8_224           |  0.8225  |         0.9732         |
| timm_models |                lcnet_050                |  0.805   |         0.884          |
| timm_models |                repvgg_a2                |  0.7738  |         0.9611         |
| timm_models |               regnety_002               |  0.7602  |         0.8966         |
| timm_models |             crossvit_9_240              |  0.7526  |         0.9898         |
| timm_models |      swin_base_patch4_window7_224       |  0.7214  |         0.9045         |
| timm_models |              jx_nest_base               |  0.6693  |         0.9604         |
+-------------+-----------------------------------------+----------+------------------------+

Metrics over time

bench_logs/comp_time_over_time.png :

bench_logs/memory_over_time.png :

bench_logs/geomean_over_time.png :

bench_logs/passrate_over_time.png :

Recent Regressions

For each relevant compiler, we compare the two most recent reports (that actually ran the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).
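The comparison described above can be sketched as follows for a single metric. This is an illustrative reading of the rule, not the dashboard's actual script: the function name, the dict-based report format, and the `placeholder_model` entry are hypothetical, while the `drq` values are the ones reported for inductor_no_cudagraphs in this comment.

```python
def find_regressions(prev_report, cur_report, is_flagged):
    """Return (name, prev_value, cur_value) for newly flagged models.

    `prev_report` and `cur_report` map model name -> metric value for one
    compiler and suite; `is_flagged` decides whether a value is problematic.
    Models missing from the previous report are skipped.
    """
    regressions = []
    for name, cur_value in cur_report.items():
        if name not in prev_report:
            continue
        prev_value = prev_report[name]
        # A regression is a model that was fine before and is flagged now.
        if not is_flagged(prev_value) and is_flagged(cur_value):
            regressions.append((name, prev_value, cur_value))
    return regressions


def slow(speedup):
    """Speedup rule from the 'Warnings' section."""
    return speedup < 0.95


prev = {"drq": 1.015, "placeholder_model": 0.80}    # placeholder value is made up
cur = {"drq": 0.9372, "placeholder_model": 0.75}    # placeholder value is made up
print(find_regressions(prev, cur, slow))
```

Under this rule a model that was already flagged in the previous report is not listed again, so the regression tables stay short even when the warning tables are long.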

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531

Performance speedup regressions

+------------------------+------+-------------+------------+
|        compiler        | name | prev_status | cur_status |
+------------------------+------+-------------+------------+
| inductor_no_cudagraphs | drq  |    1.015    |   0.9372   |
+------------------------+------+-------------+------------+

Peak Memory Compression Ratio regressions

+----------+-------------------+-------------+------------+
| compiler |       name        | prev_status | cur_status |
+----------+-------------------+-------------+------------+
| inductor | timm_efficientnet |   0.9291    |   0.8701   |
+----------+-------------------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531

Performance speedup regressions

+------------------------+-------------------------------+-------------+------------+
|        compiler        |             name              | prev_status | cur_status |
+------------------------+-------------------------------+-------------+------------+
|        inductor        | DebertaV2ForQuestionAnswering |   0.9877    |   0.9334   |
| inductor_no_cudagraphs |      DebertaForMaskedLM       |   0.9726    |   0.9079   |
+------------------------+-------------------------------+-------------+------------+

Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531

No regressions found.

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9753 |  0.9244   |  3.6834  |          1.4           |
|           BERT_pytorch            |  16  | 1.0055 |  0.8191   |  3.1493  |         2.1735         |
|            densenet121            |  4   | 0.993  |  0.7009   |  2.8225  |         1.0985         |
|            hf_BigBird             |  2   | 0.9611 |   0.775   |  2.6777  |         1.7012         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9713 |  0.8962   |  2.6109  |         1.7816         |
|            hf_T5_large            |  2   | 1.0081 |  0.8375   |  2.4072  |         2.0172         |
|             hf_Albert             |  8   | 0.9977 |  0.9612   |  2.3514  |         2.3009         |
|         phlippe_densenet          | 128  | 0.9959 |  0.7779   |  2.1426  |         1.0425         |
|          pytorch_struct           | 200  | 0.921  |  0.7767   |  2.1158  |         1.1227         |
|        mobilenet_v3_large         |  32  | 1.0016 |  0.7911   |  2.0937  |         1.182          |
|               dlrm                | 1024 | 0.9332 |  0.8366   |  1.9673  |         1.1956         |
|               hf_T5               |  8   | 0.995  |   0.862   |  1.9617  |         2.0397         |
|           squeezenet1_1           |  32  | 0.9901 |  0.9373   |  1.9069  |         1.3411         |
|          phlippe_resnet           | 128  | 0.9861 |  0.7629   |  1.8744  |         1.0276         |
|              hf_GPT2              |  4   | 1.0195 |  0.9796   |  1.8652  |         1.9127         |
|              hf_Bert              |  4   | 1.0266 |   0.863   |  1.8355  |         1.7062         |
|          resnext50_32x4d          |  8   | 0.9856 |  0.7203   |  1.7552  |         0.9905         |
|            mnasnet1_0             |  32  | 0.9942 |   0.747   |  1.7292  |         1.0528         |
|           hf_GPT2_large           |  4   | 1.0001 |  0.9889   |  1.7271  |         1.7923         |
|              hf_Bart              |  4   | 0.9951 |  0.8417   |  1.6641  |         1.6147         |
|        speech_transformer         |  32  | 0.9832 |  0.8075   |  1.655   |         1.6411         |
|        shufflenet_v2_x1_0         | 128  | 0.998  |  0.7545   |  1.6327  |         1.2121         |
|             resnet18              |  16  | 0.9904 |  0.7734   |  1.6238  |         0.9849         |
|           hf_Bert_large           |  4   | 1.0325 |  0.8952   |  1.6127  |         1.653          |
|      timm_vision_transformer      |  32  | 0.9919 |  0.8593   |  1.6047  |         1.4341         |
|            timm_nfnet             | 128  | 0.9987 |  0.9977   |  1.5713  |         1.5033         |
|           timm_resnest            |  32  | 0.9973 |  0.8543   |  1.5639  |         1.5257         |
|           fastNLP_Bert            |  6   | 1.0009 |  0.8707   |  1.5514  |         1.5538         |
|                drq                |  1   | 0.9535 |  0.7363   |  1.5399  |         0.9372         |
|           mobilenet_v2            |  96  | 0.9991 |   0.779   |  1.5294  |         1.492          |
| attention_is_all_you_need_pytorch | 256  | 1.0006 |  0.9264   |  1.5249  |         1.5368         |
|           hf_DistilBert           |  8   | 1.0119 |  0.9459   |  1.4858  |         1.5228         |
|         timm_efficientnet         |  32  | 0.9495 |  0.6296   |  1.4651  |         1.1027         |
|           lennard_jones           | 1000 | 0.8939 |  0.7385   |  1.4268  |         0.9064         |
|               dcgan               |  32  | 0.8732 |  0.6948   |  1.3971  |         0.8366         |
|          LearningToPaint          |  96  | 0.9926 |  0.7781   |  1.3953  |         1.0886         |
|           pytorch_unet            |  1   | 0.9986 |   0.205   |  1.3614  |         1.3584         |
|          pytorch_stargan          |  16  | 0.9976 |  0.8147   |  1.2639  |         1.2497         |
|            Super_SloMo            |  6   | 0.9989 |  0.1775   |  1.2548  |         1.2344         |
|             resnet152             |  32  | 0.9992 |  0.7752   |  1.2524  |         1.0271         |
|               vgg16               |  64  | 0.9993 |  0.9985   |  1.241   |         1.2535         |
|        Background_Matting         |  4   | 0.9991 |  0.1371   |  1.2129  |         1.2096         |
|              yolov3               |  16  | 0.9994 |  0.8078   |  1.2018  |         1.2037         |
|             resnet50              |  32  | 0.9983 |  0.7852   |  1.1873  |         1.0727         |
|         soft_actor_critic         | 256  | 0.8509 |  0.6461   |  1.1611  |         0.7504         |
|            hf_Reformer            |  4   | 0.9857 |  0.9666   |  1.1472  |         1.0414         |
|              alexnet              | 128  | 0.9989 |  0.9973   |  1.0873  |         1.1373         |
|            timm_regnet            |  32  | 0.9314 |  0.7835   |  1.052   |         0.9833         |
|              demucs               |  4   | 0.9988 |  1.0013   |  1.0357  |         1.0391         |
|            timm_vovnet            |  32  | 0.8815 |  0.7247   |  0.9639  |         0.9578         |
|            tts_angular            |  64  | 0.9124 |  0.8886   |  0.9474  |         0.9525         |
|      nvidia_deeprecommender       | 256  | 0.9993 |  0.9985   |  0.8722  |         1.0186         |
|   timm_vision_transformer_large   |  32  | 0.9999 |    0.0    |   0.0    |         1.0839         |
|           hf_Longformer           |  2   | 1.0184 |  0.6915   |   0.0    |          0.0           |
|               moco                |  32  | 0.9786 |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|          vision_maskrcnn          |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 26.3223 |  53.9862  | 174.4506 |        172.0176        |
|         phlippe_densenet          | 128  |  3.257  |  7.0182   | 165.9498 |        166.9515        |
|            hf_BigBird             |  2   | 12.7496 |  36.349   | 149.8271 |        129.4352        |
|         timm_efficientnet         |  32  | 5.0364  |  10.1933  | 145.6062 |        143.0498        |
|            densenet121            |  4   | 7.6511  |  18.0051  | 140.5917 |        137.6784        |
|        mobilenet_v3_large         |  32  | 3.6654  |  7.6004   | 133.3056 |        131.8468        |
|           mobilenet_v2            |  96  | 3.2925  |   6.96    | 127.7121 |        126.6607        |
|              yolov3               |  16  | 4.8722  |  10.5376  | 119.3765 |        119.0635        |
|            mnasnet1_0             |  32  | 3.1127  |  6.7025   | 109.1217 |        105.3123        |
|             resnet152             |  32  | 9.0916  |  19.9988  | 106.4938 |        105.1953        |
|           hf_GPT2_large           |  4   | 14.4175 |  29.3091  | 104.609  |        105.8022        |
|           timm_resnest            |  32  | 1.7937  |    3.9    | 97.3027  |        98.4701         |
|        shufflenet_v2_x1_0         | 128  | 3.4393  |  7.6103   | 82.8259  |        81.5473         |
|        speech_transformer         |  32  | 5.8649  |  13.4598  | 78.0017  |        77.2421         |
| attention_is_all_you_need_pytorch | 256  | 4.3497  |  10.8673  | 74.6543  |        73.9992         |
|            timm_regnet            |  32  | 6.6666  |  12.0641  | 73.7382  |        72.5488         |
|            timm_nfnet             | 128  | 5.8813  |  10.9034  | 72.7689  |        71.4684         |
|        Background_Matting         |  4   | 3.0414  |  11.3631  | 71.6367  |        69.0831         |
|           BERT_pytorch            |  16  | 4.8113  |  11.4244  | 70.5861  |         68.784         |
|             resnet50              |  32  |  3.198  |  6.8618   | 67.1021  |        63.8794         |
|           hf_Bert_large           |  4   | 10.2899 |  21.1138  | 63.3984  |        64.6375         |
|            timm_vovnet            |  32  | 3.5332  |  6.4502   | 62.7582  |        63.4345         |
|           pytorch_unet            |  1   | 1.5272  |  4.5825   | 58.1347  |        59.8242         |
|       functorch_dp_cifar10        |  64  | 1.1955  |  2.3819   | 55.1974  |        55.3341         |
|          resnext50_32x4d          |  8   | 3.2417  |  6.9535   | 53.4524  |        50.8805         |
|      timm_vision_transformer      |  32  | 3.2717  |  7.5705   | 52.2523  |        49.9775         |
|               hf_T5               |  8   | 5.7658  |  13.187   | 51.8276  |        49.8493         |
|           fastNLP_Bert            |  6   | 5.1585  |  11.2384  | 51.4163  |        49.7248         |
|              hf_Bart              |  4   | 6.2747  |  14.3583  | 48.4436  |        49.5041         |
|            hf_Reformer            |  4   | 4.0704  |  5.8811   | 46.2695  |        43.4207         |
|          pytorch_stargan          |  16  | 1.1921  |  3.1814   | 45.8497  |        46.4156         |
|          LearningToPaint          |  96  | 1.3772  |  2.9011   | 45.1074  |        44.5473         |
|            Super_SloMo            |  6   | 2.8059  |   9.569   |  44.575  |        43.7284         |
|             resnet18              |  16  |  1.335  |  2.8514   | 44.2679  |        43.8527         |
|              hf_GPT2              |  4   | 4.7221  |  9.4415   |  42.889  |        42.5886         |
|             hf_Albert             |  8   | 2.5593  |  8.4523   | 38.8372  |        38.2513         |
|              hf_Bert              |  4   | 5.1342  |  10.6119  | 38.4864  |        38.6684         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2044  |  2.8967   | 37.1343  |        34.7253         |
|          phlippe_resnet           | 128  | 1.3344  |  2.8086   |  34.301  |        32.0524         |
|           hf_DistilBert           |  8   | 2.4517  |  5.5214   | 31.2326  |        31.1154         |
|              demucs               |  4   | 1.4047  |  2.1335   | 30.8586  |        30.5401         |
|           squeezenet1_1           |  32  | 1.0854  |  1.7215   | 24.0147  |        24.3868         |
|          pytorch_struct           | 200  | 0.7751  |  1.3202   | 21.9903  |        20.4217         |
|               vgg16               |  64  | 0.6282  |  1.1148   | 16.4355  |        16.1471         |
|              alexnet              | 128  | 0.4804  |  0.7643   | 15.7429  |        14.7025         |
|      nvidia_deeprecommender       | 256  | 0.4773  |  0.7511   | 10.6226  |         9.897          |
|                drq                |  1   |  0.667  |  0.9986   |  9.2147  |        11.0454         |
|         soft_actor_critic         | 256  | 0.4211  |   0.592   |  8.1238  |         8.8881         |
|               dcgan               |  32  | 0.4222  |  0.6993   |  7.937   |         7.6838         |
|               dlrm                | 1024 | 0.3602  |   0.778   |  7.7075  |         7.506          |
|            tts_angular            |  64  |  0.44   |  0.5074   |  6.6644  |         6.4475         |
|           lennard_jones           | 1000 | 0.3906  |  0.5875   |  6.4174  |         6.706          |
|   timm_vision_transformer_large   |  32  | 9.2777  |    nan    |   nan    |        126.4063        |
|           hf_Longformer           |  2   | 9.5309  |  30.3233  |   nan    |          nan           |
|               moco                |  32  | 34.4388 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0378  |         1.2557         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.0163  |         1.1719         |
|           mobilenet_v2            |  96  | 0.9858 |   0.765   |  1.0104  |         1.1028         |
|            tts_angular            |  64  | 0.9983 |  0.9983   |  0.9895  |         0.9983         |
|            timm_nfnet             | 128  | 0.9071 |  0.8752   |  0.9694  |         1.0726         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9689  |         1.1266         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  0.9575  |         1.1593         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |  0.952   |         1.0009         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         1.1717         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9321  |         1.0713         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  0.8904  |         1.128          |
|              yolov3               |  16  | 0.9837 |  0.8253   |  0.8701  |         1.0367         |
|         timm_efficientnet         |  32  | 0.9856 |  0.7654   |  0.8701  |         1.0073         |
|        speech_transformer         |  32  | 0.9915 |   0.901   |  0.8651  |         0.869          |
|        shufflenet_v2_x1_0         | 128  | 0.9549 |  0.8387   |  0.8635  |         0.958          |
|           timm_resnest            |  32  | 0.989  |  0.8836   |  0.8621  |         0.9665         |
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  0.8614  |         1.208          |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8593  |         0.8835         |
|        Background_Matting         |  4   | 1.0127 |  0.6489   |  0.8485  |         1.0414         |
|            timm_regnet            |  32  |  0.99  |  0.8505   |  0.8481  |         0.9539         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8476  |         0.9945         |
|             resnet152             |  32  | 0.9949 |  0.8939   |  0.8471  |         0.9439         |
|              hf_Bert              |  4   | 0.9645 |  0.8338   |  0.8411  |         1.0258         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.8302  |         1.0725         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         1.168          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.8134  |         0.9308         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8058  |         0.8659         |
|              hf_Bart              |  4   | 0.9087 |  0.7524   |  0.7933  |         0.9173         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.7821  |         0.9645         |
|             resnet50              |  32  | 0.9919 |  0.8603   |  0.7811  |         0.8833         |
|              demucs               |  4   | 0.966  |  0.9659   |  0.773   |         0.9655         |
|           squeezenet1_1           |  32  | 0.9695 |  0.9321   |  0.7722  |         0.908          |
|          pytorch_stargan          |  16  | 0.9914 |  0.9688   |  0.7715  |         0.8893         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7529  |         0.8869         |
|            mnasnet1_0             |  32  | 0.9789 |  0.8641   |  0.743   |         0.8074         |
|        mobilenet_v3_large         |  32  | 0.9767 |  0.9444   |  0.7279  |         0.8726         |
|               vgg16               |  64  | 0.9923 |  0.7245   |  0.7227  |         0.9808         |
|            densenet121            |  4   | 0.9935 |  0.9821   |  0.7096  |         0.8017         |
|              alexnet              | 128  | 0.9454 |  0.7939   |  0.7091  |         0.939          |
|          pytorch_struct           | 200  | 0.9992 |  0.5106   |  0.697   |         0.7362         |
|            hf_BigBird             |  2   | 0.9493 |  0.9257   |  0.6949  |         1.1191         |
|          resnext50_32x4d          |  8   | 0.9944 |  0.8435   |   0.67   |         0.7709         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.8931         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.6379  |         0.9573         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  0.6066  |         0.9973         |
|          LearningToPaint          |  96  | 0.9192 |  0.7116   |  0.5925  |         0.7463         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9965 |  0.8796   |  0.5904  |         0.6004         |
|             resnet18              |  16  | 0.9753 |  0.7978   |  0.5395  |         0.6097         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.5317  |         0.9997         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.4538  |         0.8022         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3169  |         0.3395         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.9511 |  0.8932   |   nan    |          nan           |
|               moco                |  32  | 0.9982 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_GPT2_large           |  4   | 208.8152 | 211.1975  | 120.848  |        116.6056        |
|        Background_Matting         |  4   | 125.9714 | 919.5735  | 103.8619 |        103.9068        |
|            hf_T5_large            |  2   | 219.1325 | 264.4236  | 94.0587  |        108.636         |
|               hf_T5               |  8   | 180.2014 | 210.4385  | 92.1834  |        89.0432         |
|            timm_nfnet             | 128  | 118.2195 | 118.4023  | 75.3438  |        78.7927         |
|            hf_BigBird             |  2   | 226.7344 | 281.0304  | 73.3706  |        111.1306        |
|            hf_Reformer            |  4   | 82.1967  |  83.7773  | 70.6434  |        78.8052         |
|            Super_SloMo            |  6   | 79.4314  | 446.6189  | 63.3651  |        64.3071         |
|              yolov3               |  16  | 68.6391  |  84.7796  | 57.0297  |        56.9665         |
|            timm_regnet            |  32  | 59.6193  |  70.7706  | 55.3015  |        56.2733         |
|               vgg16               |  64  | 66.2542  |  66.3203  |  53.548  |        52.9321         |
|             resnet152             |  32  | 66.1924  |  81.0745  | 53.5258  |        61.8231         |
|              demucs               |  4   | 53.5096  |  53.2469  |  51.562  |        51.3458         |
|           hf_Bert_large           |  4   | 80.7111  |  91.8463  | 50.7232  |        49.9302         |
| attention_is_all_you_need_pytorch | 256  | 54.1383  |  58.4197  | 35.4568  |        35.6529         |
|        speech_transformer         |  32  | 67.2652  |  70.969   | 34.5578  |        33.7847         |
|              hf_Bart              |  4   | 85.6144  |  91.4511  | 34.0837  |        35.4215         |
|           fastNLP_Bert            |  6   | 51.8311  |  59.778   | 33.4295  |        33.9115         |
|           mobilenet_v2            |  96  | 47.0622  |  60.1988  | 30.7143  |        31.5219         |
|           pytorch_unet            |  1   | 39.8693  | 194.3051  | 29.2496  |        29.2931         |
|             hf_Albert             |  8   | 69.9744  |  72.3888  | 28.9908  |        29.6988         |
|              hf_GPT2              |  4   | 52.2706  |  49.4579  | 25.9536  |        25.7117         |
|            timm_vovnet            |  32  | 28.1774  |  34.298   | 25.5801  |        26.2498         |
|         timm_efficientnet         |  32  |  34.158  |  51.4067  | 21.9645  |        29.1067         |
|             resnet50              |  32  | 26.0536  |  33.2305  | 21.9637  |        24.4027         |
|              hf_Bert              |  4   | 39.9004  |  46.7464  | 21.7272  |        23.8849         |
|           hf_DistilBert           |  8   | 33.2414  |  35.2106  | 21.0421  |        21.1044         |
|            densenet121            |  4   | 60.0621  |  84.619   | 19.3795  |        49.5011         |
|        shufflenet_v2_x1_0         | 128  |  31.911  |  42.042   | 18.5438  |        25.2698         |
|      timm_vision_transformer      |  32  | 28.5393  |  36.9142  | 17.9695  |        19.7386         |
|           BERT_pytorch            |  16  | 52.5079  |  77.7409  |  17.001  |        26.2833         |
|           timm_resnest            |  32  | 24.1173  |  28.1523  | 15.3229  |        15.7329         |
|            mnasnet1_0             |  32  | 21.9118  |  29.3639  | 12.9914  |         22.12          |
|        mobilenet_v3_large         |  32  | 29.4408  |  35.9987  |  12.71   |        24.1097         |
|      nvidia_deeprecommender       | 256  | 10.2413  |  10.2362  | 11.7023  |        10.0453         |
|          pytorch_stargan          |  16  | 14.7188  |  18.0579  | 11.5465  |        11.7929         |
|          resnext50_32x4d          |  8   | 20.1533  |  30.7042  | 11.5115  |        20.1414         |
|         phlippe_densenet          | 128  | 25.6266  |  30.2034  | 11.2082  |         22.189         |
|              alexnet              | 128  |  9.8246  |  9.8544   |  9.0258  |         8.6414         |
|          LearningToPaint          |  96  | 11.1176  |  14.8753  |  8.5447  |        10.2363         |
|            tts_angular            |  64  |  6.8262  |  7.0801   |  6.6617  |         6.589          |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 13.8942  |  16.9145  |  5.7256  |         7.6244         |
|             resnet18              |  16  |  9.7547  |  11.5765  |  5.5427  |         9.3552         |
|           squeezenet1_1           |  32  | 11.9098  |  11.5857  |  5.3696  |         7.5482         |
|          phlippe_resnet           | 128  |  9.1483  |  11.7927  |  4.9543  |         8.8205         |
|       functorch_dp_cifar10        |  64  | 10.5582  |  11.1678  |  2.771   |         7.2336         |
|          pytorch_struct           | 200  |  5.9276  |  6.0477   |  2.5499  |         4.0995         |
|               dlrm                | 1024 |  4.4027  |  5.6721   |  2.1352  |         3.541          |
|                drq                |  1   |  3.3178  |  4.5079   |  2.0671  |         4.1028         |
|               dcgan               |  32  |  2.4002  |  3.0742   |  1.4909  |         2.4882         |
|         soft_actor_critic         | 256  |  1.8053  |  2.3954   |  1.3678  |         3.0809         |
|           lennard_jones           | 1000 |  1.8537  |  2.4392   |  1.1152  |         1.7444         |
|   timm_vision_transformer_large   |  32  | 464.0222 |    nan    |   nan    |        427.9489        |
|           hf_Longformer           |  2   | 111.1275 | 162.8063  |   nan    |          nan           |
|               moco                |  32  | 50.1939  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 1.0195 |  0.8562   |  3.4643  |         1.2064         |
|     MobileBertForQuestionAnswering      | 128 | 1.0178 |  0.8577   |  2.8401  |         1.1853         |
|             OPTForCausalLM              |  2  | 0.993  |  0.9014   |  2.4776  |         2.4656         |
|      GPT2ForSequenceClassification      |  4  | 0.9881 |   0.964   |  2.321   |          2.36          |
|       MT5ForConditionalGeneration       | 16  | 1.0138 |  0.8733   |  2.2439  |         2.2427         |
|             XGLMForCausalLM             |  8  | 1.0045 |  0.8433   |  2.2052  |         1.5287         |
|       ElectraForQuestionAnswering       | 64  | 0.9977 |  0.9876   |  2.1795  |         2.1601         |
|    LayoutLMForSequenceClassification    | 16  | 0.997  |  0.9832   |  1.8469  |         1.8336         |
|           ElectraForCausalLM            | 32  | 0.9965 |  0.9502   |  1.8437  |         1.8686         |
|            XLNetLMHeadModel             |  8  | 0.9975 |  0.9701   |  1.8234  |         1.8211         |
|        BertForQuestionAnswering         | 16  | 0.9972 |  0.9829   |  1.8059  |         1.8066         |
|       RobertaForQuestionAnswering       | 16  | 0.9971 |  0.9827   |  1.8016  |         1.8076         |
|           RobertaForCausalLM            | 16  | 0.9974 |   0.973   |  1.6811  |         1.6995         |
|       T5ForConditionalGeneration        |  4  | 0.9949 |  0.8611   |  1.6792  |         1.7729         |
|                 T5Small                 |  4  | 0.9944 |  0.8612   |  1.6745  |         1.7703         |
|               DistillGPT2               | 16  | 0.9931 |  0.9601   |  1.6743  |         1.7177         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9976 |  0.9778   |  1.6534  |         1.6782         |
|            PLBartForCausalLM            |  8  | 0.9935 |   0.961   |  1.6498  |         1.6387         |
|       AlbertForQuestionAnswering        |  4  | 1.0003 |   0.886   |  1.6461  |         1.6444         |
|     PLBartForConditionalGeneration      |  4  | 0.992  |  0.9507   |  1.6442  |         1.6767         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.8854   |  1.6362  |         1.6375         |
|           LayoutLMForMaskedLM           | 16  | 0.9977 |  0.9736   |  1.6318  |         1.6376         |
|             BertForMaskedLM             | 16  | 0.9979 |  0.9725   |  1.5998  |         1.6163         |
|      MBartForConditionalGeneration      |  2  | 1.0096 |   0.971   |  1.5885  |         1.4981         |
|                CamemBert                | 16  | 0.9979 |  0.9729   |  1.5463  |         1.5619         |
|         Speech2Text2ForCausalLM         | 256 | 0.983  |  0.9271   |  1.5427  |         1.5788         |
|             BartForCausalLM             |  4  | 0.9901 |  0.9602   |  1.534   |         1.5636         |
|            MBartForCausalLM             |  4  | 0.9874 |   0.953   |  1.528   |         1.5575         |
|            YituTechConvBert             | 16  | 0.9974 |  0.9702   |  1.5269  |         1.5232         |
|         MegatronBertForCausalLM         |  4  | 1.0212 |  0.9522   |  1.5235  |         1.5719         |
|     M2M100ForConditionalGeneration      | 16  | 1.0022 |  0.8411   |  1.4824  |         1.535          |
|     PegasusForConditionalGeneration     | 32  | 1.0058 |  0.9558   |  1.4751  |         1.4313         |
|      BartForConditionalGeneration       |  2  | 1.0039 |  0.9761   |  1.4706  |         1.5071         |
|     DistilBertForQuestionAnswering      | 256 | 0.9969 |  0.9909   |  1.4585  |         1.4556         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0063 |  0.9246   |  1.3852  |         1.4696         |
|           PegasusForCausalLM            | 32  | 0.9842 |  0.9315   |  1.2835  |         1.2191         |
|            TrOCRForCausalLM             | 32  | 0.9878 |  0.9572   |  1.2668  |         1.2975         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9922 |  0.9158   |  1.2638  |         1.3191         |
|          DistilBertForMaskedLM          | 128 | 0.9966 |  0.9547   |  1.2162  |         1.2396         |
|       DebertaForQuestionAnswering       |  8  | 0.8501 |  0.7351   |  1.2059  |         1.0756         |
|           DebertaForMaskedLM            |  4  | 0.7522 |  0.5904   |  1.0763  |         0.9079         |
|          DebertaV2ForMaskedLM           |  1  | 0.7195 |   0.543   |  0.9799  |         0.7579         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.7239 |  0.5506   |  0.9334  |         0.7713         |
|          BlenderbotForCausalLM          |  4  | 0.9931 |  0.8468   |   0.0    |         1.3388         |
|          AllenaiLongformerBase          |  4  | 1.0088 |  0.6714   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
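
For anyone who wants a single number per backend from a table like the one above, here is a minimal sketch that parses the ASCII table and takes a geometric mean of one column. The helper names `parse_ascii_table` and `geomean` are made up for illustration and are not part of the benchmark scripts. It also assumes that a `0.0` speedup marks a run that did not produce a result (those rows show `nan` in the latency tables), so such entries are skipped rather than averaged in.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of row dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+----+' borders and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def geomean(rows, column):
    """Geometric mean of the positive, finite values in `column`.

    Assumption: 0.0 and nan entries are failed runs and are ignored.
    """
    vals = []
    for row in rows:
        try:
            v = float(row[column])
        except ValueError:
            continue  # non-numeric cell, e.g. 'pass'
        if math.isfinite(v) and v > 0:
            vals.append(v)
    if not vals:
        return float("nan")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


# Three rows copied from the huggingface amp speedup table.
sample = """
+-----------------------+----+--------+-----------+----------+------------------------+
|         name          | bs | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------+----+--------+-----------+----------+------------------------+
| MobileBertForMaskedLM | 64 | 1.0195 |  0.8562   |  3.4643  |         1.2064         |
|    OPTForCausalLM     | 2  | 0.993  |  0.9014   |  2.4776  |         2.4656         |
| BlenderbotForCausalLM | 4  | 0.9931 |  0.8468   |   0.0    |         1.3388         |
+-----------------------+----+--------+-----------+----------+------------------------+
"""

rows = parse_ascii_table(sample)
for backend in ("eager", "aot_eager", "inductor", "inductor_no_cudagraphs"):
    print(backend, round(geomean(rows, backend), 4))
```

Skipping failed runs makes the mean look better than a penalized average would, so report the number of skipped models next to it.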

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 17.3666 |  39.4042  | 148.2787 |        147.9057        |
|     MobileBertForQuestionAnswering      | 128 | 17.3129 |  38.1963  | 142.1346 |        141.7778        |
|      DebertaV2ForQuestionAnswering      |  2  | 15.2369 |  26.2168  | 141.9495 |        73.3322         |
|          DebertaV2ForMaskedLM           |  1  | 15.4179 |  26.7342  | 140.8307 |        75.1923         |
|       MT5ForConditionalGeneration       | 16  | 7.9059  |  17.7804  | 135.0449 |        133.0555        |
|     M2M100ForConditionalGeneration      | 16  | 11.8418 |  25.8836  | 135.0086 |        135.0663        |
|             XGLMForCausalLM             |  8  |   9.2   |  19.9171  | 131.8644 |        132.9075        |
|            XLNetLMHeadModel             |  8  | 10.1615 |  26.4244  | 94.3176  |        93.6523         |
|           DebertaForMaskedLM            |  4  | 7.3358  |  13.4886  | 87.7665  |        56.2308         |
|      MBartForConditionalGeneration      |  2  | 11.6805 |  25.1676  | 83.0373  |        79.4957         |
|       DebertaForQuestionAnswering       |  8  | 7.0921  |  13.1489  | 82.5549  |        56.1259         |
|      BartForConditionalGeneration       |  2  | 11.6283 |  25.5516  | 74.8694  |        75.9547         |
|            YituTechConvBert             | 16  | 7.1656  |  15.1669  | 70.1375  |        69.1153         |
|     PegasusForConditionalGeneration     | 32  | 5.2866  |  19.0067  | 69.1199  |        67.3269         |
|    MegatronBertForQuestionAnswering     |  8  | 10.6263 |  20.7932  | 66.8776  |        67.0001         |
|         MegatronBertForCausalLM         |  4  | 10.6571 |  20.7623  | 66.1344  |        65.7703         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.6827  |  16.6857  | 55.2142  |        54.9011         |
|                 T5Small                 |  4  | 5.4213  |  12.2361  | 52.2729  |        51.6669         |
|       T5ForConditionalGeneration        |  4  | 5.4754  |  12.4412  | 50.8089  |        51.4302         |
|           ElectraForCausalLM            | 32  | 5.1149  |  10.663   | 50.0812  |        52.9436         |
|     PLBartForConditionalGeneration      |  4  | 6.1492  |  13.1287  | 49.3152  |        48.5554         |
|    LayoutLMForSequenceClassification    | 16  | 5.4168  |  10.7946  | 46.0091  |        47.0243         |
|       ElectraForQuestionAnswering       | 64  | 5.0955  |  10.596   | 42.9032  |        45.9301         |
|           LayoutLMForMaskedLM           | 16  | 5.5586  |  11.0482  | 42.5263  |        39.5133         |
|            MBartForCausalLM             |  4  | 5.5491  |  10.7453  | 41.0287  |        40.3365         |
|           PegasusForCausalLM            | 32  | 5.6175  |  10.8135  | 39.5827  |        38.5885         |
|             BartForCausalLM             |  4  | 5.5383  |  10.8355  | 38.9832  |        37.9799         |
|             OPTForCausalLM              |  2  | 4.9301  |  9.8462   | 38.6318  |        38.1699         |
|           RobertaForCausalLM            | 16  | 5.2773  |  10.4999  | 38.4312  |        38.3016         |
|             BertForMaskedLM             | 16  | 5.2378  |  10.4496  | 38.3115  |        40.6351         |
|        BertForQuestionAnswering         | 16  | 5.2077  |  10.376   | 37.7579  |        37.9553         |
|       RobertaForQuestionAnswering       | 16  |  5.216  |  10.3456  | 37.7422  |         37.152         |
|            TrOCRForCausalLM             | 32  | 5.8201  |  10.5112  |  37.581  |        37.8108         |
|            AlbertForMaskedLM            |  4  | 2.3025  |  7.8446   | 37.4184  |        36.4355         |
|      GPT2ForSequenceClassification      |  4  | 4.6258  |  9.5679   | 37.0521  |        35.1144         |
|     DistilBertForQuestionAnswering      | 256 |  2.431  |  5.3153   | 36.8535  |        35.5875         |
|                CamemBert                | 16  | 5.1122  |  10.5485  | 36.6909  |        38.5624         |
|          DistilBertForMaskedLM          | 128 | 2.4415  |  5.2637   | 34.8002  |        34.5557         |
|       AlbertForQuestionAnswering        |  4  | 2.1284  |  7.8649   | 33.7088  |        33.0221         |
|               DistillGPT2               | 16  | 2.4242  |  4.9348   | 29.2212  |        28.4654         |
|       BlenderbotSmallForCausalLM        | 64  | 3.8305  |  7.3828   | 28.3444  |        29.4318         |
|            PLBartForCausalLM            |  8  | 2.9785  |   5.745   | 26.8604  |        26.3885         |
|         Speech2Text2ForCausalLM         | 256 |  2.928  |  5.5414   | 26.4104  |        25.3067         |
|          BlenderbotForCausalLM          |  4  | 11.0822 |  21.4456  |   nan    |        68.9951         |
|          AllenaiLongformerBase          |  4  | 9.5285  |   30.55   |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1376  |         1.195          |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1139  |         1.2307         |
|             OPTForCausalLM              |  2  | 0.9999 |  0.9165   |  1.094   |         1.1346         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0607  |         1.1729         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0603  |         1.1724         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0583  |         1.1368         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.1479         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0078  |         1.0518         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0077  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.0075  |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0035  |         1.0491         |
|            YituTechConvBert             | 16  |  1.0   |  0.9143   |  0.9911  |         1.0411         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9729  |         1.3147         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         1.0642         |
|     PLBartForConditionalGeneration      |  4  | 1.0001 |  0.9301   |  0.9649  |         1.052          |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         1.1152         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9501  |         1.268          |
|            MBartForCausalLM             |  4  |  1.0   |  0.8937   |  0.9281  |         0.9912         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8677   |  0.9138  |         0.9886         |
|             BartForCausalLM             |  4  |  1.0   |  0.8936   |  0.9137  |         0.9749         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9089  |         1.0018         |
|           ElectraForCausalLM            | 32  |  1.0   |  0.8475   |  0.8941  |         0.9739         |
|           PegasusForCausalLM            | 32  |  1.0   |  0.8822   |  0.893   |         0.9864         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8849  |         0.9624         |
|            TrOCRForCausalLM             | 32  |  1.0   |   0.873   |  0.8836  |         0.9583         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8729  |         0.9803         |
|     PegasusForConditionalGeneration     | 32  |  1.0   |   0.91    |  0.8689  |         1.0689         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.8672  |         1.0307         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         1.0139         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         1.0962         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8137   |  0.8184  |         0.9119         |
|         Speech2Text2ForCausalLM         | 256 |  1.0   |  0.8183   |  0.789   |         0.8779         |
|     M2M100ForConditionalGeneration      | 16  |  1.0   |  0.8084   |  0.7651  |         0.9908         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7473  |         1.016          |
|             XGLMForCausalLM             |  8  |  1.0   |  0.7834   |  0.7117  |         0.9792         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6569  |         0.8392         |
|           DebertaForMaskedLM            |  4  | 0.9316 |  0.9143   |  0.5646  |         0.9988         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.5187  |         0.9664         |
|       DebertaForQuestionAnswering       |  8  | 0.9506 |  1.0516   |  0.4867  |         1.1525         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9762 |  0.9762   |  0.4855  |          0.98          |
|          BlenderbotForCausalLM          |  4  | 0.9978 |  0.9099   |   nan    |         0.999          |
|          AllenaiLongformerBase          |  4  | 0.9508 |  0.8694   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 265.9869 | 300.5623  | 162.9705 |        162.677         |
|       AlbertForQuestionAnswering        |  4  | 263.8994 | 297.7492  | 160.609  |        160.6897        |
|            XLNetLMHeadModel             |  8  |  279.88  | 287.2337  | 153.4296 |        152.9245        |
|      DebertaV2ForQuestionAnswering      |  2  | 144.1794 |  187.943  | 113.4631 |        155.4833        |
|     PegasusForConditionalGeneration     | 32  | 137.9125 | 151.6774  | 112.1673 |        109.2848        |
|            TrOCRForCausalLM             | 32  | 139.7954 | 143.3243  | 108.8857 |        106.3656        |
|          DebertaV2ForMaskedLM           |  1  | 143.3549 | 185.1753  | 104.6545 |        151.7598        |
|      MBartForConditionalGeneration      |  2  | 138.6256 | 141.7731  | 94.5828  |         92.466         |
|      BartForConditionalGeneration       |  2  | 138.3384 | 141.0143  | 93.2536  |        95.6514         |
|    MegatronBertForQuestionAnswering     |  8  | 142.1768 | 144.8373  |  85.802  |        84.5301         |
|            YituTechConvBert             | 16  | 125.4137 | 128.9992  |  82.425  |         82.192         |
| BlenderbotSmallForConditionalGeneration | 64  | 111.2764 | 118.9681  | 80.1643  |        82.8079         |
|                CamemBert                | 16  | 118.4919 | 121.5609  | 76.4312  |        76.0448         |
|            MBartForCausalLM             |  4  | 114.8213 | 120.0369  | 74.7472  |        72.8318         |
|             BartForCausalLM             |  4  | 114.4815 | 118.0206  | 74.3957  |        72.5389         |
|     M2M100ForConditionalGeneration      | 16  | 113.956  | 135.1671  | 74.3152  |         93.921         |
|     PLBartForConditionalGeneration      |  4  | 118.5638 | 123.4187  | 71.7851  |        70.8875         |
|     DistilBertForQuestionAnswering      | 256 | 103.5783 | 104.2488  |  71.273  |         71.156         |
|            PLBartForCausalLM            |  8  | 113.0747 | 119.6607  | 69.8614  |        68.4604         |
|     MobileBertForQuestionAnswering      | 128 | 170.0968 | 191.2733  | 69.8209  |        147.5787        |
|           LayoutLMForMaskedLM           | 16  | 112.7764 | 115.4758  | 69.6405  |        68.7268         |
|          DistilBertForMaskedLM          | 128 | 84.8406  |  88.5976  | 69.6352  |        68.3574         |
|             BertForMaskedLM             | 16  | 110.2723 |  113.044  | 68.7531  |        68.2267         |
|             OPTForCausalLM              |  2  | 169.8167 | 183.3626  | 68.7153  |        68.2441         |
|           RobertaForCausalLM            | 16  | 115.3867 | 118.0369  | 68.4416  |        67.8338         |
|               DistillGPT2               | 16  | 106.4543 | 109.9697  | 63.1095  |         61.547         |
|       DebertaForQuestionAnswering       |  8  | 88.9652  | 102.7284  | 63.0224  |        70.1792         |
|                 T5Small                 |  4  | 104.9496 |  121.174  | 62.7643  |         58.941         |
|       T5ForConditionalGeneration        |  4  | 104.8781 | 121.1378  | 62.6906  |        58.8505         |
|          MobileBertForMaskedLM          | 64  | 172.3698 | 196.4705  | 61.0183  |        152.6089        |
|           PegasusForCausalLM            | 32  | 70.2048  |  74.0403  | 58.2727  |        56.6915         |
|           DebertaForMaskedLM            |  4  | 83.5826  | 105.3769  | 57.5818  |         67.781         |
|         MegatronBertForCausalLM         |  4  | 85.8583  |  90.3931  | 56.6414  |         55.522         |
|             XGLMForCausalLM             |  8  | 92.5786  | 108.0371  | 53.8599  |         76.732         |
|       RobertaForQuestionAnswering       | 16  | 95.8137  |  97.1421  | 53.2101  |        53.0819         |
|    LayoutLMForSequenceClassification    | 16  | 97.9985  |  99.1749  | 52.9143  |        53.4914         |
|        BertForQuestionAnswering         | 16  | 95.5363  |  96.6644  | 52.6706  |        52.6222         |
|       ElectraForQuestionAnswering       | 64  | 114.8733 | 115.8892  | 52.5392  |        54.0331         |
|           ElectraForCausalLM            | 32  | 88.3255  |  92.4038  | 47.7505  |        47.3605         |
|       BlenderbotSmallForCausalLM        | 64  | 58.5682  |  63.176   | 46.5461  |        46.8624         |
|       MT5ForConditionalGeneration       | 16  | 91.9154  | 103.7795  |  41.672  |        47.2741         |
|      GPT2ForSequenceClassification      |  4  | 92.5996  |  94.9264  | 39.3681  |        38.6928         |
|         Speech2Text2ForCausalLM         | 256 | 53.6667  |  56.2251  | 34.8522  |        34.3617         |
|          BlenderbotForCausalLM          |  4  | 110.2872 | 127.4515  |   nan    |        87.8328         |
|          AllenaiLongformerBase          |  4  | 179.945  | 270.0401  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9998 |  0.9986   |  3.0216  |         2.9852         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8826   |  2.0711  |         1.6294         |
|        twins_pcpvt_base         | 64  | 1.0042 |  0.9095   |  1.9991  |         1.7149         |
|         coat_lite_mini          | 128 |  1.0   |  0.9981   |  1.9518  |         1.9288         |
|          gmlp_s16_224           | 128 | 0.9999 |  1.0901   |  1.8701  |         1.8514         |
|          ghostnet_100           | 128 | 0.999  |   0.768   |  1.8685  |         1.6027         |
|          gmixer_24_224          | 128 | 0.999  |  0.8931   |  1.7788  |         1.7648         |
|           volo_d1_224           | 64  | 0.9996 |  0.9782   |  1.7035  |         1.6824         |
|            lcnet_050            | 128 | 0.9451 |   0.738   |  1.6989  |         1.4835         |
|         crossvit_9_240          | 128 | 0.9997 |  0.7891   |  1.6649  |         1.6433         |
|  swin_base_patch4_window7_224   | 64  | 0.9994 |  0.9643   |  1.6385  |         1.6304         |
|           convit_base           | 64  | 0.9999 |  0.9997   |  1.6174  |         1.6162         |
|       gluon_inception_v3        | 128 | 0.9998 |   0.868   |  1.5404  |         1.5301         |
|          inception_v3           | 128 | 0.9996 |  0.8667   |  1.539   |         1.5298         |
|        adv_inception_v3         | 128 | 0.9997 |  0.8626   |  1.5386  |         1.5272         |
|             dla102              | 128 | 0.9998 |  0.8177   |  1.5358  |         1.5337         |
|          convnext_base          | 64  | 0.9992 |  1.0012   |  1.526   |         1.5068         |
|        sebotnet33ts_256         | 64  | 0.9651 |  0.7694   |  1.5245  |         1.5567         |
|           dm_nfnet_f0           | 128 | 0.9989 |  0.9982   |  1.5024  |         1.4557         |
|            nfnet_l0             | 128 | 0.9994 |  0.8207   |  1.4996  |         1.4506         |
|       eca_botnext26ts_256       | 128 | 0.9781 |  0.7222   |  1.4543  |         1.4367         |
|           mobilevit_s           | 64  | 0.9714 |  0.7359   |  1.4493  |         1.4629         |
|            pit_b_224            | 64  | 0.9996 |  0.9977   |  1.4441  |         1.4385         |
|           mnasnet_100           | 128 | 0.9508 |   0.742   |  1.4414  |         1.498          |
|           resnest101e           | 64  | 0.9997 |  0.8715   |  1.4389  |         1.3686         |
|      mobilenetv3_large_100      | 128 | 0.9523 |  0.7626   |  1.4374  |         1.4564         |
|           regnety_002           | 128 | 0.9659 |  0.7235   |  1.4369  |         1.2715         |
|          botnet26t_256          | 128 | 0.9769 |  0.8548   |  1.4148  |         1.4323         |
|           selecsls42b           | 128 | 0.9991 |  0.8133   |  1.4135  |         1.4159         |
|         mobilenetv2_100         | 128 | 0.9517 |  0.7389   |  1.3951  |         1.4491         |
|          jx_nest_base           | 32  | 0.9995 |  0.9982   |  1.3916  |         1.3833         |
|        res2net50_14w_8s         | 128 | 0.9994 |  0.7903   |  1.3839  |         1.3609         |
|           res2next50            | 128 | 0.9999 |  0.8263   |  1.3733  |         1.3672         |
|        ese_vovnet19b_dw         | 128 | 0.9658 |  0.8384   |  1.3667  |         1.3893         |
|          mixer_b16_224          | 128 | 0.9996 |  1.0207   |  1.3654  |         1.3645         |
|            hrnet_w18            | 128 | 0.9982 |  0.6388   |  1.3638  |         1.3047         |
|          cait_m36_384           |  4  | 1.0003 |  0.9989   |  1.3617  |         1.3675         |
|          spnasnet_100           | 128 | 0.9441 |  0.7398   |  1.3586  |         1.4228         |
|      beit_base_patch16_224      | 64  | 0.9993 |  0.9696   |  1.3578  |         1.3577         |
|       tf_efficientnet_b0        | 128 | 0.9642 |  0.6835   |  1.3557  |         1.3938         |
|         poolformer_m36          | 64  | 0.9999 |  0.9964   |  1.3524  |         1.3418         |
|           fbnetc_100            | 128 | 0.9518 |  0.7397   |  1.3515  |         1.4089         |
|           rexnet_100            | 128 | 0.9609 |  0.7083   |  1.3141  |         1.3516         |
|            fbnetv3_b            | 128 | 0.9524 |  0.7714   |  1.3134  |         1.3459         |
|          resmlp_12_224          | 128 | 0.9999 |  0.8952   |  1.2749  |         1.2678         |
| deit_base_distilled_patch16_224 | 64  | 0.9995 |  0.9974   |  1.2612  |         1.2614         |
|          cspdarknet53           | 64  | 0.9438 |  0.7928   |  1.2429  |         1.2816         |
|      vit_base_patch16_224       | 64  | 0.9995 |  0.9968   |  1.2406  |         1.2402         |
|            tinynet_a            | 128 | 0.9515 |  0.6805   |  1.2347  |         1.274          |
|           tf_mixnet_l           | 128 | 0.9812 |  0.8301   |  1.1922  |         1.1993         |
|            mixnet_l             | 128 | 0.9803 |  0.8238   |  1.1807  |         1.1871         |
|         visformer_small         | 128 | 0.9993 |  0.9478   |  1.1777  |         1.1699         |
|        res2net101_26w_4s        | 64  | 1.0011 |  0.7977   |  1.1605  |         1.0805         |
|          pnasnet5large          | 16  | 0.9975 |  0.9227   |  1.1082  |         1.1307         |
|             dpn107              | 32  | 0.9397 |  0.8131   |  1.1026  |         1.1487         |
|            repvgg_a2            | 128 | 0.9428 |  0.7607   |  1.0978  |         1.1301         |
|        gluon_xception65         | 32  | 0.9995 |   0.848   |  1.0843  |         1.0888         |
|     swsl_resnext101_32x16d      | 32  | 0.9991 |  0.8429   |  1.0598  |         1.0237         |
|            gernet_l             | 128 | 0.9425 |  0.8001   |  1.0477  |         1.0803         |
|        convmixer_768_32         | 32  | 0.9991 |  0.9656   |  1.0024  |         1.0033         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 5.4392  |  10.8907  | 285.6346 |        284.8706        |
|            hrnet_w18            | 128 | 9.7538  |  35.9047  | 249.2625 |        243.8991        |
|          ghostnet_100           | 128 | 7.5727  |  14.8189  | 233.6336 |        239.8671        |
|            fbnetv3_b            | 128 |  8.195  |  16.5538  | 170.6264 |        170.2552        |
|           resnest101e           | 64  | 10.9242 |  24.0365  | 170.0193 |        169.3523        |
|           mobilevit_s           | 64  | 5.4747  |  11.0205  | 167.8332 |        170.821         |
|          pnasnet5large          | 16  | 8.2237  |  26.0759  | 161.8452 |        161.0501        |
|      mobilenetv3_large_100      | 128 | 4.2753  |  8.2953   | 159.5911 |        160.6049        |
|       gluon_inception_v3        | 128 | 5.8571  |  12.5502  | 158.6184 |        158.6029        |
|           tf_mixnet_l           | 128 |  8.797  |  16.5124  | 158.3847 |        159.2702        |
|          inception_v3           | 128 | 5.5809  |  12.5838  | 157.4124 |        156.491         |
|        adv_inception_v3         | 128 | 5.8803  |  12.5299  | 154.9877 |        161.8587        |
|            tinynet_a            | 128 |  5.788  |  11.9053  | 153.1664 |        158.0087        |
|        res2net101_26w_4s        | 64  | 10.6656 |  24.7771  | 151.8029 |        151.4163        |
|       tf_efficientnet_b0        | 128 | 4.9705  |  10.1328  | 151.0629 |        153.9675        |
|        twins_pcpvt_base         | 64  | 10.4435 |  23.5446  | 150.3631 |        145.4156        |
|            mixnet_l             | 128 | 8.2156  |  15.9274  | 149.0515 |        158.095         |
|          spnasnet_100           | 128 |  4.851  |  9.3576   | 139.397  |        131.1957        |
|           fbnetc_100            | 128 | 4.8917  |  9.3279   | 135.6389 |        136.664         |
|         mobilenetv2_100         | 128 | 3.9128  |  7.6889   | 134.0602 |        133.0476        |
|      xcit_large_24_p8_224       |  5  | 12.5083 |  29.5776  | 133.5166 |        131.956         |
|        res2net50_14w_8s         | 128 | 9.3125  |  22.0129  | 123.9931 |        122.8364        |
|           mnasnet_100           | 128 | 3.9115  |  7.4171   | 119.6424 |        121.8537        |
|          cait_m36_384           |  4  | 14.3119 |  31.348   | 118.4624 |        113.4804        |
|           regnety_002           | 128 | 4.8079  |  8.7255   | 109.4261 |        106.1625        |
|  swin_base_patch4_window7_224   | 64  |  8.091  |  18.8502  | 109.2691 |        109.181         |
|        sebotnet33ts_256         | 64  | 4.1851  |  9.3292   | 108.1835 |        109.0845        |
|         poolformer_m36          | 64  |  7.824  |  13.4397  | 102.497  |        100.3331        |
|          cspdarknet53           | 64  | 5.6346  |  10.6454  | 101.3059 |        100.8007        |
|             dpn107              | 32  | 9.4703  |  18.9349  | 99.3806  |        98.4224         |
|            lcnet_050            | 128 | 2.4269  |  4.9046   | 96.2382  |        98.9451         |
|       eca_botnext26ts_256       | 128 | 3.1206  |  6.6597   | 95.3241  |        95.3239         |
|        gluon_xception65         | 32  | 7.6928  |  16.8588  | 94.7821  |        93.8353         |
|             dla102              | 128 | 6.2724  |  13.8901  | 94.7513  |        95.4108         |
|           selecsls42b           | 128 | 2.3767  |  5.3675   | 93.0336  |        90.3992         |
|          botnet26t_256          | 128 | 2.8443  |  5.7846   | 92.6449  |        89.2483         |
|           res2next50            | 128 | 4.9573  |  11.8832  | 89.1744  |        88.7021         |
|         coat_lite_mini          | 128 | 3.3821  |   7.612   | 88.9769  |        88.7798         |
|         crossvit_9_240          | 128 | 5.6611  |  13.9161  | 88.3932  |        87.1823         |
|          jx_nest_base           | 32  | 6.5016  |  14.4429  | 84.4146  |         84.568         |
|            gernet_l             | 128 | 4.8101  |  8.8332   |  82.089  |        81.9392         |
|            nfnet_l0             | 128 | 5.1422  |  10.579   | 77.1781  |        75.6945         |
|        ese_vovnet19b_dw         | 128 | 2.5332  |  4.4474   | 76.4917  |        76.0621         |
|           volo_d1_224           | 64  | 4.9321  |  12.2635  | 74.6859  |        72.2966         |
|           dm_nfnet_f0           | 128 | 5.7719  |  11.0123  | 71.0112  |        74.1888         |
|        tnt_s_patch16_224        | 128 |  6.367  |  15.6753  | 70.1426  |        70.2248         |
|         visformer_small         | 128 | 2.5289  |  5.9037   | 67.5143  |        65.3293         |
|     swsl_resnext101_32x16d      | 32  | 5.9348  |  13.4907  | 62.3158  |         61.228         |
|            repvgg_a2            | 128 | 4.8412  |  8.6925   | 61.0959  |        61.6304         |
|          gmlp_s16_224           | 128 |  5.464  |   11.6    | 59.7861  |        60.0417         |
|          convnext_base          | 64  |  6.937  |  12.1943  | 59.0432  |        59.1255         |
|          gmixer_24_224          | 128 | 5.9257  |  13.4894  | 52.7464  |        49.1237         |
|           convit_base           | 64  | 3.3678  |  8.8789   | 48.7381  |        47.8487         |
|            pit_b_224            | 64  | 3.3622  |  7.7513   |  46.013  |        46.0401         |
| deit_base_distilled_patch16_224 | 64  | 3.0477  |  6.9189   | 42.2018  |        39.1785         |
|          resmlp_12_224          | 128 | 2.6767  |  5.2208   | 41.0897  |        40.9837         |
|      vit_base_patch16_224       | 64  | 2.9141  |  6.8314   | 39.6618  |        38.1064         |
|        convmixer_768_32         | 32  | 1.6667  |  6.7621   | 38.6454  |        37.0479         |
|      beit_base_patch16_224      | 64  | 4.0523  |  8.4376   | 34.0036  |        35.1932         |
|          mixer_b16_224          | 128 |  2.789  |  5.7423   | 33.2688  |        33.2774         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.1848  |         1.2049         |
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.1712  |         1.2836         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1117  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.0948  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.0266  |         1.1179         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.013   |         1.0845         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0079  |         1.0351         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  0.9984  |         1.0721         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  0.9972  |         1.0876         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  0.9872  |         1.0728         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  0.9834  |         1.0506         |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9762  |         0.9854         |
|           rexnet_100            | 128 | 0.9899 |  0.7866   |  0.9747  |         1.0457         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  0.9729  |         1.0539         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  0.9557  |         1.0057         |
|             dla102              | 128 | 0.9634 |  0.9155   |  0.9536  |         1.0326         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.9501  |         0.9948         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9362  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9353  |         0.9528         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  0.9348  |         1.0194         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  0.9346  |         1.0675         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9285  |         0.989          |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9228  |         0.9793         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9215  |         0.9952         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9131  |         0.9727         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  0.9097  |         1.0328         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9047  |         0.9903         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.8918  |          0.99          |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  0.891   |         1.1115         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.8833  |         0.9642         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.8831  |         0.9705         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.8807  |         0.977          |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.8786  |         0.9451         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.877   |         0.9362         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  0.8768  |         1.1871         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  0.8738  |         1.0072         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.8712  |         0.9607         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.871   |         0.9483         |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.8687  |         0.9902         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.8683  |         0.9403         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.866   |         0.9547         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8632  |         0.989          |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.8596  |         0.9535         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  0.8578  |         1.0242         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.8576  |         0.9664         |
|          convnext_base          | 64  | 1.001  |   0.924   |  0.8505  |         1.0338         |
|            gernet_l             | 128 | 0.9782 |  0.8499   |  0.8499  |         0.9706         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.8461  |         0.9786         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  0.8402  |         1.0202         |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.8239  |         0.9779         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.8225  |         0.9732         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.805   |         0.884          |
|            repvgg_a2            | 128 | 0.9761 |  0.7779   |  0.7738  |         0.9611         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.7602  |         0.8966         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.7526  |         0.9898         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.7214  |         0.9045         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.6693  |         0.9604         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.8306 | 310.9326  | 300.5669 |        299.703         |
|            hrnet_w18            | 128 | 279.3614 | 436.8303  | 204.7845 |        215.173         |
|          pnasnet5large          | 16  | 196.5964 | 212.4064  | 177.2335 |        173.9514        |
|           tf_mixnet_l           | 128 | 193.5943 | 227.8338  | 158.6983 |        157.8532        |
|            mixnet_l             | 128 | 184.6187 |  220.07   | 153.2368 |        152.509         |
|          cait_m36_384           |  4  | 170.1928 | 167.3719  | 122.709  |        125.4096        |
|           resnest101e           | 64  | 164.2084 | 187.1754  | 114.0137 |        119.3611        |
|             dla102              | 128 | 171.7738 | 210.1763  | 111.8707 |        112.0123        |
|     swsl_resnext101_32x16d      | 32  | 118.2995 | 140.4219  | 111.5942 |        115.6372        |
|         poolformer_m36          | 64  | 145.1181 | 145.3647  | 107.129  |        107.9271        |
|        tnt_s_patch16_224        | 128 | 323.3035 | 323.0195  | 106.9988 |        108.0856        |
|          inception_v3           | 128 | 160.3406 | 184.9305  | 104.2292 |        104.7846        |
|       gluon_inception_v3        | 128 | 160.6641 | 184.7914  | 104.1064 |        104.6833        |
|        adv_inception_v3         | 128 | 160.5678 | 185.8142  | 104.0712 |        104.9958        |
|        res2net50_14w_8s         | 128 | 140.9872 | 177.9206  | 101.6609 |        103.3714        |
|           convit_base           | 64  | 163.0241 |  162.967  | 100.799  |        100.797         |
|             dpn107              | 32  | 112.9194 | 130.6452  | 96.3758  |         92.355         |
|           res2next50            | 128 | 125.8543 | 152.2254  | 91.7923  |        91.9536         |
|        gluon_xception65         | 32  | 99.0894  | 116.7674  | 91.2384  |        90.7029         |
|  swin_base_patch4_window7_224   | 64  | 146.3954 |  151.292  |  89.033  |        89.4554         |
|        res2net101_26w_4s        | 64  | 98.8772  | 125.0467  | 85.4362  |        92.8236         |
|          mixer_b16_224          | 128 | 116.5619 | 114.0524  | 85.3485  |        85.6622         |
|           dm_nfnet_f0           | 128 | 127.0441 | 127.0277  | 84.2094  |        86.9652         |
|            fbnetv3_b            | 128 | 114.7839 | 141.7269  | 83.3781  |        81.3692         |
|            pit_b_224            | 64  | 118.1605 | 118.4395  | 81.8025  |        82.0662         |
|          convnext_base          | 64  | 122.7305 |  122.193  | 80.1789  |        81.3615         |
|         visformer_small         | 128 | 90.9691  |  96.0708  |  77.204  |        77.7585         |
|      beit_base_patch16_224      | 64  | 101.3451 | 104.2548  | 74.6388  |        74.4761         |
|            nfnet_l0             | 128 | 111.6014 | 135.4632  | 74.1451  |        76.7017         |
|          gmlp_s16_224           | 128 | 136.8715 | 125.4817  | 73.2757  |        73.8568         |
|       eca_botnext26ts_256       | 128 | 108.1744 |  146.596  | 72.9229  |        73.5824         |
|          jx_nest_base           | 32  | 100.217  | 100.3716  | 72.0723  |        72.3697         |
|          cspdarknet53           | 64  | 93.7879  | 111.6277  | 71.3222  |        69.1019         |
|           volo_d1_224           | 64  | 120.234  | 123.2958  | 70.6114  |        71.5228         |
|          botnet26t_256          | 128 | 101.4808 | 116.0524  | 70.0635  |         69.27          |
|      vit_base_patch16_224       | 64  | 86.7127  |  86.7423  | 69.8609  |        69.7573         |
|            gernet_l             | 128 | 77.1036  |  90.9913  | 69.5469  |        67.2667         |
| deit_base_distilled_patch16_224 | 64  | 84.6071  |  84.7927  | 67.1284  |         67.066         |
|            repvgg_a2            | 128 | 77.1839  |  95.4259  | 66.2581  |        64.4036         |
|          gmixer_24_224          | 128 | 117.9369 | 131.7143  | 66.1799  |        66.5466         |
|      xcit_large_24_p8_224       |  5  | 122.3184 | 161.1876  | 60.8107  |        78.2979         |
|       tf_efficientnet_b0        | 128 | 84.3879  | 119.1848  | 60.0015  |        58.4013         |
|        twins_pcpvt_base         | 64  | 114.1257 | 127.8921  | 59.0717  |         74.334         |
|           fbnetc_100            | 128 | 82.6382  | 106.3233  |  58.299  |        55.8247         |
|           rexnet_100            | 128 | 79.2127  | 107.6467  | 57.9405  |        56.3777         |
|         coat_lite_mini          | 128 | 113.0176 | 112.7474  | 57.7123  |        58.4377         |
|            tinynet_a            | 128 | 73.3426  | 102.3694  | 56.3199  |        54.7903         |
|           mobilevit_s           | 64  | 84.0288  | 110.4701  | 56.2007  |        55.6696         |
|        sebotnet33ts_256         | 64  | 79.9339  | 100.3943  | 50.5399  |        49.4778         |
|         crossvit_9_240          | 128 | 81.7544  | 103.5451  | 49.1918  |        49.6646         |
|          spnasnet_100           | 128 | 70.2403  |  89.5242  | 48.8711  |        46.6489         |
|          ghostnet_100           | 128 | 89.9769  | 116.8949  | 48.0793  |        56.2708         |
|        ese_vovnet19b_dw         | 128 | 64.0546  |  73.847   | 45.3231  |        44.5374         |
|         mobilenetv2_100         | 128 | 65.3881  |  84.209   | 44.6274  |         42.943         |
|           selecsls42b           | 128 | 60.0717  |  73.7578  | 42.4133  |         42.317         |
|           mnasnet_100           | 128 | 64.0893  |  82.1061  | 42.2229  |        40.7182         |
|          resmlp_12_224          | 128 | 53.1588  |  59.2499  | 41.7187  |        41.9019         |
|      mobilenetv3_large_100      | 128 | 61.3026  |  76.2445  | 40.4934  |        39.9628         |
|           regnety_002           | 128 | 40.8267  |  51.1507  | 25.7065  |        29.6025         |
|            lcnet_050            | 128 | 31.5585  |  40.3856  |  17.533  |        20.1221         |
+---------------------------------+-----+----------+-----------+----------+------------------------+
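
As a rough cross-check, a per-model speedup can be derived from the absolute latencies above by dividing the eager latency by the compiled latency. The sketch below uses three rows copied from the table; the helper name `speedup` is hypothetical, and the results may not match the reported speedup table exactly, since that table is normalized against native PyTorch rather than the eager column here.

```python
# Absolute latencies in ms, copied from the table above: (eager, inductor).
latency_ms = {
    "lcnet_050": (31.5585, 17.533),
    "hrnet_w18": (279.3614, 204.7845),
    "convmixer_768_32": (300.8306, 300.5669),
}


def speedup(baseline_ms: float, compiled_ms: float) -> float:
    """Return how many times faster the compiled run is than the baseline."""
    return baseline_ms / compiled_ms


for name, (eager, inductor) in latency_ms.items():
    print(f"{name:>18}: {speedup(eager, inductor):.2f}x")
```

A value above 1.0 means the compiled run is faster; convmixer_768_32 sits almost exactly at 1.0, which agrees with its position at the bottom of the speedup table.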

Performance graphs

bench_logs/timm_models_amp.png :

bench_logs/huggingface_amp.png :

bench_logs/torchbench_amp.png :

Build Summary

Run name

day_101_11_04_23_performance_amp_766

Commit hashes

pytorch commit: 9c5473b
pytorch commit date: 2023-04-12 02:01:10+00:00
torchbench commit: 491e53cd3cabaf02e351d083493621e32b6be1cf
torchbench commit date: 2023-04-11 18:46:01-07:00

TorchDynamo config flags

torch._dynamo.config.DO_NOT_USE_legacy_non_fake_example_inputs = False
torch._dynamo.config.allow_ignore_mark_dynamic = False
torch._dynamo.config.allow_rnn = False
torch._dynamo.config.assume_static_by_default = False
torch._dynamo.config.capture_dynamic_output_shape_ops = False
torch._dynamo.config.capture_scalar_outputs = False
torch._dynamo.config.dead_code_elimination = True
torch._dynamo.config.disable = False
torch._dynamo.config.dynamic_shapes = False
torch._dynamo.config.enforce_cond_guards_match = True
torch._dynamo.config.error_on_nested_fx_trace = True
torch._dynamo.config.error_on_recompile = False
torch._dynamo.config.guard_nn_modules = False
torch._dynamo.config.optimize_ddp = True
torch._dynamo.config.print_graph_breaks = False
torch._dynamo.config.print_guards = False
torch._dynamo.config.print_specializations = False
torch._dynamo.config.profile_cache_lookup = False
torch._dynamo.config.raise_on_backend_change = False
torch._dynamo.config.raise_on_ctx_manager_usage = True
torch._dynamo.config.raise_on_unsafe_aot_autograd = False
torch._dynamo.config.replay_record_enabled = False
torch._dynamo.config.repro_forward_only = False
torch._dynamo.config.rewrite_assert_with_torch_assert = True
torch._dynamo.config.skip_fsdp_guards = True
torch._dynamo.config.skip_nnmodule_hook_guards = True
torch._dynamo.config.specialize_int = False
torch._dynamo.config.suppress_errors = False
torch._dynamo.config.verbose = False
torch._dynamo.config.verify_correctness = False
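
To compare flag settings between two runs, a dump like the one above can be parsed into a dictionary. This is a minimal sketch using only the standard library; `parse_flags` is a hypothetical helper, and it assumes every line has the form `dotted.name = <Python literal>`, as in this report.

```python
import ast


def parse_flags(dump: str) -> dict:
    """Parse 'dotted.name = literal' lines into {name: value}."""
    flags = {}
    for line in dump.splitlines():
        if "=" not in line:
            continue  # skip blank lines and anything that is not an assignment
        key, _, value = line.partition("=")
        flags[key.strip()] = ast.literal_eval(value.strip())
    return flags


# Three lines copied from the dump above.
dump = """\
torch._dynamo.config.dynamic_shapes = False
torch._dynamo.config.optimize_ddp = True
torch._dynamo.config.suppress_errors = False
"""

flags = parse_flags(dump)
enabled = sorted(k.rsplit(".", 1)[-1] for k, v in flags.items() if v is True)
print(enabled)
```

Diffing two such dictionaries shows which flags changed between reports without importing torch.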

Torch version

torch: 2.1.0a0+git9c5473b

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312
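
The reported device memory is larger than the 40GB in the device name. One plausible reading, which is an assumption and not stated in the report, is that the figure is the total byte count divided by 1e9. Under that assumption the conversion to GiB looks like this:

```python
reported_gb = 42.481549312  # "Device Memory [GB]" from the report

# Assumption: the report divides the total byte count by 1e9.
total_bytes = round(reported_gb * 1e9)
gib = total_bytes / 2**30

print(f"{total_bytes} bytes = {gib:.2f} GiB")
```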

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on A100 GPUs. Each experiment runs one iteration: a forward and backward pass for training, or a forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients against native PyTorch. Speedup is measured by normalizing against the performance of native PyTorch. We also report mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 85%, 51/60 | 91%, 41/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 96%, 43/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
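
Each cell above pairs a rounded percentage with the raw counts. A minimal sketch of that formatting, assuming any status starting with `pass` counts as a pass (the helper name `passrate` and the mix of failure statuses below are hypothetical; only the 51/60 totals come from the table):

```python
def passrate(statuses):
    """Format a pass rate as 'NN%, passed/total', like the table cells above."""
    total = len(statuses)
    if total == 0:
        return "n/a"
    # Assumption: any status starting with "pass" counts as a pass.
    passed = sum(1 for s in statuses if s.startswith("pass"))
    return f"{passed / total:.0%}, {passed}/{total}"


# Hypothetical status list that reproduces the inductor/torchbench totals.
statuses = ["pass"] * 51 + ["fail_to_run"] * 2 + ["fail_accuracy"] * 7
print(passrate(statuses))
```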

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.62x    |    1.65x    |    1.42x    |
| inductor_no_cudagraphs |   1.30x    |    1.54x    |    1.41x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.76    |    7.24     |    5.89     |
|       aot_eager        |    9.17    |    15.93    |    13.06    |
|        inductor        |   65.32    |    64.83    |   113.53    |
| inductor_no_cudagraphs |   65.32    |    61.88    |   112.45    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.90x    |    0.88x    |
|        inductor        |   0.78x    |    0.91x    |    0.91x    |
| inductor_no_cudagraphs |   0.94x    |    1.05x    |    1.01x    |
+------------------------+------------+-------------+-------------+
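
The report does not spell out how this ratio is computed. A plausible reading, consistent with "higher is better", is eager peak memory divided by compiled peak memory. The sketch below assumes that definition; the function name and the byte counts are made up for illustration.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Assumed definition: eager peak memory / compiled peak memory."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


# Hypothetical measurements, not taken from this report.
ratio = compression_ratio(
    eager_peak_bytes=9_000_000_000,
    compiled_peak_bytes=10_000_000_000,
)
print(f"{ratio:.2f}x")
```

Under this reading, a ratio below 1.00x means the compiled model used more peak memory than eager mode.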

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the two most recent reports that actually run the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274

Previous report name: /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 85%, 51/60  | 85%, 51/60  |
|        inductor        | huggingface | 91%, 41/45  | 91%, 41/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60  | 87%, 52/60  |
| inductor_no_cudagraphs | huggingface | 96%, 43/45  | 96%, 43/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.61x    |   1.62x   |
|        inductor        | huggingface |   1.65x    |   1.65x   |
|        inductor        | timm_models |   1.42x    |   1.42x   |
| inductor_no_cudagraphs | torchbench  |   1.30x    |   1.30x   |
| inductor_no_cudagraphs | huggingface |   1.54x    |   1.54x   |
| inductor_no_cudagraphs | timm_models |   1.40x    |   1.41x   |
+------------------------+-------------+------------+-----------+

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
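
The four rules above can be expressed as a small check. This is a sketch under stated assumptions, not the dashboard's code: the `ModelResult` class, its field names, and the `pass` prefix rule are hypothetical, while the thresholds are the ones listed above. The sample row uses the inductor values for yolov3 from this report.

```python
from dataclasses import dataclass

SPEEDUP_MIN = 0.95        # flag if speedup is below this
LATENCY_MAX_SEC = 120.0   # flag if compilation latency is above this
COMPRESSION_MIN = 0.9     # flag if compression ratio is below this


@dataclass
class ModelResult:
    name: str
    accuracy: str
    speedup: float
    compile_sec: float
    compression: float


def warnings_for(result):
    """Return the names of the warning categories that apply to one model."""
    flags = []
    # Assumption: any status starting with "pass" is a passing accuracy check.
    if not result.accuracy.startswith("pass"):
        flags.append("accuracy")
    if result.speedup < SPEEDUP_MIN:
        flags.append("speedup")
    if result.compile_sec > LATENCY_MAX_SEC:
        flags.append("compilation_latency")
    if result.compression < COMPRESSION_MIN:
        flags.append("compression_ratio")
    return flags


yolov3 = ModelResult("yolov3", "pass", 1.2033, 124.4639, 0.892)
print(warnings_for(yolov3))
```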

Accuracy warnings

+-------------+-------------------------------+-----------------+------------------------+
|    suite    |             name              |    inductor     | inductor_no_cudagraphs |
+-------------+-------------------------------+-----------------+------------------------+
| torchbench  |         hf_Longformer         |   fail_to_run   |      fail_to_run       |
| torchbench  |             moco              |   fail_to_run   |      fail_to_run       |
| torchbench  |      Background_Matting       | eager_variation |    eager_variation     |
| torchbench  |           tacotron2           |     0.0000      |         0.0000         |
| torchbench  |              gat              |     0.0000      |         0.0000         |
| torchbench  |              gcn              |     0.0000      |         0.0000         |
| torchbench  |             llama             |     0.0000      |         0.0000         |
| torchbench  |             sage              |     0.0000      |         0.0000         |
| torchbench  |         torchrec_dlrm         |     0.0000      |         0.0000         |
| huggingface | DebertaV2ForQuestionAnswering |   fail_to_run   |          pass          |
| huggingface |  AlbertForQuestionAnswering   |  fail_accuracy  |     fail_accuracy      |
+-------------+-------------------------------+-----------------+------------------------+

Performance speedup warnings

+-------------+-------------------------------+----------+------------------------+
|    suite    |             name              | inductor | inductor_no_cudagraphs |
+-------------+-------------------------------+----------+------------------------+
| torchbench  |             dcgan             |  1.4235  |          0.82          |
| torchbench  |         lennard_jones         |  1.4033  |         0.8958         |
| torchbench  |       soft_actor_critic       |  1.0232  |         0.8309         |
| torchbench  |          tts_angular          |  0.9544  |         0.9288         |
| torchbench  |    nvidia_deeprecommender     |  0.8726  |         1.0191         |
| torchbench  | timm_vision_transformer_large |   0.0    |         1.0851         |
| torchbench  |         hf_Longformer         |   0.0    |          0.0           |
| torchbench  |             moco              |   0.0    |          0.0           |
| torchbench  |              gat              |   0.0    |          0.0           |
| torchbench  |              gcn              |   0.0    |          0.0           |
| torchbench  |             sage              |   0.0    |          0.0           |
| torchbench  |           tacotron2           |   0.0    |          0.0           |
| torchbench  |         torchrec_dlrm         |   0.0    |          0.0           |
| huggingface |      DebertaForMaskedLM       |  1.081   |         0.9249         |
| huggingface |     DebertaV2ForMaskedLM      |  0.9899  |         0.7354         |
| huggingface | DebertaV2ForQuestionAnswering |  0.921   |         0.736          |
| huggingface |     BlenderbotForCausalLM     |   0.0    |         1.4407         |
| huggingface |     AllenaiLongformerBase     |   0.0    |          0.0           |
+-------------+-------------------------------+----------+------------------------+

Compilation latency (sec) warnings

+-------------+--------------------------------+----------+------------------------+
|    suite    |              name              | inductor | inductor_no_cudagraphs |
+-------------+--------------------------------+----------+------------------------+
| torchbench  |          hf_T5_large           | 175.8561 |        178.1039        |
| torchbench  |        phlippe_densenet        | 170.8977 |        165.102         |
| torchbench  |           hf_BigBird           | 153.2659 |        133.3807        |
| torchbench  |       timm_efficientnet        | 148.7643 |        147.536         |
| torchbench  |          densenet121           | 142.2393 |        139.6832        |
| torchbench  |       mobilenet_v3_large       | 132.4366 |        140.5454        |
| torchbench  |          mobilenet_v2          | 128.1771 |        126.0201        |
| torchbench  |             yolov3             | 124.4639 |        122.6392        |
| torchbench  | timm_vision_transformer_large  |   nan    |        127.3264        |
| huggingface |     MobileBertForMaskedLM      | 147.9875 |        148.4814        |
| huggingface |      DebertaV2ForMaskedLM      | 146.6582 |        76.6963         |
| huggingface | DebertaV2ForQuestionAnswering  | 146.3319 |        76.4348         |
| huggingface | MobileBertForQuestionAnswering | 145.4655 |        142.3376        |
| huggingface | M2M100ForConditionalGeneration | 137.1319 |        138.3301        |
| huggingface |  MT5ForConditionalGeneration   | 136.5006 |        136.0151        |
| huggingface |        XGLMForCausalLM         | 134.9026 |        134.6974        |
| timm_models |           rexnet_100           | 285.3998 |        300.8896        |
| timm_models |           hrnet_w18            | 255.186  |        247.495         |
| timm_models |          ghostnet_100          | 236.8368 |        236.9131        |
| timm_models |          mobilevit_s           | 178.4743 |        177.6305        |
| timm_models |           fbnetv3_b            | 175.0752 |        173.6566        |
| timm_models |          resnest101e           | 171.6154 |        169.8922        |
| timm_models |          tf_mixnet_l           | 167.7249 |        163.8811        |
| timm_models |         pnasnet5large          | 166.9575 |        164.1125        |
| timm_models |          inception_v3          | 166.2498 |        162.6374        |
| timm_models |     mobilenetv3_large_100      | 165.8546 |        165.8863        |
| timm_models |           tinynet_a            | 165.3405 |        165.6042        |
| timm_models |            mixnet_l            | 163.2159 |        157.299         |
| timm_models |        adv_inception_v3        | 161.2748 |        162.2546        |
| timm_models |       tf_efficientnet_b0       | 160.7946 |        152.2355        |
| timm_models |       gluon_inception_v3       | 159.9696 |        160.4201        |
| timm_models |       res2net101_26w_4s        | 156.6893 |        154.9523        |
| timm_models |        twins_pcpvt_base        | 153.3044 |        151.4508        |
| timm_models |          spnasnet_100          | 140.9908 |        138.2189        |
| timm_models |           fbnetc_100           | 140.5204 |        140.6783        |
| timm_models |      xcit_large_24_p8_224      | 139.2014 |        135.8129        |
| timm_models |        mobilenetv2_100         | 131.637  |        132.3704        |
| timm_models |        res2net50_14w_8s        | 127.4727 |        127.9485        |
| timm_models |          mnasnet_100           | 126.8427 |        124.5631        |
+-------------+--------------------------------+----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+----------+------------------------+
|    suite    |                  name                   | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------------+----------+------------------------+
| torchbench  |                 yolov3                  |  0.892   |         1.0116         |
| torchbench  |              hf_GPT2_large              |  0.8904  |         1.128          |
| torchbench  |           speech_transformer            |  0.8651  |         0.869          |
| torchbench  |              timm_resnest               |  0.8628  |         0.966          |
| torchbench  |           shufflenet_v2_x1_0            |  0.8616  |         0.9644         |
| torchbench  |               Super_SloMo               |  0.8614  |         1.208          |
| torchbench  |         timm_vision_transformer         |  0.8593  |         0.8835         |
| torchbench  |               timm_regnet               |  0.8498  |         0.9496         |
| torchbench  |                resnet152                |  0.8489  |         0.9405         |
| torchbench  |           Background_Matting            |  0.8485  |         1.0403         |
| torchbench  |              hf_DistilBert              |  0.8476  |         0.9945         |
| torchbench  |                 hf_Bert                 |  0.8411  |         1.0258         |
| torchbench  |              hf_Bert_large              |  0.8302  |         1.0725         |
| torchbench  |               hf_T5_large               |  0.8201  |         1.168          |
| torchbench  |              pytorch_unet               |  0.8134  |         0.9308         |
| torchbench  |            phlippe_densenet             |  0.8058  |         0.8659         |
| torchbench  |                 hf_Bart                 |  0.7933  |         0.9173         |
| torchbench  |           mobilenet_v3_large            |  0.785   |         0.7757         |
| torchbench  |                  dcgan                  |  0.7821  |         0.9645         |
| torchbench  |                resnet50                 |  0.7808  |         0.8824         |
| torchbench  |                 demucs                  |  0.773   |         0.9656         |
| torchbench  |              squeezenet1_1              |  0.7722  |         0.908          |
| torchbench  |             pytorch_stargan             |  0.7715  |         0.8893         |
| torchbench  |               timm_vovnet               |  0.7529  |         0.8869         |
| torchbench  |                  vgg16                  |  0.7227  |         0.9808         |
| torchbench  |               mnasnet1_0                |  0.7144  |         0.8072         |
| torchbench  |                 alexnet                 |  0.7091  |         0.939          |
| torchbench  |               densenet121               |  0.7071  |         0.7944         |
| torchbench  |             pytorch_struct              |  0.697   |         0.7362         |
| torchbench  |               hf_BigBird                |  0.6947  |         1.1191         |
| torchbench  |             resnext50_32x4d             |  0.6685  |         0.7679         |
| torchbench  |         nvidia_deeprecommender          |  0.6585  |         0.8931         |
| torchbench  |                   drq                   |  0.6379  |         0.9573         |
| torchbench  |            soft_actor_critic            |  0.6066  |         0.9973         |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |  0.6065  |         0.6172         |
| torchbench  |             LearningToPaint             |  0.5925  |         0.7463         |
| torchbench  |                resnet18                 |  0.5395  |         0.6097         |
| torchbench  |              lennard_jones              |  0.5317  |         0.9997         |
| torchbench  |               hf_Reformer               |  0.4538  |         0.8022         |
| torchbench  |          functorch_dp_cifar10           |  0.3991  |         0.4424         |
| torchbench  |             phlippe_resnet              |  0.3169  |         0.3395         |
| huggingface |           ElectraForCausalLM            |  0.8941  |         0.9739         |
| huggingface |           PegasusForCausalLM            |  0.893   |         0.9864         |
| huggingface |          DistilBertForMaskedLM          |  0.8849  |         0.9624         |
| huggingface |            TrOCRForCausalLM             |  0.8836  |         0.9583         |
| huggingface | BlenderbotSmallForConditionalGeneration |  0.8729  |         0.9803         |
| huggingface |     PegasusForConditionalGeneration     |  0.8689  |         1.0689         |
| huggingface |      MBartForConditionalGeneration      |  0.8672  |         1.0307         |
| huggingface |      BartForConditionalGeneration       |  0.8456  |         1.0139         |
| huggingface |         MegatronBertForCausalLM         |  0.845   |         1.0962         |
| huggingface |       BlenderbotSmallForCausalLM        |  0.8184  |         0.9119         |
| huggingface |         Speech2Text2ForCausalLM         |  0.789   |         0.8779         |
| huggingface |     M2M100ForConditionalGeneration      |  0.7651  |         0.9908         |
| huggingface |          MobileBertForMaskedLM          |  0.7473  |         1.016          |
| huggingface |             XGLMForCausalLM             |  0.7117  |         0.9792         |
| huggingface |     MobileBertForQuestionAnswering      |  0.6569  |         0.8392         |
| huggingface |           DebertaForMaskedLM            |  0.5646  |         0.9988         |
| huggingface |          DebertaV2ForMaskedLM           |  0.5187  |         0.9664         |
| huggingface |       DebertaForQuestionAnswering       |  0.4867  |         1.1525         |
| huggingface |      DebertaV2ForQuestionAnswering      |  0.4855  |          0.98          |
| timm_models |                hrnet_w18                |  0.8918  |          0.99          |
| timm_models |            sebotnet33ts_256             |  0.891   |         1.1115         |
| timm_models |            adv_inception_v3             |  0.8904  |         1.0171         |
| timm_models |           gluon_inception_v3            |  0.8904  |         1.0171         |
| timm_models |              inception_v3               |  0.8904  |         1.0171         |
| timm_models |                 dpn107                  |  0.8833  |         0.9642         |
| timm_models |            gluon_xception65             |  0.8831  |         0.9705         |
| timm_models |              ghostnet_100               |  0.8807  |         0.977          |
| timm_models |              spnasnet_100               |  0.8786  |         0.9451         |
| timm_models |          mobilenetv3_large_100          |  0.877   |         0.9362         |
| timm_models |             poolformer_m36              |  0.8768  |         1.1871         |
| timm_models |           eca_botnext26ts_256           |  0.8738  |         1.0072         |
| timm_models |            res2net50_14w_8s             |  0.8712  |         0.9607         |
| timm_models |            res2net101_26w_4s            |  0.871   |         0.9483         |
| timm_models |                mixnet_l                 |  0.8687  |         0.9902         |
| timm_models |               mnasnet_100               |  0.8683  |         0.9403         |
| timm_models |               res2next50                |  0.866   |         0.9547         |
| timm_models |              cait_m36_384               |  0.8632  |         0.989          |
| timm_models |               fbnetc_100                |  0.8596  |         0.9535         |
| timm_models |                pit_b_224                |  0.8578  |         1.0242         |
| timm_models |               selecsls42b               |  0.8576  |         0.9664         |
| timm_models |              convnext_base              |  0.8505  |         1.0338         |
| timm_models |                gernet_l                 |  0.8499  |         0.9706         |
| timm_models |         swsl_resnext101_32x16d          |  0.8461  |         0.9786         |
| timm_models |             coat_lite_mini              |  0.8402  |         1.0202         |
| timm_models |              botnet26t_256              |  0.8239  |         0.9779         |
| timm_models |          xcit_large_24_p8_224           |  0.8225  |         0.9732         |
| timm_models |                lcnet_050                |  0.805   |         0.884          |
| timm_models |                repvgg_a2                |  0.7738  |         0.9611         |
| timm_models |               regnety_002               |  0.7602  |         0.8966         |
| timm_models |             crossvit_9_240              |  0.7526  |         0.9898         |
| timm_models |      swin_base_patch4_window7_224       |  0.7214  |         0.9045         |
| timm_models |              jx_nest_base               |  0.6693  |         0.9604         |
+-------------+-----------------------------------------+----------+------------------------+

Metrics over time


bench_logs/comp_time_over_time.png :

bench_logs/memory_over_time.png :

bench_logs/geomean_over_time.png :

bench_logs/passrate_over_time.png :

Recent Regressions

For each relevant compiler, we compare the two most recent reports (that actually run the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).
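
A minimal sketch of this comparison, assuming each report is reduced to a `{model name: metric value}` mapping per compiler and metric (the function name `newly_flagged` and that data layout are hypothetical). The thresholds are the ones from the 'Warnings' section, and the sample values are copied from the regression tables in this report.

```python
def newly_flagged(prev, cur, is_flagged):
    """Return {name: (prev_value, cur_value)} for models flagged now but not before."""
    regressions = {}
    for name, cur_value in cur.items():
        if name not in prev:
            continue  # no earlier run to compare against
        prev_value = prev[name]
        if is_flagged(cur_value) and not is_flagged(prev_value):
            regressions[name] = (prev_value, cur_value)
    return regressions


# Compilation latency (sec), inductor: flagged when above 120 seconds.
prev_latency = {"yolov3": 119.3765, "mnasnet_100": 119.6424}
cur_latency = {"yolov3": 124.4639, "mnasnet_100": 126.8427}
latency_regressions = newly_flagged(prev_latency, cur_latency, lambda sec: sec > 120)

# Speedup, inductor_no_cudagraphs: flagged when below 0.95x.
prev_speedup = {"tts_angular": 0.9525}
cur_speedup = {"tts_angular": 0.9288}
speedup_regressions = newly_flagged(prev_speedup, cur_speedup, lambda x: x < 0.95)

print(latency_regressions)
print(speedup_regressions)
```

Models that were already flagged in the previous report are deliberately skipped, so only new problems show up.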

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766

Performance speedup regressions

+------------------------+-------------+-------------+------------+
|        compiler        |    name     | prev_status | cur_status |
+------------------------+-------------+-------------+------------+
| inductor_no_cudagraphs | tts_angular |   0.9525    |   0.9288   |
+------------------------+-------------+-------------+------------+

Compilation latency (sec) regressions

+------------------------+--------+-------------+------------+
|        compiler        |  name  | prev_status | cur_status |
+------------------------+--------+-------------+------------+
|        inductor        | yolov3 |  119.3765   |  124.4639  |
| inductor_no_cudagraphs | yolov3 |  119.0635   |  122.6392  |
+------------------------+--------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766

No regressions found.

Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766

Compilation latency (sec) regressions

+----------+-------------+-------------+------------+
| compiler |    name     | prev_status | cur_status |
+----------+-------------+-------------+------------+
| inductor | mnasnet_100 |  119.6424   |  126.8427  |
+----------+-------------+-------------+------------+

torchbench suite with amp precision


Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9685 |  0.9132   |  3.7928  |         1.397          |
|           BERT_pytorch            |  16  | 1.0063 |  0.8107   |  3.3862  |         2.2486         |
|            hf_BigBird             |  2   | 0.9556 |   0.78    |  3.027   |         1.6893         |
|            densenet121            |  4   | 0.9936 |  0.7269   |  2.8924  |         1.0942         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9757 |  0.8904   |  2.4572  |         1.8046         |
|             hf_Albert             |  8   | 0.9963 |  0.9597   |  2.4009  |         2.3371         |
|            hf_T5_large            |  2   | 1.0083 |  0.8271   |  2.3696  |         2.0191         |
|               dlrm                | 1024 | 0.9407 |   0.846   |  2.234   |         1.1888         |
|         phlippe_densenet          | 128  | 0.9952 |  0.7801   |  2.1045  |         1.0307         |
|          pytorch_struct           | 200  | 0.9208 |  0.7709   |  2.1041  |         1.1584         |
|        mobilenet_v3_large         |  32  | 0.9989 |  0.7878   |  2.0822  |         1.2418         |
|              hf_GPT2              |  4   |  1.02  |  0.9802   |  2.0444  |         1.9117         |
|           squeezenet1_1           |  32  | 0.9841 |  0.9457   |  1.9694  |         1.3378         |
|               hf_T5               |  8   | 0.997  |  0.8567   |  1.9599  |         2.0279         |
|              hf_Bert              |  4   | 1.0279 |   0.852   |  1.9019  |         1.7112         |
|          phlippe_resnet           | 128  | 0.9871 |  0.7668   |  1.8456  |         0.9909         |
|          resnext50_32x4d          |  8   |  0.99  |  0.7196   |  1.7455  |         1.0039         |
|           hf_GPT2_large           |  4   |  1.0   |   0.989   |  1.7265  |         1.7927         |
|            mnasnet1_0             |  32  | 0.9938 |  0.7419   |  1.7157  |         1.052          |
|              hf_Bart              |  4   | 1.0012 |  0.8353   |  1.7148  |         1.6163         |
|           hf_Bert_large           |  4   | 1.0295 |  0.8757   |  1.6499  |         1.6693         |
|        speech_transformer         |  32  | 0.9873 |  0.8257   |  1.649   |         1.6407         |
|        shufflenet_v2_x1_0         | 128  | 0.9963 |  0.7667   |  1.6473  |         1.2294         |
|             resnet18              |  16  | 0.988  |  0.7664   |  1.6216  |         1.0072         |
|                drq                |  1   | 0.9496 |  0.7517   |  1.581   |         1.0506         |
|           timm_resnest            |  32  | 0.9971 |  0.8575   |  1.5788  |         1.5505         |
|      timm_vision_transformer      |  32  | 0.992  |  0.8939   |  1.5744  |         1.4259         |
|            timm_nfnet             | 128  | 0.9997 |   0.998   |  1.5713  |         1.5038         |
|           fastNLP_Bert            |  6   | 1.0063 |  0.8538   |  1.5558  |         1.5522         |
|           hf_DistilBert           |  8   | 0.9914 |  0.9664   |  1.542   |         1.5046         |
| attention_is_all_you_need_pytorch | 256  | 1.0028 |   0.915   |  1.5354  |         1.6143         |
|           mobilenet_v2            |  96  | 0.9991 |  0.7791   |  1.5302  |         1.5162         |
|         timm_efficientnet         |  32  | 0.9449 |  0.6322   |  1.4476  |         1.1172         |
|               dcgan               |  32  | 0.8523 |  0.6824   |  1.4235  |          0.82          |
|           lennard_jones           | 1000 | 0.8559 |  0.7633   |  1.4033  |         0.8958         |
|           pytorch_unet            |  1   | 0.9987 |   0.205   |  1.3612  |         1.3582         |
|          LearningToPaint          |  96  | 0.9922 |   0.779   |  1.3278  |         1.087          |
|          pytorch_stargan          |  16  | 0.9934 |  0.7849   |  1.2766  |         1.2463         |
|            Super_SloMo            |  6   | 0.9988 |  0.1779   |  1.2517  |         1.235          |
|               vgg16               |  64  | 0.9995 |  0.9983   |   1.24   |         1.2537         |
|             resnet152             |  32  | 1.0003 |  0.7646   |  1.2219  |         1.0482         |
|        Background_Matting         |  4   | 0.9994 |  0.1367   |  1.2144  |         1.2096         |
|             resnet50              |  32  | 0.9974 |  0.7793   |  1.2049  |         1.0864         |
|              yolov3               |  16  | 0.9988 |  0.8086   |  1.2033  |         1.204          |
|            hf_Reformer            |  4   | 0.9857 |  0.9643   |  1.1431  |         1.0697         |
|              alexnet              | 128  | 0.9991 |  0.9967   |  1.0892  |         1.1351         |
|              demucs               |  4   |  1.0   |   1.002   |  1.0388  |         1.0392         |
|         soft_actor_critic         | 256  | 0.8527 |  0.6379   |  1.0232  |         0.8309         |
|            timm_regnet            |  32  | 0.9315 |  0.7846   |  1.0085  |         0.9804         |
|            timm_vovnet            |  32  | 0.8747 |  0.7292   |  0.967   |         0.9668         |
|            tts_angular            |  64  | 0.9195 |  0.8785   |  0.9544  |         0.9288         |
|      nvidia_deeprecommender       | 256  | 0.9988 |  0.9981   |  0.8726  |         1.0191         |
|   timm_vision_transformer_large   |  32  |  1.0   |    0.0    |   0.0    |         1.0851         |
|           hf_Longformer           |  2   | 1.0185 |  0.6927   |   0.0    |          0.0           |
|               moco                |  32  | 0.9786 |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|          vision_maskrcnn          |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 26.6046 |  53.9299  | 175.8561 |        178.1039        |
|         phlippe_densenet          | 128  | 3.2434  |  7.0823   | 170.8977 |        165.102         |
|            hf_BigBird             |  2   | 12.5885 |  36.5513  | 153.2659 |        133.3807        |
|         timm_efficientnet         |  32  | 4.8633  |  10.2245  | 148.7643 |        147.536         |
|            densenet121            |  4   | 7.7785  |  18.0065  | 142.2393 |        139.6832        |
|        mobilenet_v3_large         |  32  | 3.4338  |  7.6319   | 132.4366 |        140.5454        |
|           mobilenet_v2            |  96  | 3.1159  |   7.03    | 128.1771 |        126.0201        |
|              yolov3               |  16  | 4.9191  |  10.6205  | 124.4639 |        122.6392        |
|            mnasnet1_0             |  32  | 3.1232  |  6.7754   | 113.3584 |        104.6014        |
|             resnet152             |  32  | 9.1049  |  20.1783  | 109.2955 |        109.6667        |
|           hf_GPT2_large           |  4   | 14.2059 |  29.7198  | 107.9634 |        108.6364        |
|           timm_resnest            |  32  | 1.8326  |   3.913   | 101.9241 |        103.8199        |
|        shufflenet_v2_x1_0         | 128  | 3.4632  |  7.7182   | 85.5184  |        83.4134         |
|        speech_transformer         |  32  | 5.8871  |  13.6726  | 81.0578  |        80.9299         |
| attention_is_all_you_need_pytorch | 256  | 4.3826  |  10.8481  | 76.1866  |        75.3867         |
|            timm_regnet            |  32  | 6.5525  |  12.1266  |  75.698  |        75.4722         |
|            timm_nfnet             | 128  | 5.6839  |  10.9735  | 75.3238  |        73.0714         |
|        Background_Matting         |  4   |  3.164  |  11.4664  | 74.1665  |        72.6578         |
|           BERT_pytorch            |  16  | 5.0005  |  11.3057  | 73.1046  |        70.6616         |
|             resnet50              |  32  | 3.1927  |  7.0399   | 67.8535  |        66.5579         |
|           hf_Bert_large           |  4   | 10.0981 |  21.1089  | 66.4356  |        66.7945         |
|            timm_vovnet            |  32  | 3.5241  |  6.4786   | 65.8807  |        65.9278         |
|           pytorch_unet            |  1   | 1.5395  |  4.3907   | 58.9627  |        61.4082         |
|       functorch_dp_cifar10        |  64  | 1.1927  |  2.3838   | 58.0709  |        59.1854         |
|          resnext50_32x4d          |  8   | 3.2027  |  7.0227   |  56.32   |         54.227         |
|      timm_vision_transformer      |  32  | 3.3109  |  7.1475   | 54.4784  |        53.7814         |
|              hf_Bart              |  4   | 6.1615  |  13.4442  | 52.8054  |        51.7488         |
|           fastNLP_Bert            |  6   | 5.1506  |  11.1845  | 51.8213  |        51.8388         |
|               hf_T5               |  8   | 5.5611  |  12.3986  | 51.6288  |        53.7995         |
|            hf_Reformer            |  4   |  4.011  |  5.9694   | 51.3766  |        44.2419         |
|          pytorch_stargan          |  16  | 1.2252  |  3.2293   | 48.4753  |        47.3122         |
|          LearningToPaint          |  96  | 1.4029  |  2.8662   | 48.1159  |        46.3408         |
|            Super_SloMo            |  6   | 2.8629  |  9.7938   | 47.2781  |        44.4407         |
|             resnet18              |  16  | 1.3437  |  2.7624   | 46.5863  |        47.2707         |
|             hf_Albert             |  8   | 2.4526  |  7.9515   | 44.2452  |        42.0912         |
|              hf_GPT2              |  4   | 4.4741  |  9.4403   | 43.7969  |        43.5493         |
|              hf_Bert              |  4   | 5.1455  |  10.763   | 41.5684  |        41.8415         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2059  |  2.9565   | 40.5312  |        39.7179         |
|          phlippe_resnet           | 128  | 1.3248  |   2.855   | 35.2796  |        35.3815         |
|              demucs               |  4   | 1.4082  |  2.1833   | 33.4617  |        33.5868         |
|           hf_DistilBert           |  8   |  2.374  |  5.1941   | 32.4076  |        32.0159         |
|           squeezenet1_1           |  32  | 1.0256  |  1.7548   | 26.9203  |        26.2453         |
|          pytorch_struct           | 200  | 0.7335  |  1.3315   | 24.4981  |        22.0006         |
|               vgg16               |  64  | 0.6131  |  1.1168   | 18.6126  |        18.0629         |
|              alexnet              | 128  | 0.4821  |  0.7697   | 17.9161  |        15.3989         |
|                drq                |  1   | 0.6413  |  1.0216   | 13.4392  |        11.3415         |
|      nvidia_deeprecommender       | 256  | 0.4691  |  0.7483   | 12.2661  |        11.0851         |
|               dcgan               |  32  | 0.4195  |  0.6991   | 10.1589  |         9.765          |
|         soft_actor_critic         | 256  | 0.4087  |  0.5871   |  9.8687  |         9.3241         |
|               dlrm                | 1024 | 0.3618  |  0.7657   |  9.0059  |         9.2828         |
|            tts_angular            |  64  | 0.4343  |  0.5086   |  7.8926  |         7.7701         |
|           lennard_jones           | 1000 | 0.3931  |  0.5903   |  7.5804  |         7.2753         |
|   timm_vision_transformer_large   |  32  | 9.2319  |    nan    |   nan    |        127.3264        |
|           hf_Longformer           |  2   | 9.3995  |  30.0565  |   nan    |          nan           |
|               moco                |  32  | 34.8485 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak memory compression ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0378  |         1.2557         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.0163  |         1.1719         |
|           mobilenet_v2            |  96  | 0.986  |   0.765   |  1.0104  |         1.103          |
|            tts_angular            |  64  | 0.9983 |  0.9983   |  0.9895  |         0.9983         |
|            timm_nfnet             | 128  | 0.9068 |   0.875   |  0.9692  |         1.0726         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9689  |         1.1266         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  0.9575  |         1.1593         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |  0.952   |         1.0009         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         1.1717         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9321  |         1.0713         |
|         timm_efficientnet         |  32  | 0.9866 |  0.7659   |  0.9282  |         1.0064         |
|              yolov3               |  16  | 0.988  |  0.8253   |  0.892   |         1.0116         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  0.8904  |         1.128          |
|        speech_transformer         |  32  | 0.9915 |   0.901   |  0.8651  |         0.869          |
|           timm_resnest            |  32  | 0.9878 |  0.8801   |  0.8628  |         0.966          |
|        shufflenet_v2_x1_0         | 128  | 0.9539 |  0.8375   |  0.8616  |         0.9644         |
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  0.8614  |         1.208          |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8593  |         0.8835         |
|            timm_regnet            |  32  | 0.9908 |  0.8524   |  0.8498  |         0.9496         |
|             resnet152             |  32  | 0.9958 |  0.8947   |  0.8489  |         0.9405         |
|        Background_Matting         |  4   | 1.0127 |  0.6489   |  0.8485  |         1.0403         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8476  |         0.9945         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  0.8411  |         1.0258         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.8302  |         1.0725         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         1.168          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.8134  |         0.9308         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8058  |         0.8659         |
|              hf_Bart              |  4   | 0.9087 |  0.7521   |  0.7933  |         0.9173         |
|        mobilenet_v3_large         |  32  | 0.9801 |  0.9458   |  0.785   |         0.7757         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.7821  |         0.9645         |
|             resnet50              |  32  | 0.9919 |  0.8625   |  0.7808  |         0.8824         |
|              demucs               |  4   | 0.9661 |  0.9659   |  0.773   |         0.9656         |
|           squeezenet1_1           |  32  | 0.9695 |  0.9312   |  0.7722  |         0.908          |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.8893         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7529  |         0.8869         |
|               vgg16               |  64  | 0.9923 |  0.7245   |  0.7227  |         0.9808         |
|            mnasnet1_0             |  32  | 0.9792 |  0.8651   |  0.7144  |         0.8072         |
|              alexnet              | 128  | 0.9454 |  0.7939   |  0.7091  |         0.939          |
|            densenet121            |  4   | 0.9944 |  0.9779   |  0.7071  |         0.7944         |
|          pytorch_struct           | 200  | 0.9992 |  0.5106   |  0.697   |         0.7362         |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |  0.6947  |         1.1191         |
|          resnext50_32x4d          |  8   | 0.9965 |  0.8441   |  0.6685  |         0.7679         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.8931         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.6379  |         0.9573         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  0.6066  |         0.9973         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9965 |  0.8796   |  0.6065  |         0.6172         |
|          LearningToPaint          |  96  | 0.9192 |  0.7116   |  0.5925  |         0.7463         |
|             resnet18              |  16  | 0.9753 |  0.8018   |  0.5395  |         0.6097         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.5317  |         0.9997         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.4538  |         0.8022         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3169  |         0.3395         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.9511 |  0.8932   |   nan    |          nan           |
|               moco                |  32  | 0.9997 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_GPT2_large           |  4   | 209.1233 | 210.9245  | 120.8114 |        116.6913        |
|        Background_Matting         |  4   | 125.8016 | 919.4134  | 103.4004 |        103.8873        |
|            hf_T5_large            |  2   | 230.4164 |  266.533  | 93.8898  |        113.3883        |
|               hf_T5               |  8   | 182.1196 | 209.0436  | 91.4441  |        89.4835         |
|            timm_nfnet             | 128  | 118.1119 | 118.1459  | 75.2956  |        78.9228         |
|            hf_BigBird             |  2   | 202.4053 | 252.5066  | 74.2458  |        116.1969        |
|            hf_Reformer            |  4   | 82.1874  |  84.0494  | 70.7851  |        75.5291         |
|            Super_SloMo            |  6   | 79.4406  | 447.1167  | 63.3238  |        64.3457         |
|              yolov3               |  16  | 68.4438  |  84.464   |  56.916  |        56.9644         |
|            timm_regnet            |  32  | 59.5791  |  70.4891  |  55.129  |        56.9614         |
|               vgg16               |  64  | 66.2203  |  66.2673  | 53.3906  |        52.7925         |
|             resnet152             |  32  | 63.5924  |  83.4125  | 52.4753  |        63.9666         |
|              demucs               |  4   | 53.6548  |  53.394   | 51.8234  |        51.5948         |
|           hf_Bert_large           |  4   | 80.2268  |  93.8252  |  50.773  |        54.9316         |
| attention_is_all_you_need_pytorch | 256  | 54.6728  |  59.695   | 35.3926  |        37.2729         |
|        speech_transformer         |  32  | 64.4543  |  75.9237  | 35.0429  |        35.6176         |
|              hf_Bart              |  4   | 59.2441  |  69.9518  | 34.0894  |        35.4142         |
|           fastNLP_Bert            |  6   | 52.1231  |  61.6632  | 33.5162  |        33.8524         |
|           mobilenet_v2            |  96  | 46.9108  |  60.2586  | 30.6442  |        30.9546         |
|           pytorch_unet            |  1   |  39.842  | 194.1126  | 29.2042  |        29.2676         |
|             hf_Albert             |  8   | 68.4421  |  72.1545  | 29.0782  |        29.6951         |
|              hf_GPT2              |  4   |  48.102  |   49.92   | 26.0371  |        25.6269         |
|            timm_vovnet            |  32  | 28.1633  |  34.105   | 25.4273  |         25.594         |
|         timm_efficientnet         |  32  | 33.7431  |  51.3488  | 21.9336  |        28.9692         |
|             resnet50              |  32  | 26.4332  |  34.282   | 21.8848  |        24.5349         |
|              hf_Bert              |  4   | 40.0062  |  48.0185  | 21.7369  |        24.5734         |
|           hf_DistilBert           |  8   | 32.2282  |  32.4557  | 21.1243  |         20.865         |
|            densenet121            |  4   | 54.3376  |  74.1464  | 18.6526  |        49.6056         |
|        shufflenet_v2_x1_0         | 128  | 30.6266  |  41.7773  | 18.6024  |        25.0793         |
|      timm_vision_transformer      |  32  |  29.176  |  34.7072  | 17.9694  |        19.7864         |
|           BERT_pytorch            |  16  | 57.2051  |  65.847   |  17.198  |        24.6712         |
|           timm_resnest            |  32  | 24.0962  |  28.0527  | 15.1598  |        15.5014         |
|        mobilenet_v3_large         |  32  | 26.7466  |  34.2944  |  13.795  |        21.3624         |
|            mnasnet1_0             |  32  |  22.234  |  30.0642  |  13.128  |        22.2602         |
|      nvidia_deeprecommender       | 256  | 10.2231  |  10.2381  | 11.7059  |         10.026         |
|          resnext50_32x4d          |  8   | 20.4529  |  28.4064  | 11.6036  |        20.0577         |
|          pytorch_stargan          |  16  |  14.863  |  19.4014  | 11.5644  |        12.2386         |
|         phlippe_densenet          | 128  | 23.4034  |  30.0814  |  11.032  |        22.8883         |
|              alexnet              | 128  |  9.828   |  9.8337   |  9.0193  |         8.6355         |
|          LearningToPaint          |  96  | 11.3746  |  14.4735  |  8.534   |         10.336         |
|            tts_angular            |  64  |  6.7435  |  7.0912   |  6.5332  |         6.7069         |
|             resnet18              |  16  |  9.2362  |  12.1366  |  5.7817  |         9.1189         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 17.1613  |  15.8731  |  5.6782  |         7.7789         |
|           squeezenet1_1           |  32  | 10.4888  |  11.4444  |  5.4049  |         7.7031         |
|          phlippe_resnet           | 128  |  8.9982  |  11.8273  |  4.9068  |        10.1551         |
|       functorch_dp_cifar10        |  64  | 10.5216  |  11.1603  |  2.7921  |         7.1809         |
|          pytorch_struct           | 200  |  5.0788  |  6.1654   |  2.7036  |         4.659          |
|                drq                |  1   |  3.4882  |  4.5036   |  2.4324  |         3.1003         |
|               dlrm                | 1024 |  4.3574  |  4.8562   |  2.152   |         3.5646         |
|         soft_actor_critic         | 256  |  1.8517  |  2.5157   |  1.9156  |         1.8886         |
|               dcgan               |  32  |  2.4136  |  3.0343   |  1.4718  |         2.5834         |
|           lennard_jones           | 1000 |  1.8712  |  2.1639   |  1.1432  |         1.7537         |
|   timm_vision_transformer_large   |  32  | 465.8714 |    nan    |   nan    |        427.3604        |
|           hf_Longformer           |  2   | 111.7086 | 162.7831  |   nan    |          nan           |
|               moco                |  32  | 51.7696  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+
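
For anyone who wants to work with these numbers rather than read them by eye, here is a minimal sketch of how the tables above could be parsed and two backend columns compared. The helper names `parse_ascii_table` and `column_ratio` are hypothetical and not part of the benchmark scripts. The ratio computed here is just one latency column divided by another, not the dashboard's own speedup metric. Rows containing `nan` are skipped on the assumption that they represent runs that did not complete.

```python
import math

# Three rows copied from the "Absolute latency (ms)" table above.
SAMPLE = """
+---------------+----+----------+-----------+----------+------------------------+
|     name      | bs |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------+----+----------+-----------+----------+------------------------+
| hf_GPT2_large | 4  | 209.1233 | 210.9245  | 120.8114 |        116.6913        |
|    hf_T5      | 8  | 182.1196 | 209.0436  | 91.4441  |        89.4835         |
|     moco      | 32 | 51.7696  |    nan    |   nan    |          nan           |
+---------------+----+----------+-----------+----------+------------------------+
"""

def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' line is the header row
        else:
            rows.append(dict(zip(header, cells)))
    return rows

def column_ratio(rows, numerator, denominator):
    """Return {name: numerator / denominator}, skipping unusable rows."""
    ratios = {}
    for row in rows:
        try:
            num = float(row[numerator])
            den = float(row[denominator])
        except (KeyError, ValueError):
            continue  # missing column or non-numeric cell
        if math.isnan(num) or math.isnan(den) or den == 0.0:
            continue  # assumed to be a failed or skipped run
        ratios[row["name"]] = num / den
    return ratios

rows = parse_ascii_table(SAMPLE)
ratios = column_ratio(rows, "eager", "inductor")
for name, ratio in ratios.items():
    print(f"{name}: eager/inductor latency ratio = {ratio:.2f}")
```

The same parser should work on the other tables in this comment, since they share the same border and cell layout. Status strings such as `pass` in the accuracy tables are non-numeric, so `column_ratio` ignores those rows.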

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 1.0146 |  0.8787   |  2.8617  |         1.1989         |
|     MobileBertForQuestionAnswering      | 128 | 1.016  |  0.8529   |  2.7653  |         1.1898         |
|             OPTForCausalLM              |  2  | 0.9905 |  0.9061   |  2.4941  |          2.5           |
|       MT5ForConditionalGeneration       | 16  | 1.0134 |  0.8445   |  2.3246  |         1.9616         |
|      GPT2ForSequenceClassification      |  4  | 0.9891 |  0.9595   |  2.3229  |         2.3603         |
|             XGLMForCausalLM             |  8  | 1.0048 |  0.8423   |  2.1839  |         1.5309         |
|       ElectraForQuestionAnswering       | 64  | 0.9973 |  0.9915   |  2.1795  |         2.1631         |
|     M2M100ForConditionalGeneration      | 16  | 0.9979 |  0.8648   |  1.9783  |         1.549          |
|           ElectraForCausalLM            | 32  | 0.9972 |  0.9488   |  1.8444  |         1.8698         |
|    LayoutLMForSequenceClassification    | 16  | 0.9966 |  0.9827   |  1.8359  |         1.8313         |
|            XLNetLMHeadModel             |  8  | 0.9984 |  0.9707   |  1.8261  |         1.822          |
|        BertForQuestionAnswering         | 16  | 0.9981 |   0.982   |  1.8025  |         1.805          |
|       RobertaForQuestionAnswering       | 16  | 0.9971 |  0.9817   |  1.8022  |         1.8116         |
|            PLBartForCausalLM            |  8  | 0.9916 |  0.9612   |  1.6793  |         1.702          |
|               DistillGPT2               | 16  | 0.9927 |   0.96    |  1.6752  |         1.7183         |
|           RobertaForCausalLM            | 16  | 0.9978 |  0.9734   |  1.6719  |         1.6985         |
|       T5ForConditionalGeneration        |  4  | 0.9931 |   0.857   |  1.6644  |         1.7808         |
|                 T5Small                 |  4  | 0.9935 |  0.8618   |  1.6619  |         1.7808         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9979 |   0.977   |  1.6532  |         1.6771         |
|     PLBartForConditionalGeneration      |  4  | 0.9907 |  0.9481   |  1.6405  |         1.6832         |
|       AlbertForQuestionAnswering        |  4  | 1.0003 |   0.886   |  1.6387  |         1.6479         |
|            AlbertForMaskedLM            |  4  | 1.0003 |  0.8853   |  1.6271  |         1.6418         |
|           LayoutLMForMaskedLM           | 16  | 0.9973 |  0.9728   |  1.622   |         1.6145         |
|             BertForMaskedLM             | 16  | 0.9981 |  0.9718   |  1.5964  |         1.6164         |
|                CamemBert                | 16  | 0.9978 |  0.9728   |  1.5465  |         1.5616         |
|         MegatronBertForCausalLM         |  4  | 1.0148 |   0.927   |  1.5379  |         1.6008         |
|         Speech2Text2ForCausalLM         | 256 | 0.984  |  0.9194   |  1.5311  |         1.5733         |
|             BartForCausalLM             |  4  | 0.985  |  0.9598   |  1.5293  |         1.5673         |
|            MBartForCausalLM             |  4  | 0.9878 |  0.9552   |  1.5239  |         1.5525         |
|            YituTechConvBert             | 16  | 0.9976 |  0.9688   |  1.5217  |         1.5225         |
|      MBartForConditionalGeneration      |  2  | 1.0131 |  0.9663   |  1.508   |         1.5002         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0069 |  0.8933   |  1.5055  |         1.454          |
|      BartForConditionalGeneration       |  2  | 1.0092 |  0.9689   |  1.4796  |         1.5056         |
|     DistilBertForQuestionAnswering      | 256 | 0.9969 |   0.99    |  1.4557  |         1.4571         |
|     PegasusForConditionalGeneration     | 32  | 1.0041 |  0.9402   |  1.3718  |         1.3607         |
|           PegasusForCausalLM            | 32  | 0.9954 |  0.9191   |  1.2833  |         1.2306         |
|            TrOCRForCausalLM             | 32  | 0.9883 |  0.9568   |  1.2653  |         1.2934         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9891 |  0.9186   |  1.2618  |         1.2748         |
|          DistilBertForMaskedLM          | 128 | 0.9961 |  0.9532   |  1.2158  |         1.2415         |
|       DebertaForQuestionAnswering       |  8  | 0.8261 |  0.6692   |  1.1949  |         1.0658         |
|           DebertaForMaskedLM            |  4  | 0.772  |  0.5828   |  1.081   |         0.9249         |
|          DebertaV2ForMaskedLM           |  1  | 0.7437 |  0.5607   |  0.9899  |         0.7354         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.7291 |  0.5628   |  0.921   |         0.736          |
|          BlenderbotForCausalLM          |  4  | 0.9896 |  0.8463   |   0.0    |         1.4407         |
|          AllenaiLongformerBase          |  4  | 1.0095 |  0.6708   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
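To count how many models land in each status per backend, the accuracy table above can be parsed directly from its text form. This is a minimal sketch under the assumption that the table keeps the `+---+` border and `| cell | cell |` row layout shown here; `parse_ascii_table` and `tally_status` are hypothetical helper names, not part of the benchmark scripts.

```python
from collections import Counter


def parse_ascii_table(text):
    """Parse a '+---+' bordered, '|'-separated table into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def tally_status(rows, backends):
    """Count the occurrences of each status string per backend column."""
    return {b: Counter(row[b] for row in rows) for b in backends}


# Four rows copied from the table above (spacing trimmed).
SAMPLE = """
+-------------------------------+----+------------------+------------------+------------------+------------------------+
| name | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+-------------------------------+----+------------------+------------------+------------------+------------------------+
| BlenderbotForCausalLM | 1 | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |
| T5Small | 1 | pass | pass | pass | pass |
| DebertaV2ForQuestionAnswering | 1 | pass | pass | fail_to_run | pass |
| AlbertForQuestionAnswering | 1 | pass | pass | fail_accuracy | fail_accuracy |
+-------------------------------+----+------------------+------------------+------------------+------------------------+
"""

rows = parse_ascii_table(SAMPLE)
for backend, counts in tally_status(rows, ["inductor", "inductor_no_cudagraphs"]).items():
    print(backend, dict(counts))
```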

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 17.1425 |  42.101   | 147.9875 |        148.4814        |
|          DebertaV2ForMaskedLM           |  1  | 15.5056 |  27.1391  | 146.6582 |        76.6963         |
|      DebertaV2ForQuestionAnswering      |  2  | 15.3969 |  28.3114  | 146.3319 |        76.4348         |
|     MobileBertForQuestionAnswering      | 128 | 17.0308 |  41.9969  | 145.4655 |        142.3376        |
|     M2M100ForConditionalGeneration      | 16  | 12.0484 |  26.3116  | 137.1319 |        138.3301        |
|       MT5ForConditionalGeneration       | 16  | 7.8487  |  19.1039  | 136.5006 |        136.0151        |
|             XGLMForCausalLM             |  8  | 9.5667  |  20.4296  | 134.9026 |        134.6974        |
|            XLNetLMHeadModel             |  8  | 10.3085 |  27.5138  | 96.6846  |        96.6296         |
|       DebertaForQuestionAnswering       |  8  | 7.1107  |  13.9177  | 91.0034  |        61.3441         |
|           DebertaForMaskedLM            |  4  | 7.4312  |  13.4707  | 90.2026  |        62.3731         |
|      MBartForConditionalGeneration      |  2  | 11.9651 |  25.8451  | 82.6226  |        83.2458         |
|      BartForConditionalGeneration       |  2  | 12.239  |  25.7826  | 77.4899  |        75.7795         |
|            YituTechConvBert             | 16  | 7.2247  |  16.4543  | 72.4923  |        70.7988         |
|     PegasusForConditionalGeneration     | 32  | 5.2674  |  18.9892  | 71.4708  |        68.3861         |
|    MegatronBertForQuestionAnswering     |  8  | 10.3169 |  21.3649  | 69.7293  |        69.5592         |
|         MegatronBertForCausalLM         |  4  | 10.3666 |  21.6219  |  68.789  |        68.7603         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.8785  |  16.9771  | 56.4649  |         58.251         |
|           ElectraForCausalLM            | 32  | 5.4972  |  11.3754  | 54.4425  |        55.0679         |
|       T5ForConditionalGeneration        |  4  |  5.417  |  13.2068  | 52.1933  |        52.6268         |
|                 T5Small                 |  4  | 5.5038  |  13.0597  | 51.8887  |         52.685         |
|     PLBartForConditionalGeneration      |  4  | 6.3466  |  13.4479  | 51.0814  |        49.7981         |
|    LayoutLMForSequenceClassification    | 16  | 5.7754  |  11.6646  |  49.52   |         48.453         |
|       ElectraForQuestionAnswering       | 64  | 5.4506  |  11.3072  | 46.2864  |        47.7135         |
|             BertForMaskedLM             | 16  | 5.3152  |  10.8446  | 44.5746  |        43.0551         |
|           LayoutLMForMaskedLM           | 16  | 5.8544  |  11.8339  | 44.1477  |        41.5678         |
|            MBartForCausalLM             |  4  | 5.8035  |  11.2414  | 42.9742  |        42.4551         |
|        BertForQuestionAnswering         | 16  | 5.3384  |  10.5557  | 41.9033  |        42.6015         |
|            AlbertForMaskedLM            |  4  | 2.1563  |  8.0203   | 41.6873  |        39.5969         |
|             BartForCausalLM             |  4  | 5.6888  |  10.9069  | 41.2848  |        40.0737         |
|             OPTForCausalLM              |  2  | 4.6862  |  10.2111  | 41.0329  |        39.2297         |
|           RobertaForCausalLM            | 16  | 5.2215  |  11.3117  | 40.3172  |        39.8708         |
|                CamemBert                | 16  |  5.156  |  11.3612  | 40.2703  |        39.7799         |
|            TrOCRForCausalLM             | 32  | 5.5725  |  11.2145  | 40.0733  |         39.156         |
|           PegasusForCausalLM            | 32  | 5.6856  |  11.0043  |  39.788  |         39.526         |
|       RobertaForQuestionAnswering       | 16  | 5.1798  |  11.2212  | 39.3179  |        38.8227         |
|     DistilBertForQuestionAnswering      | 256 | 2.6003  |  5.6487   | 37.4978  |         38.077         |
|      GPT2ForSequenceClassification      |  4  | 4.6912  |  9.5598   | 37.4659  |        37.6773         |
|       AlbertForQuestionAnswering        |  4  | 2.1552  |   7.951   | 37.2495  |        35.2422         |
|          DistilBertForMaskedLM          | 128 | 2.6291  |  5.7029   | 36.7261  |        36.3982         |
|       BlenderbotSmallForCausalLM        | 64  | 4.0983  |  7.6376   | 31.5268  |        31.2846         |
|               DistillGPT2               | 16  | 2.6078  |  4.9686   | 30.0645  |        30.5314         |
|            PLBartForCausalLM            |  8  | 3.0539  |  6.0565   | 28.7937  |        27.3012         |
|         Speech2Text2ForCausalLM         | 256 | 2.9144  |  5.8402   | 27.7502  |        28.3297         |
|          BlenderbotForCausalLM          |  4  | 11.2727 |  21.977   |   nan    |        71.0683         |
|          AllenaiLongformerBase          |  4  | 9.6893  |  30.4951  |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1376  |         1.195          |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1139  |         1.2307         |
|             OPTForCausalLM              |  2  | 0.9999 |  0.9165   |  1.094   |         1.1346         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0607  |         1.1729         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0603  |         1.1724         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0583  |         1.1368         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.1479         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0078  |         1.0518         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0077  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.0075  |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0035  |         1.0491         |
|            YituTechConvBert             | 16  |  1.0   |  0.9143   |  0.9911  |         1.0411         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9729  |         1.3147         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         1.0642         |
|     PLBartForConditionalGeneration      |  4  | 1.0001 |  0.9301   |  0.9649  |         1.052          |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         1.1152         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9501  |         1.268          |
|            MBartForCausalLM             |  4  |  1.0   |  0.8937   |  0.9281  |         0.9912         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8677   |  0.9138  |         0.9886         |
|             BartForCausalLM             |  4  |  1.0   |  0.8936   |  0.9137  |         0.9749         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9089  |         1.0018         |
|           ElectraForCausalLM            | 32  |  1.0   |  0.8475   |  0.8941  |         0.9739         |
|           PegasusForCausalLM            | 32  |  1.0   |  0.8822   |  0.893   |         0.9864         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8849  |         0.9624         |
|            TrOCRForCausalLM             | 32  |  1.0   |   0.873   |  0.8836  |         0.9583         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8729  |         0.9803         |
|     PegasusForConditionalGeneration     | 32  |  1.0   |   0.91    |  0.8689  |         1.0689         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.8672  |         1.0307         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         1.0139         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         1.0962         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8137   |  0.8184  |         0.9119         |
|         Speech2Text2ForCausalLM         | 256 |  1.0   |  0.8183   |  0.789   |         0.8779         |
|     M2M100ForConditionalGeneration      | 16  |  1.0   |  0.8084   |  0.7651  |         0.9908         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7473  |         1.016          |
|             XGLMForCausalLM             |  8  |  1.0   |  0.7834   |  0.7117  |         0.9792         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6569  |         0.8392         |
|           DebertaForMaskedLM            |  4  | 0.9316 |  0.9143   |  0.5646  |         0.9988         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.5187  |         0.9664         |
|       DebertaForQuestionAnswering       |  8  | 0.9506 |  1.0516   |  0.4867  |         1.1525         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9763 |  0.9764   |  0.4855  |          0.98          |
|          BlenderbotForCausalLM          |  4  | 0.9978 |  0.9099   |   nan    |         0.999          |
|          AllenaiLongformerBase          |  4  | 0.9508 |  0.8694   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 265.8277 | 300.2698  | 163.7235 |        162.1218        |
|       AlbertForQuestionAnswering        |  4  | 263.8618 | 297.6653  | 161.4197 |        160.1728        |
|            XLNetLMHeadModel             |  8  | 280.0064 |  288.037  | 152.9095 |        152.9537        |
|      DebertaV2ForQuestionAnswering      |  2  | 150.3309 | 210.1941  | 114.6888 |        143.4971        |
|     PegasusForConditionalGeneration     | 32  | 139.2056 | 153.2898  | 110.8039 |        110.4749        |
|            TrOCRForCausalLM             | 32  | 139.6331 | 143.2027  | 108.7247 |        106.2983        |
|          DebertaV2ForMaskedLM           |  1  | 158.4414 | 208.9725  | 104.3129 |        142.911         |
|      MBartForConditionalGeneration      |  2  | 138.3541 | 144.4102  | 94.4161  |        92.1249         |
|      BartForConditionalGeneration       |  2  | 149.9372 | 142.9487  | 93.0202  |        96.3169         |
|    MegatronBertForQuestionAnswering     |  8  | 142.2369 | 144.8194  | 85.8236  |         84.749         |
|            YituTechConvBert             | 16  | 125.526  | 129.7408  | 82.2666  |        82.0851         |
| BlenderbotSmallForConditionalGeneration | 64  | 122.4099 | 129.2312  | 80.7242  |        78.5533         |
|                CamemBert                | 16  | 118.5134 | 121.6792  | 76.4372  |        75.8198         |
|     M2M100ForConditionalGeneration      | 16  | 151.638  |  165.523  | 75.0487  |        92.7146         |
|            MBartForCausalLM             |  4  | 114.7565 | 118.8911  | 74.4613  |        73.0678         |
|             BartForCausalLM             |  4  | 115.9677 | 117.8502  |  74.079  |        73.3182         |
|     PLBartForConditionalGeneration      |  4  | 117.5285 | 127.2637  | 71.6526  |        70.7802         |
|     DistilBertForQuestionAnswering      | 256 | 104.024  | 104.3098  | 71.0253  |         71.143         |
|     MobileBertForQuestionAnswering      | 128 | 166.2824 |  225.401  | 69.9327  |        145.867         |
|            PLBartForCausalLM            |  8  | 116.4107 |  117.064  | 69.8927  |        68.5624         |
|          DistilBertForMaskedLM          | 128 | 84.9574  |  88.766   | 69.5559  |         68.091         |
|           LayoutLMForMaskedLM           | 16  | 112.9757 | 115.8123  | 69.5344  |        69.6729         |
|             BertForMaskedLM             | 16  | 110.329  | 113.0848  | 69.0812  |        68.0182         |
|             OPTForCausalLM              |  2  | 172.3382 | 181.9132  | 68.7977  |        68.4042         |
|           RobertaForCausalLM            | 16  | 115.2403 | 118.3359  | 68.7282  |        67.6851         |
|       DebertaForQuestionAnswering       |  8  | 91.6959  | 115.3691  | 63.3452  |        70.9718         |
|               DistillGPT2               | 16  | 106.6101 | 109.9032  | 63.0478  |        61.4941         |
|                 T5Small                 |  4  | 104.989  | 121.7066  | 62.7258  |        58.7645         |
|       T5ForConditionalGeneration        |  4  | 104.7549 | 122.4456  | 62.6813  |        58.5864         |
|          MobileBertForMaskedLM          | 64  | 170.3472 | 236.5794  | 60.8566  |        150.1394        |
|           PegasusForCausalLM            | 32  | 71.2607  |  75.4986  | 58.6041  |        56.8386         |
|           DebertaForMaskedLM            |  4  | 91.4774  | 105.6055  | 58.4908  |        68.5464         |
|         MegatronBertForCausalLM         |  4  | 85.8575  |  93.7201  | 56.7291  |        59.9454         |
|             XGLMForCausalLM             |  8  | 92.5064  | 139.6219  | 53.8805  |        76.7747         |
|    LayoutLMForSequenceClassification    | 16  | 98.1548  |  99.3708  | 53.2834  |        53.3683         |
|       RobertaForQuestionAnswering       | 16  | 95.7377  |  97.4864  | 53.0558  |         52.742         |
|        BertForQuestionAnswering         | 16  |  95.535  |  96.7255  | 52.7294  |        52.7152         |
|       ElectraForQuestionAnswering       | 64  | 115.2997 | 117.3709  | 52.5606  |        53.7531         |
|           ElectraForCausalLM            | 32  |  88.663  |  92.9648  | 47.7042  |        47.1129         |
|       BlenderbotSmallForCausalLM        | 64  |  62.243  |  68.3323  | 46.5518  |        45.4643         |
|       MT5ForConditionalGeneration       | 16  | 91.6336  | 121.6054  | 41.7745  |        47.4305         |
|      GPT2ForSequenceClassification      |  4  | 92.3787  |  95.2804  | 39.3139  |        38.6827         |
|         Speech2Text2ForCausalLM         | 256 | 53.0331  |  58.2425  | 34.6256  |         34.054         |
|          BlenderbotForCausalLM          |  4  | 111.7661 | 139.5814  |   nan    |        85.8349         |
|          AllenaiLongformerBase          |  4  | 179.8541 | 270.0539  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
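For a quick sanity check, the absolute latencies above can be turned into eager-to-backend ratios. The sketch below uses a hypothetical helper, `latency_ratio`, and treats `nan` as "no measurement". The ratios it prints need not equal the values in the speedup table, since the numbers in the two tables do not line up exactly.

```python
import math


def latency_ratio(eager_ms, backend_ms):
    """Return eager_ms / backend_ms, or None if either value is unusable."""
    if math.isnan(eager_ms) or math.isnan(backend_ms) or backend_ms <= 0.0:
        return None
    return eager_ms / backend_ms


# (eager, inductor) latencies in ms, copied from the table above.
samples = {
    "AlbertForMaskedLM": (265.8277, 163.7235),
    "BlenderbotForCausalLM": (111.7661, float("nan")),
}

for name, (eager_ms, inductor_ms) in samples.items():
    ratio = latency_ratio(eager_ms, inductor_ms)
    print(name, "n/a" if ratio is None else f"{ratio:.3f}x")
```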

timm_models suite with amp precision


Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9996 |  0.9987   |  3.0276  |         2.987          |
|      xcit_large_24_p8_224       |  5  | 0.9998 |  0.8715   |  2.0706  |         1.6366         |
|        twins_pcpvt_base         | 64  | 1.0051 |  0.9097   |  1.9967  |         1.7342         |
|         coat_lite_mini          | 128 | 0.9993 |  0.9981   |  1.955   |         1.9296         |
|          gmlp_s16_224           | 128 | 1.0001 |  1.0892   |  1.8699  |         1.8492         |
|          ghostnet_100           | 128 | 0.9979 |  0.7689   |  1.8654  |         1.6264         |
|          gmixer_24_224          | 128 | 0.9997 |  0.8928   |  1.7796  |         1.7652         |
|           volo_d1_224           | 64  | 0.9996 |  0.9783   |  1.7055  |         1.6828         |
|            lcnet_050            | 128 | 0.9445 |  0.7373   |  1.695   |         1.4701         |
|         crossvit_9_240          | 128 | 0.9998 |  0.7885   |  1.6708  |         1.6408         |
|  swin_base_patch4_window7_224   | 64  | 0.9995 |  0.9617   |  1.6376  |         1.6269         |
|           convit_base           | 64  | 0.9996 |  0.9995   |  1.618   |         1.6162         |
|       gluon_inception_v3        | 128 | 0.9997 |  0.8681   |  1.5399  |         1.5302         |
|        adv_inception_v3         | 128 | 0.9996 |  0.8631   |  1.5387  |         1.5275         |
|          inception_v3           | 128 | 0.9994 |  0.8669   |  1.5384  |         1.527          |
|             dla102              | 128 | 0.9996 |  0.8181   |  1.5358  |         1.5338         |
|        sebotnet33ts_256         | 64  | 0.9654 |  0.7696   |  1.5264  |         1.5561         |
|          convnext_base          | 64  | 0.9996 |   1.001   |  1.5257  |         1.5068         |
|            nfnet_l0             | 128 | 0.9991 |  0.8184   |  1.514   |         1.4553         |
|           dm_nfnet_f0           | 128 | 0.9996 |  0.9978   |  1.5042  |         1.4564         |
|       eca_botnext26ts_256       | 128 | 0.9778 |  0.7213   |  1.455   |         1.4345         |
|            pit_b_224            | 64  | 0.9994 |  0.9977   |  1.4449  |         1.4383         |
|           mobilevit_s           | 64  | 0.9711 |   0.736   |  1.4403  |         1.4631         |
|      mobilenetv3_large_100      | 128 | 0.9525 |  0.7627   |  1.4392  |         1.4609         |
|           regnety_002           | 128 | 0.9652 |  0.7233   |  1.4358  |         1.2559         |
|           mnasnet_100           | 128 | 0.9494 |  0.7417   |  1.432   |         1.5005         |
|           resnest101e           | 64  | 0.9998 |  0.8702   |  1.4319  |         1.3678         |
|          botnet26t_256          | 128 | 0.977  |  0.8545   |  1.4159  |         1.4329         |
|           selecsls42b           | 128 | 0.9987 |  0.8128   |  1.4131  |         1.4127         |
|          jx_nest_base           | 32  | 0.9994 |  0.9976   |  1.3904  |         1.3827         |
|         mobilenetv2_100         | 128 | 0.9502 |  0.7383   |  1.3873  |         1.4485         |
|        res2net50_14w_8s         | 128 | 0.9995 |  0.7908   |  1.3826  |         1.3599         |
|            hrnet_w18            | 128 | 0.9985 |  0.6477   |  1.3787  |         1.3642         |
|           res2next50            | 128 | 0.9995 |  0.8265   |  1.373   |         1.3648         |
|        ese_vovnet19b_dw         | 128 | 0.9657 |  0.8383   |  1.3637  |         1.387          |
|          mixer_b16_224          | 128 | 0.9998 |  1.0208   |  1.3607  |         1.3634         |
|       tf_efficientnet_b0        | 128 | 0.9637 |  0.6829   |  1.3605  |         1.3929         |
|      beit_base_patch16_224      | 64  | 0.9994 |  0.9691   |  1.3573  |         1.3571         |
|          cait_m36_384           |  4  | 1.0003 |  0.9986   |  1.3556  |         1.3541         |
|          spnasnet_100           | 128 | 0.9426 |  0.7398   |  1.3537  |         1.421          |
|         poolformer_m36          | 64  | 0.9998 |  0.9963   |  1.351   |         1.3422         |
|           fbnetc_100            | 128 | 0.9507 |   0.74    |  1.3488  |         1.4076         |
|            fbnetv3_b            | 128 | 0.9516 |  0.7707   |  1.3143  |         1.326          |
|           rexnet_100            | 128 | 0.9599 |  0.7077   |  1.3099  |         1.352          |
|          resmlp_12_224          | 128 | 0.9997 |  0.8954   |  1.2729  |         1.2676         |
| deit_base_distilled_patch16_224 | 64  | 0.9995 |  0.9968   |  1.2606  |         1.2609         |
|          cspdarknet53           | 64  | 0.9409 |  0.7925   |  1.2439  |         1.2793         |
|      vit_base_patch16_224       | 64  | 0.9992 |  0.9969   |  1.2398  |         1.2407         |
|            tinynet_a            | 128 | 0.9494 |  0.6805   |  1.2353  |         1.2737         |
|           tf_mixnet_l           | 128 | 0.9811 |  0.8304   |  1.1922  |         1.1986         |
|            mixnet_l             | 128 | 0.9798 |  0.8232   |  1.1811  |         1.1872         |
|         visformer_small         | 128 | 0.9992 |  0.9483   |  1.1772  |         1.1708         |
|        res2net101_26w_4s        | 64  | 1.0031 |  0.7986   |  1.1638  |         1.1097         |
|          pnasnet5large          | 16  | 0.997  |  0.9287   |  1.1267  |         1.1451         |
|             dpn107              | 32  | 0.9396 |   0.814   |  1.1053  |         1.1451         |
|            repvgg_a2            | 128 | 0.9431 |   0.76    |  1.0964  |         1.1306         |
|        gluon_xception65         | 32  | 0.9996 |  0.8473   |  1.0823  |          1.09          |
|     swsl_resnext101_32x16d      | 32  | 0.9993 |  0.8414   |  1.0601  |         1.0268         |
|            gernet_l             | 128 | 0.944  |  0.7986   |  1.0452  |         1.0771         |
|        convmixer_768_32         | 32  | 0.9995 |  0.9646   |  1.0026  |         1.0038         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 5.5428  |  11.1443  | 285.3998 |        300.8896        |
|            hrnet_w18            | 128 | 9.9361  |  36.4149  | 255.186  |        247.495         |
|          ghostnet_100           | 128 | 7.7093  |  14.9482  | 236.8368 |        236.9131        |
|           mobilevit_s           | 64  | 5.2412  |  11.2108  | 178.4743 |        177.6305        |
|            fbnetv3_b            | 128 | 8.2826  |  16.7846  | 175.0752 |        173.6566        |
|           resnest101e           | 64  | 11.2582 |  24.4689  | 171.6154 |        169.8922        |
|           tf_mixnet_l           | 128 | 8.8691  |  16.4479  | 167.7249 |        163.8811        |
|          pnasnet5large          | 16  | 8.3485  |  26.5507  | 166.9575 |        164.1125        |
|          inception_v3           | 128 | 5.7607  |  12.7154  | 166.2498 |        162.6374        |
|      mobilenetv3_large_100      | 128 | 4.2893  |  8.2866   | 165.8546 |        165.8863        |
|            tinynet_a            | 128 | 5.8778  |  12.1401  | 165.3405 |        165.6042        |
|            mixnet_l             | 128 | 8.3972  |  16.2002  | 163.2159 |        157.299         |
|        adv_inception_v3         | 128 | 5.6645  |  12.516   | 161.2748 |        162.2546        |
|       tf_efficientnet_b0        | 128 | 5.0948  |  10.3972  | 160.7946 |        152.2355        |
|       gluon_inception_v3        | 128 |  5.852  |  12.5993  | 159.9696 |        160.4201        |
|        res2net101_26w_4s        | 64  | 10.8579 |  24.9506  | 156.6893 |        154.9523        |
|        twins_pcpvt_base         | 64  | 10.8985 |  23.3727  | 153.3044 |        151.4508        |
|          spnasnet_100           | 128 | 4.9369  |  9.4015   | 140.9908 |        138.2189        |
|           fbnetc_100            | 128 | 5.0085  |  9.3893   | 140.5204 |        140.6783        |
|      xcit_large_24_p8_224       |  5  | 12.6563 |  28.1672  | 139.2014 |        135.8129        |
|         mobilenetv2_100         | 128 | 3.9654  |  7.7909   | 131.637  |        132.3704        |
|        res2net50_14w_8s         | 128 | 9.0591  |  22.2913  | 127.4727 |        127.9485        |
|           mnasnet_100           | 128 | 3.9122  |  7.5688   | 126.8427 |        124.5631        |
|          cait_m36_384           |  4  | 13.6445 |  30.7544  | 116.5026 |        119.9374        |
|  swin_base_patch4_window7_224   | 64  | 8.3356  |  19.3725  | 114.2739 |        112.6295        |
|           regnety_002           | 128 | 4.7959  |  8.8528   | 113.1897 |        106.9853        |
|        sebotnet33ts_256         | 64  | 4.2106  |  8.9257   | 112.0024 |        113.3559        |
|         poolformer_m36          | 64  | 7.5551  |  13.6402  | 105.5492 |        103.9479        |
|          cspdarknet53           | 64  | 5.8493  |  10.7749  | 103.6953 |        99.4054         |
|       eca_botnext26ts_256       | 128 | 3.0223  |  6.7603   | 100.9911 |        97.6025         |
|             dpn107              | 32  | 9.8301  |  19.3904  | 100.5686 |        100.1984        |
|            lcnet_050            | 128 | 2.4871  |  4.9844   | 99.4908  |        100.4141        |
|           selecsls42b           | 128 |  2.522  |   5.352   | 98.5915  |        94.9589         |
|             dla102              | 128 | 6.4739  |  14.0823  | 98.3364  |        100.9447        |
|        gluon_xception65         | 32  | 7.8881  |  16.813   | 98.2176  |        96.7247         |
|          botnet26t_256          | 128 |  2.886  |  5.8429   |  93.595  |        90.7862         |
|           res2next50            | 128 |  5.082  |  12.0727  | 92.7081  |        91.9191         |
|         coat_lite_mini          | 128 | 3.2336  |  7.7748   | 92.1586  |        92.6301         |
|         crossvit_9_240          | 128 | 5.8177  |  13.3037  |  90.929  |        89.2533         |
|          jx_nest_base           | 32  | 6.5825  |  14.6341  | 88.3382  |         86.219         |
|            gernet_l             | 128 | 4.8911  |   8.925   | 85.3666  |        83.7637         |
|            nfnet_l0             | 128 | 5.2159  |  10.9038  | 83.8604  |        80.5168         |
|        ese_vovnet19b_dw         | 128 | 2.5249  |  4.5012   | 79.7344  |        77.6332         |
|           volo_d1_224           | 64  | 5.0631  |  11.5923  | 78.1002  |        78.1342         |
|           dm_nfnet_f0           | 128 | 5.8154  |  11.396   | 73.8288  |        74.3148         |
|        tnt_s_patch16_224        | 128 | 6.5342  |  15.8265  | 72.7333  |        71.9562         |
|         visformer_small         | 128 | 2.5493  |   5.924   | 69.7095  |        69.9082         |
|     swsl_resnext101_32x16d      | 32  | 6.2814  |  13.8258  |  66.256  |        64.5271         |
|            repvgg_a2            | 128 | 4.7602  |  8.7894   | 64.7005  |        65.8583         |
|          gmlp_s16_224           | 128 | 5.5861  |  12.0206  | 63.7036  |        62.6867         |
|          convnext_base          | 64  | 6.9266  |  12.4584  | 60.8613  |        62.1093         |
|          gmixer_24_224          | 128 | 5.6645  |  12.7415  | 54.2367  |         53.915         |
|           convit_base           | 64  | 3.4091  |  8.3951   | 50.3128  |        49.3713         |
|            pit_b_224            | 64  | 3.5342  |  7.9352   | 48.6932  |         48.414         |
|          resmlp_12_224          | 128 | 2.8024  |  5.3794   | 44.6442  |        44.2869         |
| deit_base_distilled_patch16_224 | 64  | 3.1032  |   7.003   | 43.0631  |         42.383         |
|      vit_base_patch16_224       | 64  | 3.1253  |  6.9006   | 42.6969  |        41.9235         |
|        convmixer_768_32         | 32  |  1.668  |  6.8887   | 40.5339  |        38.0311         |
|      beit_base_patch16_224      | 64  | 3.8737  |  8.5269   |  36.963  |        37.1223         |
|          mixer_b16_224          | 128 | 2.6904  |  5.8542   | 36.2767  |        35.6642         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.1848  |         1.2049         |
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.1712  |         1.2836         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1117  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.0948  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.0266  |         1.1179         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.013   |         1.0845         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0079  |         1.0351         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  0.9984  |         1.0721         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  0.9972  |         1.0876         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  0.9872  |         1.0728         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  0.9834  |         1.0506         |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9762  |         0.9854         |
|           rexnet_100            | 128 | 0.9899 |  0.7866   |  0.9747  |         1.0457         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  0.9729  |         1.0539         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  0.9557  |         1.0057         |
|             dla102              | 128 | 0.9634 |  0.9155   |  0.9536  |         1.0326         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.9501  |         0.9948         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9362  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9353  |         0.9528         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  0.9348  |         1.0194         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  0.9346  |         1.0675         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9285  |         0.989          |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9228  |         0.9793         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9215  |         0.9952         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9131  |         0.9727         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  0.9097  |         1.0328         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9047  |         0.9903         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.8918  |          0.99          |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  0.891   |         1.1115         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.8833  |         0.9642         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.8831  |         0.9705         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.8807  |         0.977          |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.8786  |         0.9451         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.877   |         0.9362         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  0.8768  |         1.1871         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  0.8738  |         1.0072         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.8712  |         0.9607         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.871   |         0.9483         |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.8687  |         0.9902         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.8683  |         0.9403         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.866   |         0.9547         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8632  |         0.989          |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.8596  |         0.9535         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  0.8578  |         1.0242         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.8576  |         0.9664         |
|          convnext_base          | 64  | 1.001  |   0.924   |  0.8505  |         1.0338         |
|            gernet_l             | 128 | 0.9782 |  0.8499   |  0.8499  |         0.9706         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.8461  |         0.9786         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  0.8402  |         1.0202         |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.8239  |         0.9779         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.8225  |         0.9732         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.805   |         0.884          |
|            repvgg_a2            | 128 | 0.9761 |  0.7779   |  0.7738  |         0.9611         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.7602  |         0.8966         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.7526  |         0.9898         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.7214  |         0.9045         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.6693  |         0.9604         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.058  | 311.2417  | 299.4975 |        298.9608        |
|            hrnet_w18            | 128 | 280.4739 | 431.5859  | 202.9371 |        204.407         |
|          pnasnet5large          | 16  | 196.5013 | 210.3261  | 174.4384 |        171.118         |
|           tf_mixnet_l           | 128 | 193.0007 | 227.9883  | 158.7195 |        157.9425        |
|            mixnet_l             | 128 | 184.5784 |  219.794  | 153.2662 |        152.4812        |
|          cait_m36_384           |  4  | 167.2548 | 167.1747  | 123.1912 |        123.0813        |
|           resnest101e           | 64  | 163.6208 | 188.4791  | 114.0389 |        119.5133        |
|             dla102              | 128 | 171.8483 | 209.6514  | 111.7423 |        111.9256        |
|     swsl_resnext101_32x16d      | 32  | 118.2357 |  140.875  | 111.5415 |        115.1405        |
|         poolformer_m36          | 64  | 144.6546 | 145.3122  | 107.0468 |        107.7051        |
|        tnt_s_patch16_224        | 128 | 322.9016 | 322.9895  | 106.6896 |        108.018         |
|          inception_v3           | 128 | 160.4076 | 184.8527  | 104.1351 |        104.9339        |
|       gluon_inception_v3        | 128 | 160.2933 | 184.6238  | 104.0763 |        104.6563        |
|        adv_inception_v3         | 128 | 159.9368 | 185.4001  | 103.9846 |        104.8333        |
|        res2net50_14w_8s         | 128 | 140.7663 |  178.113  | 101.9505 |        103.4473        |
|           convit_base           | 64  | 162.9796 | 162.7599  | 100.7273 |        100.6406        |
|             dpn107              | 32  | 112.6873 | 130.0011  | 96.0061  |        92.8268         |
|           res2next50            | 128 | 125.937  | 152.0844  | 91.7168  |        92.1922         |
|        gluon_xception65         | 32  | 99.0615  | 116.7449  | 91.5095  |        90.6964         |
|  swin_base_patch4_window7_224   | 64  | 146.0788 | 151.7887  | 89.0638  |        89.7282         |
|          mixer_b16_224          | 128 | 116.278  | 113.8145  | 86.2736  |         85.387         |
|        res2net101_26w_4s        | 64  | 99.3618  |  124.811  | 84.9615  |        88.6602         |
|           dm_nfnet_f0           | 128 | 126.8475 | 126.7407  | 84.0338  |        87.1388         |
|            fbnetv3_b            | 128 | 114.8822 | 142.2435  | 83.2807  |        82.5772         |
|            pit_b_224            | 64  | 118.1475 | 118.2753  | 81.7036  |        82.0368         |
|          convnext_base          | 64  | 122.0767 |  121.791  | 80.0812  |        81.1302         |
|         visformer_small         | 128 | 90.8699  |  95.7063  | 77.1829  |        77.5567         |
|      beit_base_patch16_224      | 64  | 101.1734 | 104.2152  | 74.5365  |         74.562         |
|            nfnet_l0             | 128 | 111.5624 | 136.0524  | 74.0946  |        76.6542         |
|          gmlp_s16_224           | 128 | 136.7716 | 125.3915  | 73.1973  |        73.9037         |
|       eca_botnext26ts_256       | 128 | 108.2901 | 146.6581  | 72.8227  |        73.7101         |
|          jx_nest_base           | 32  | 100.3697 | 100.4187  | 71.9395  |        72.2364         |
|          cspdarknet53           | 64  | 94.0462  | 111.6704  | 71.2553  |         69.116         |
|           volo_d1_224           | 64  | 120.1608 | 122.6914  | 70.6031  |        71.5096         |
|          botnet26t_256          | 128 | 101.333  | 115.9993  | 70.0049  |        69.1836         |
|      vit_base_patch16_224       | 64  | 86.5375  |  86.6511  |  69.877  |        69.6892         |
|            gernet_l             | 128 | 76.8928  |  91.0981  | 69.5333  |        67.5366         |
| deit_base_distilled_patch16_224 | 64  | 84.4916  |  84.7805  | 67.1192  |        66.9249         |
|            repvgg_a2            | 128 |  77.012  |  95.5338  | 66.1797  |        64.2784         |
|          gmixer_24_224          | 128 | 117.4771 | 131.5837  | 66.0171  |        66.4288         |
|      xcit_large_24_p8_224       |  5  | 129.7287 | 141.9015  | 60.7634  |        75.7701         |
|       tf_efficientnet_b0        | 128 | 84.5037  | 119.1176  | 59.8827  |        58.3498         |
|        twins_pcpvt_base         | 64  | 118.6604 | 130.8355  | 59.0732  |        66.4804         |
|           fbnetc_100            | 128 | 82.8394  | 106.2267  | 58.3237  |        55.8579         |
|           rexnet_100            | 128 | 79.3639  | 107.6932  | 58.1303  |        56.2668         |
|         coat_lite_mini          | 128 | 112.7317 | 112.8867  | 57.6599  |        58.3125         |
|           mobilevit_s           | 64  | 83.8397  | 110.4985  | 56.5335  |        55.5141         |
|            tinynet_a            | 128 | 73.2026  | 102.6212  | 56.4254  |        54.7042         |
|        sebotnet33ts_256         | 64  | 79.7213  |  99.9125  |  50.457  |        49.4418         |
|          spnasnet_100           | 128 | 70.2217  |  89.7259  | 48.9745  |        46.6507         |
|         crossvit_9_240          | 128 | 81.5916  | 103.7551  | 48.9577  |        49.8055         |
|          ghostnet_100           | 128 | 90.0415  | 116.9621  | 48.2382  |        55.3524         |
|        ese_vovnet19b_dw         | 128 | 63.9778  |  73.8221  |  45.372  |        44.6286         |
|         mobilenetv2_100         | 128 | 65.5097  |  84.3562  | 44.7767  |        42.9326         |
|           mnasnet_100           | 128 | 64.2089  |  82.1244  | 42.5022  |        40.6093         |
|           selecsls42b           | 128 | 59.9991  |  73.8235  | 42.3767  |        42.4244         |
|          resmlp_12_224          | 128 | 53.1728  |  59.1778  | 41.6746  |        41.8654         |
|      mobilenetv3_large_100      | 128 |  61.032  |  76.3845  | 40.4522  |        39.8679         |
|           regnety_002           | 128 | 39.5834  |  56.4401  | 25.7606  |        29.9655         |
|            lcnet_050            | 128 |  31.61   |  40.4639  | 17.5687  |        20.2941         |
+---------------------------------+-----+----------+-----------+----------+------------------------+
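
Per-model speedups can be derived from rows like the ones above. The following is a minimal sketch, not the dashboard's own code: it assumes the `eager` column can stand in for the baseline (the dashboard normalizes against native PyTorch, which this table does not list), it uses three rows copied from the table, and the `speedup` helper is a hypothetical name.

```python
# Three rows copied from the absolute latency table (milliseconds).
# Tuple order: eager, aot_eager, inductor, inductor_no_cudagraphs.
LATENCY_MS = {
    "hrnet_w18": (280.4739, 431.5859, 202.9371, 204.407),
    "convmixer_768_32": (300.058, 311.2417, 299.4975, 298.9608),
    "lcnet_050": (31.61, 40.4639, 17.5687, 20.2941),
}


def speedup(baseline_ms: float, compiled_ms: float) -> float:
    """Return baseline latency divided by compiled latency (> 1 means faster)."""
    return baseline_ms / compiled_ms


# Assumption: column 0 (eager) is used as the baseline, column 2 is inductor.
inductor_speedup = {
    name: speedup(row[0], row[2]) for name, row in LATENCY_MS.items()
}

# Print models from smallest to largest speedup.
for name, value in sorted(inductor_speedup.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {value:.2f}x")
```

Read this way, convmixer_768_32 is close to parity across backends, while the smaller convolutional models show the largest relative gains.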

Performance graphs

bench_logs/huggingface_amp.png :

bench_logs/torchbench_amp.png :

bench_logs/timm_models_amp.png :

Build Summary

Run name

day_102_12_04_23_performance_amp_274

Commit hashes

pytorch commit: 46a31e9
pytorch commit date: 2023-04-13 01:58:27+00:00
torchbench commit: 25f367952dfbb5cd67f24cb60b0e9c3c0011dca9
torchbench commit date: 2023-04-11 20:59:50-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git46a31e9

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Member

Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments run on A100 GPUs. Each experiment runs one iteration of the forward and backward passes for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. The experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
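
As an illustration of this filtering step, here is a minimal sketch that drops models failing accuracy before aggregating speedups with a geometric mean. The record layout, the model names, and the extra `speedup > 0` guard are assumptions made for the example; the actual dashboard scripts may handle failed performance runs differently.

```python
import math


def geomean(values):
    """Geometric mean of strictly positive numbers."""
    values = list(values)
    if not values:
        raise ValueError("geomean needs at least one value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-model records; names and numbers are illustrative only.
results = [
    {"name": "model_a", "accuracy": "pass", "speedup": 2.0},
    {"name": "model_b", "accuracy": "pass", "speedup": 0.5},
    {"name": "model_c", "accuracy": "fail_accuracy", "speedup": 4.0},
    {"name": "model_d", "accuracy": "pass", "speedup": 0.0},  # failed perf run
]

# Keep only models that pass accuracy. Assumption: a 0.0 speedup marks a
# failed performance run and is skipped as well, since log(0) is undefined.
kept = [
    r["speedup"]
    for r in results
    if r["accuracy"] == "pass" and r["speedup"] > 0
]

summary = geomean(kept)
print(f"geomean speedup over {len(kept)} models: {summary:.2f}x")
```

The geometric mean is the natural aggregate for ratios: a 2x speedup and a 0.5x slowdown cancel out to 1.0x, whereas an arithmetic mean would report 1.25x.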

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 85%, 51/60 | 91%, 41/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 96%, 43/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
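
The cells in this table have the form "percent, passed/total". A minimal sketch of producing such a cell from a list of per-model statuses follows; the `passrate` helper is hypothetical, and rounding the percentage to the nearest integer is an assumption that happens to match the values shown.

```python
def passrate(statuses):
    """Format a pass rate as 'NN%, passed/total' from per-model statuses."""
    total = len(statuses)
    if total == 0:
        raise ValueError("no statuses given")
    passed = sum(1 for s in statuses if s == "pass")
    return f"{100 * passed / total:.0f}%, {passed}/{total}"


# Illustrative status lists; the failure labels mirror those in the tables.
torchbench_inductor = ["pass"] * 51 + ["fail_to_run"] * 9
huggingface_no_cudagraphs = ["pass"] * 43 + ["fail_accuracy"] * 2

print(passrate(torchbench_inductor))
print(passrate(huggingface_no_cudagraphs))
```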

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.59x    |    1.59x    |    1.41x    |
| inductor_no_cudagraphs |   1.30x    |    1.51x    |    1.39x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.76    |    7.21     |    5.96     |
|       aot_eager        |    9.12    |    15.76    |    13.06    |
|        inductor        |   62.98    |    61.61    |   109.52    |
| inductor_no_cudagraphs |   63.13    |    58.90    |   109.33    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.90x    |    0.88x    |
|        inductor        |   0.78x    |    0.91x    |    0.91x    |
| inductor_no_cudagraphs |   0.94x    |    1.05x    |    1.01x    |
+------------------------+------------+-------------+-------------+
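
A minimal sketch of how a compression ratio of this kind can be computed, assuming it is defined as baseline peak memory divided by compiled peak memory (so values above 1.0 mean the compiled model uses less memory). The definition, the helper name, and the byte counts are assumptions for illustration, not taken from the dashboard scripts.

```python
def compression_ratio(baseline_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Baseline peak memory over compiled peak memory; higher is better."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return baseline_peak_bytes / compiled_peak_bytes


GIB = 1024 ** 3

# Hypothetical measurements: the compiled run peaks higher than the baseline.
ratio = compression_ratio(10.0 * GIB, 12.5 * GIB)
print(f"{ratio:.2f}x")
```

Under this definition, a ratio below 1.0x, as in several cells above, means the backend used more peak memory than the baseline.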

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics of the two most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_103_13_04_23_performance_amp_153

Previous report name: /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274

Passrate diff

+------------------------+-------------+-------------+-------------+
|        compiler        |    suite    | prev_value  |  cur_value  |
+------------------------+-------------+-------------+-------------+
|        inductor        | torchbench  | 85%, 51/60  | 85%, 51/60  |
|        inductor        | huggingface | 91%, 41/45  | 91%, 41/45  |
|        inductor        | timm_models | 100%, 60/60 | 100%, 60/60 |
| inductor_no_cudagraphs | torchbench  | 87%, 52/60  | 87%, 52/60  |
| inductor_no_cudagraphs | huggingface | 96%, 43/45  | 96%, 43/45  |
| inductor_no_cudagraphs | timm_models | 100%, 60/60 | 100%, 60/60 |
+------------------------+-------------+-------------+-------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.62x    |   1.59x   |
|        inductor        | huggingface |   1.65x    |   1.59x   |
|        inductor        | timm_models |   1.42x    |   1.41x   |
| inductor_no_cudagraphs | torchbench  |   1.30x    |   1.30x   |
| inductor_no_cudagraphs | huggingface |   1.54x    |   1.51x   |
| inductor_no_cudagraphs | timm_models |   1.41x    |   1.39x   |
+------------------------+-------------+------------+-----------+
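
The change between the two reports can be made explicit by subtracting the previous value from the current one. A minimal sketch, using the rows of the table above; `parse_speedup` and `speedup_deltas` are hypothetical helpers, not part of the dashboard tooling.

```python
# Rows copied from the geometric mean speedup diff table:
# (compiler, suite, prev_value, cur_value).
ROWS = [
    ("inductor", "torchbench", "1.62x", "1.59x"),
    ("inductor", "huggingface", "1.65x", "1.59x"),
    ("inductor", "timm_models", "1.42x", "1.41x"),
    ("inductor_no_cudagraphs", "torchbench", "1.30x", "1.30x"),
    ("inductor_no_cudagraphs", "huggingface", "1.54x", "1.51x"),
    ("inductor_no_cudagraphs", "timm_models", "1.41x", "1.39x"),
]


def parse_speedup(cell: str) -> float:
    """Turn a cell such as '1.62x' into the float 1.62."""
    return float(cell.strip().rstrip("x"))


def speedup_deltas(rows):
    """Map (compiler, suite) to cur - prev, rounded to two decimals."""
    return {
        (compiler, suite): round(parse_speedup(cur) - parse_speedup(prev), 2)
        for compiler, suite, prev, cur in rows
    }


deltas = speedup_deltas(ROWS)
for (compiler, suite), delta in deltas.items():
    print(f"{compiler:24s} {suite:12s} {delta:+.2f}x")
```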

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
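
The four criteria above can be sketched as a small checker. The thresholds come from the list; the function name, flag labels, and sample values are assumptions made for illustration.

```python
SPEEDUP_MIN = 0.95          # flag when speedup < 0.95x
COMPILE_SEC_MAX = 120.0     # flag when compilation latency > 120 sec
COMPRESSION_MIN = 0.9       # flag when compression ratio < 0.9


def warnings_for(accuracy, speedup, compile_sec, compression):
    """Return the list of warning labels that apply to one model."""
    flags = []
    if accuracy != "pass":
        flags.append("accuracy")
    if speedup < SPEEDUP_MIN:
        flags.append("speedup")
    if compile_sec > COMPILE_SEC_MAX:
        flags.append("compile_latency")
    if compression < COMPRESSION_MIN:
        flags.append("compression")
    return flags


# Hypothetical models: one slow to compile, one failing on every criterion.
print(warnings_for("pass", 1.41, 142.5, 0.95))
print(warnings_for("fail_accuracy", 0.0, 130.0, 0.5))
print(warnings_for("pass", 1.20, 60.0, 1.02))
```

Note that a `nan` latency, as appears in one table cell, compares false against any threshold and would therefore not raise the compilation latency flag in this sketch.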

Accuracy warnings

+-------------+-------------------------------+------------------------+-----------------+
|    suite    |             name              | inductor_no_cudagraphs |    inductor     |
+-------------+-------------------------------+------------------------+-----------------+
| torchbench  |         hf_Longformer         |      fail_to_run       |   fail_to_run   |
| torchbench  |             moco              |      fail_to_run       |   fail_to_run   |
| torchbench  |      Background_Matting       |    eager_variation     | eager_variation |
| torchbench  |           tacotron2           |         0.0000         |     0.0000      |
| torchbench  |              gat              |         0.0000         |     0.0000      |
| torchbench  |              gcn              |         0.0000         |     0.0000      |
| torchbench  |             llama             |         0.0000         |     0.0000      |
| torchbench  |             sage              |         0.0000         |     0.0000      |
| torchbench  |         torchrec_dlrm         |         0.0000         |     0.0000      |
| huggingface | DebertaV2ForQuestionAnswering |          pass          |   fail_to_run   |
| huggingface |  AlbertForQuestionAnswering   |     fail_accuracy      |  fail_accuracy  |
+-------------+-------------------------------+------------------------+-----------------+

Performance speedup warnings

+-------------+-------------------------------+------------------------+----------+
|    suite    |             name              | inductor_no_cudagraphs | inductor |
+-------------+-------------------------------+------------------------+----------+
| torchbench  |         lennard_jones         |         0.8334         |  1.4113  |
| torchbench  |             dcgan             |         0.8203         |  1.2424  |
| torchbench  |       soft_actor_critic       |         0.8239         |  1.1606  |
| torchbench  |          timm_vovnet          |         0.9069         |  0.9051  |
| torchbench  |    nvidia_deeprecommender     |         1.0181         |  0.8719  |
| torchbench  | timm_vision_transformer_large |         1.0835         |   0.0    |
| torchbench  |         hf_Longformer         |          0.0           |   0.0    |
| torchbench  |             moco              |          0.0           |   0.0    |
| torchbench  |              gat              |          0.0           |   0.0    |
| torchbench  |              gcn              |          0.0           |   0.0    |
| torchbench  |             sage              |          0.0           |   0.0    |
| torchbench  |           tacotron2           |          0.0           |   0.0    |
| torchbench  |         torchrec_dlrm         |          0.0           |   0.0    |
| huggingface |      DebertaForMaskedLM       |         0.9155         |  1.1071  |
| huggingface |     DebertaV2ForMaskedLM      |         0.7633         |  0.9915  |
| huggingface | DebertaV2ForQuestionAnswering |         0.7649         |  0.9365  |
| huggingface |     BlenderbotForCausalLM     |         1.1113         |   0.0    |
| huggingface |     AllenaiLongformerBase     |          0.0           |   0.0    |
+-------------+-------------------------------+------------------------+----------+

Compilation latency (sec) warnings

+-------------+--------------------------------+------------------------+----------+
|    suite    |              name              | inductor_no_cudagraphs | inductor |
+-------------+--------------------------------+------------------------+----------+
| torchbench  |          hf_T5_large           |        172.6101        | 172.7875 |
| torchbench  |        phlippe_densenet        |        167.556         | 167.5088 |
| torchbench  |           hf_BigBird           |        129.709         | 152.3193 |
| torchbench  |       timm_efficientnet        |        142.4561        | 145.1222 |
| torchbench  |          densenet121           |        135.0357        | 138.4377 |
| torchbench  |       mobilenet_v3_large       |        138.2815        | 134.8303 |
| torchbench  |          mobilenet_v2          |        131.4654        | 132.6283 |
| torchbench  |             yolov3             |        117.8476        | 121.1123 |
| torchbench  | timm_vision_transformer_large  |        123.9497        |   nan    |
| huggingface |     MobileBertForMaskedLM      |        143.5481        | 144.6304 |
| huggingface | DebertaV2ForQuestionAnswering  |        75.0211         | 143.9355 |
| huggingface |      DebertaV2ForMaskedLM      |        72.6957         | 142.4583 |
| huggingface | MobileBertForQuestionAnswering |        137.7172        | 140.5316 |
| huggingface | M2M100ForConditionalGeneration |        137.0814        | 135.7955 |
| huggingface |  MT5ForConditionalGeneration   |        131.0958        | 131.704  |
| huggingface |        XGLMForCausalLM         |        120.5425        | 121.2938 |
| timm_models |           rexnet_100           |        277.7929        | 277.1841 |
| timm_models |           hrnet_w18            |        242.5264        | 252.9675 |
| timm_models |          ghostnet_100          |        239.6408        | 240.8263 |
| timm_models |           fbnetv3_b            |        169.5089        | 173.1145 |
| timm_models |          mobilevit_s           |        166.9193        | 169.4187 |
| timm_models |          resnest101e           |        167.3397        | 166.0831 |
| timm_models |         pnasnet5large          |        159.0844        | 163.7173 |
| timm_models |           tinynet_a            |        158.4462        | 160.5645 |
| timm_models |          tf_mixnet_l           |        160.187         | 157.562  |
| timm_models |            mixnet_l            |        161.5003        | 156.9406 |
| timm_models |        adv_inception_v3        |        157.2056        | 156.8093 |
| timm_models |     mobilenetv3_large_100      |        162.1072        | 156.4856 |
| timm_models |       gluon_inception_v3       |        159.6783        | 155.3081 |
| timm_models |       res2net101_26w_4s        |         152.37         | 153.4042 |
| timm_models |          inception_v3          |        160.432         | 153.2347 |
| timm_models |        twins_pcpvt_base        |        147.3283        | 148.8185 |
| timm_models |       tf_efficientnet_b0       |        149.7758        | 148.5953 |
| timm_models |           fbnetc_100           |        138.7556        | 136.2882 |
| timm_models |      xcit_large_24_p8_224      |        132.4333        | 134.8512 |
| timm_models |          spnasnet_100          |        135.3223        | 133.8117 |
| timm_models |        mobilenetv2_100         |        129.8349        | 129.7049 |
| timm_models |          mnasnet_100           |        120.9557        | 122.5684 |
| timm_models |        res2net50_14w_8s        |        123.1389        | 122.2357 |
+-------------+--------------------------------+------------------------+----------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+------------------------+----------+
|    suite    |                  name                   | inductor_no_cudagraphs | inductor |
+-------------+-----------------------------------------+------------------------+----------+
| torchbench  |                 yolov3                  |         1.0155         |  0.8922  |
| torchbench  |              hf_GPT2_large              |         1.128          |  0.8904  |
| torchbench  |            timm_efficientnet            |         1.006          |  0.8699  |
| torchbench  |           speech_transformer            |         0.8682         |  0.8651  |
| torchbench  |              timm_resnest               |         0.9523         |  0.8621  |
| torchbench  |           shufflenet_v2_x1_0            |         0.958          |  0.8614  |
| torchbench  |               Super_SloMo               |         1.208          |  0.8614  |
| torchbench  |         timm_vision_transformer         |         0.8835         |  0.8593  |
| torchbench  |                resnet152                |         0.9396         |  0.8489  |
| torchbench  |               timm_regnet               |         0.9514         |  0.8486  |
| torchbench  |           Background_Matting            |         1.0406         |  0.8485  |
| torchbench  |              hf_DistilBert              |         0.9945         |  0.8476  |
| torchbench  |                 hf_Bert                 |         1.0258         |  0.8411  |
| torchbench  |              hf_Bert_large              |         1.0725         |  0.8302  |
| torchbench  |               hf_T5_large               |         1.168          |  0.8201  |
| torchbench  |              pytorch_unet               |         0.9308         |  0.8134  |
| torchbench  |            phlippe_densenet             |         0.8659         |  0.8058  |
| torchbench  |                 hf_Bart                 |         0.9166         |  0.793   |
| torchbench  |                  dcgan                  |         0.9645         |  0.7821  |
| torchbench  |                resnet50                 |         0.8851         |  0.7821  |
| torchbench  |                 demucs                  |         0.9655         |  0.7731  |
| torchbench  |              squeezenet1_1              |         0.9074         |  0.7722  |
| torchbench  |             pytorch_stargan             |         0.8893         |  0.7715  |
| torchbench  |               timm_vovnet               |         0.8869         |  0.7529  |
| torchbench  |               mnasnet1_0                |         0.7758         |  0.7448  |
| torchbench  |           mobilenet_v3_large            |         0.7757         |  0.7274  |
| torchbench  |                  vgg16                  |         0.9808         |  0.7227  |
| torchbench  |               densenet121               |         0.8059         |  0.7096  |
| torchbench  |                 alexnet                 |         0.939          |  0.7091  |
| torchbench  |             pytorch_struct              |         0.7362         |  0.697   |
| torchbench  |               hf_BigBird                |         1.1191         |  0.6947  |
| torchbench  |             resnext50_32x4d             |         0.7738         |  0.666   |
| torchbench  |         nvidia_deeprecommender          |         0.8931         |  0.6585  |
| torchbench  |                   drq                   |         0.9573         |  0.6379  |
| torchbench  |            soft_actor_critic            |         0.9973         |  0.6066  |
| torchbench  |             LearningToPaint             |         0.7463         |  0.5925  |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |         0.6004         |  0.5904  |
| torchbench  |                resnet18                 |         0.6097         |  0.5395  |
| torchbench  |              lennard_jones              |         0.9997         |  0.5317  |
| torchbench  |               hf_Reformer               |         0.8022         |  0.4538  |
| torchbench  |          functorch_dp_cifar10           |         0.4424         |  0.3991  |
| torchbench  |             phlippe_resnet              |         0.3395         |  0.3169  |
| huggingface |           ElectraForCausalLM            |         0.9739         |  0.8941  |
| huggingface |           PegasusForCausalLM            |         0.9864         |  0.893   |
| huggingface |          DistilBertForMaskedLM          |         0.9624         |  0.8849  |
| huggingface |            TrOCRForCausalLM             |         0.9583         |  0.8836  |
| huggingface | BlenderbotSmallForConditionalGeneration |         0.9803         |  0.8729  |
| huggingface |     PegasusForConditionalGeneration     |         1.0689         |  0.8689  |
| huggingface |      MBartForConditionalGeneration      |         1.0307         |  0.8574  |
| huggingface |      BartForConditionalGeneration       |         1.0139         |  0.8456  |
| huggingface |         MegatronBertForCausalLM         |         1.0962         |  0.845   |
| huggingface |       BlenderbotSmallForCausalLM        |         0.9119         |  0.8184  |
| huggingface |         Speech2Text2ForCausalLM         |         0.8779         |  0.789   |
| huggingface |     M2M100ForConditionalGeneration      |         0.9908         |  0.7651  |
| huggingface |          MobileBertForMaskedLM          |         1.016          |  0.7473  |
| huggingface |             XGLMForCausalLM             |         0.9792         |  0.7117  |
| huggingface |     MobileBertForQuestionAnswering      |         0.8392         |  0.6569  |
| huggingface |           DebertaForMaskedLM            |         0.9988         |  0.5646  |
| huggingface |          DebertaV2ForMaskedLM           |         0.9664         |  0.5187  |
| huggingface |       DebertaForQuestionAnswering       |         1.1525         |  0.4867  |
| huggingface |      DebertaV2ForQuestionAnswering      |         0.9799         |  0.4855  |
| timm_models |                hrnet_w18                |          0.99          |  0.8918  |
| timm_models |            sebotnet33ts_256             |         1.1115         |  0.891   |
| timm_models |            adv_inception_v3             |         1.0171         |  0.8904  |
| timm_models |           gluon_inception_v3            |         1.0171         |  0.8904  |
| timm_models |              inception_v3               |         1.0171         |  0.8904  |
| timm_models |                 dpn107                  |         0.9642         |  0.8833  |
| timm_models |            gluon_xception65             |         0.9705         |  0.8831  |
| timm_models |              ghostnet_100               |         0.977          |  0.8807  |
| timm_models |              spnasnet_100               |         0.9451         |  0.8786  |
| timm_models |          mobilenetv3_large_100          |         0.9362         |  0.877   |
| timm_models |             poolformer_m36              |         1.1871         |  0.8768  |
| timm_models |           eca_botnext26ts_256           |         1.0072         |  0.8738  |
| timm_models |            res2net50_14w_8s             |         0.9607         |  0.8712  |
| timm_models |            res2net101_26w_4s            |         0.9483         |  0.871   |
| timm_models |                mixnet_l                 |         0.9902         |  0.8687  |
| timm_models |               mnasnet_100               |         0.9403         |  0.8683  |
| timm_models |               res2next50                |         0.9547         |  0.866   |
| timm_models |              cait_m36_384               |         0.989          |  0.8632  |
| timm_models |               fbnetc_100                |         0.9535         |  0.8596  |
| timm_models |                pit_b_224                |         1.0242         |  0.8578  |
| timm_models |               selecsls42b               |         0.9664         |  0.8576  |
| timm_models |              convnext_base              |         1.0338         |  0.8505  |
| timm_models |                gernet_l                 |         0.9706         |  0.8499  |
| timm_models |         swsl_resnext101_32x16d          |         0.9786         |  0.8461  |
| timm_models |             coat_lite_mini              |         1.0202         |  0.8402  |
| timm_models |              botnet26t_256              |         0.9779         |  0.8239  |
| timm_models |          xcit_large_24_p8_224           |         0.9732         |  0.8225  |
| timm_models |                lcnet_050                |         0.884          |  0.805   |
| timm_models |                repvgg_a2                |         0.9611         |  0.7738  |
| timm_models |               regnety_002               |         0.8966         |  0.7602  |
| timm_models |             crossvit_9_240              |         0.9898         |  0.7526  |
| timm_models |      swin_base_patch4_window7_224       |         0.9045         |  0.7214  |
| timm_models |              jx_nest_base               |         0.9604         |  0.6693  |
+-------------+-----------------------------------------+------------------------+----------+

Metrics over time


[Plot: bench_logs/geomean_over_time.png]

[Plot: bench_logs/comp_time_over_time.png]

[Plot: bench_logs/memory_over_time.png]

[Plot: bench_logs/passrate_over_time.png]

Recent Regressions

For each relevant compiler, we compare the two most recent reports (those that actually ran the compiler) to find models that were previously unflagged but are now flagged as problematic (according to the 'Warnings' section).
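The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual code: the function names and the thresholds (0.95 for speedup, 0.9 for peak memory compression ratio) are assumptions that are merely consistent with the flagged rows in this report. The sample values are taken from the regression and warnings tables in this report.

```python
from typing import Dict, List, Tuple

# Assumed thresholds (not stated in the report): a model is flagged
# when its metric falls below the threshold.
SPEEDUP_THRESHOLD = 0.95
COMPRESSION_THRESHOLD = 0.9


def is_flagged(value: float, threshold: float) -> bool:
    """A value below the threshold counts as a warning (0.0 marks a failed run)."""
    return value < threshold


def find_regressions(
    prev: Dict[str, float], cur: Dict[str, float], threshold: float
) -> List[Tuple[str, float, float]]:
    """Return (name, prev_status, cur_status) for models that are flagged in
    the current report but were not flagged in the previous one."""
    regressions = []
    for name, cur_value in cur.items():
        if name not in prev:
            continue  # no previous measurement to compare against
        prev_value = prev[name]
        if is_flagged(cur_value, threshold) and not is_flagged(prev_value, threshold):
            regressions.append((name, prev_value, cur_value))
    return regressions


# Speedup values for the inductor compiler on torchbench.
prev_speedup = {"timm_vovnet": 0.967}
cur_speedup = {"timm_vovnet": 0.9051, "nvidia_deeprecommender": 0.8719}
speedup_regressions = find_regressions(prev_speedup, cur_speedup, SPEEDUP_THRESHOLD)

# Peak memory compression ratio for the inductor compiler on torchbench.
prev_memory = {"timm_efficientnet": 0.9282}
cur_memory = {"timm_efficientnet": 0.8699}
memory_regressions = find_regressions(prev_memory, cur_memory, COMPRESSION_THRESHOLD)

print(speedup_regressions)
print(memory_regressions)
```

In this sketch, nvidia_deeprecommender is skipped because no previous value is supplied for it; a model that was already flagged in the previous report would also be excluded, since only newly flagged models count as regressions.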

Regressions for torchbench

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_103_13_04_23_performance_amp_153

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_103_13_04_23_performance_amp_153

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274

Performance speedup regressions

+------------------------+-------------+-------------+------------+
|        compiler        |    name     | prev_status | cur_status |
+------------------------+-------------+-------------+------------+
| inductor_no_cudagraphs | timm_vovnet |   0.9668    |   0.9069   |
|        inductor        | timm_vovnet |    0.967    |   0.9051   |
+------------------------+-------------+-------------+------------+

Peak Memory Compression Ratio regressions

+----------+-------------------+-------------+------------+
| compiler |       name        | prev_status | cur_status |
+----------+-------------------+-------------+------------+
| inductor | timm_efficientnet |   0.9282    |   0.8699   |
+----------+-------------------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_103_13_04_23_performance_amp_153

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_103_13_04_23_performance_amp_153

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274

No regressions found.

Regressions for timm_models

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_103_13_04_23_performance_amp_153

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_103_13_04_23_performance_amp_153

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274

No regressions found.

torchbench suite with amp precision


Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9697 |   0.919   |  3.6582  |         1.3912         |
|           BERT_pytorch            |  16  | 1.0024 |  0.8148   |  3.1767  |         2.1761         |
|            densenet121            |  4   | 0.9956 |  0.7141   |  2.7375  |         1.067          |
|            hf_BigBird             |  2   | 0.9564 |  0.7771   |  2.6305  |         1.6947         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9661 |  0.8955   |   2.5    |         1.8742         |
|             hf_Albert             |  8   | 0.9961 |  0.9591   |  2.3997  |         2.3026         |
|            hf_T5_large            |  2   | 1.009  |  0.8346   |  2.3498  |         2.0029         |
|         phlippe_densenet          | 128  | 0.9971 |  0.7717   |  2.0664  |         1.0236         |
|        mobilenet_v3_large         |  32  | 1.0011 |  0.7792   |  2.0447  |         1.2196         |
|           squeezenet1_1           |  32  | 0.9811 |  0.9061   |  1.978   |         1.3479         |
|               hf_T5               |  8   | 0.9955 |  0.8569   |  1.9512  |         2.0246         |
|               dlrm                | 1024 | 0.9456 |  0.8452   |  1.9424  |         1.2086         |
|          pytorch_struct           | 200  | 0.9204 |   0.771   |  1.9237  |         1.1358         |
|              hf_GPT2              |  4   | 1.0144 |  0.9819   |  1.8684  |         1.9004         |
|              hf_Bert              |  4   | 1.0304 |  0.8535   |  1.8596  |         1.7326         |
|          phlippe_resnet           | 128  | 0.9907 |  0.7615   |  1.8156  |         1.0133         |
|           hf_GPT2_large           |  4   | 1.0001 |  0.9885   |  1.7272  |         1.7914         |
|          resnext50_32x4d          |  8   | 0.9865 |  0.7163   |  1.6773  |         0.984          |
|        speech_transformer         |  32  | 0.9888 |  0.8267   |  1.6632  |         1.6573         |
|            mnasnet1_0             |  32  | 0.9972 |  0.7271   |  1.6322  |         1.0934         |
|        shufflenet_v2_x1_0         | 128  | 0.9976 |  0.7487   |  1.6252  |         1.2054         |
|           hf_Bert_large           |  4   | 1.0332 |  0.9033   |  1.6157  |         1.6439         |
| attention_is_all_you_need_pytorch | 256  | 1.0051 |  0.9184   |  1.6156  |         1.5154         |
|              hf_Bart              |  4   | 0.9966 |  0.8339   |  1.6129  |         1.6024         |
|             resnet18              |  16  | 0.9951 |   0.759   |  1.5827  |         0.9575         |
|           timm_resnest            |  32  | 0.9974 |  0.8517   |  1.5738  |         1.5279         |
|            timm_nfnet             | 128  | 0.9997 |   0.998   |  1.5653  |         1.5063         |
|      timm_vision_transformer      |  32  | 0.9952 |  0.8914   |  1.5558  |         1.4287         |
|           fastNLP_Bert            |  6   | 0.9942 |  0.8286   |  1.5482  |         1.5576         |
|           mobilenet_v2            |  96  | 0.9987 |  0.7767   |  1.5214  |         1.4906         |
|                drq                |  1   | 0.9504 |  0.7475   |  1.4937  |         1.0294         |
|           hf_DistilBert           |  8   | 0.9915 |  0.9666   |  1.4891  |         1.5028         |
|           lennard_jones           | 1000 | 0.8401 |  0.7313   |  1.4113  |         0.8334         |
|         timm_efficientnet         |  32  | 0.9152 |  0.6106   |  1.3734  |         1.0629         |
|           pytorch_unet            |  1   | 0.999  |  0.2048   |  1.3558  |         1.3557         |
|          LearningToPaint          |  96  | 0.9943 |  0.7708   |  1.3064  |         1.0745         |
|          pytorch_stargan          |  16  | 0.9876 |   0.798   |  1.3031  |         1.2329         |
|            Super_SloMo            |  6   | 0.9983 |  0.1788   |  1.2534  |         1.2335         |
|               dcgan               |  32  | 0.8569 |  0.6854   |  1.2424  |         0.8203         |
|               vgg16               |  64  | 0.9999 |  0.9983   |  1.2402  |         1.2543         |
|        Background_Matting         |  4   | 0.9993 |  0.1368   |  1.2134  |         1.2054         |
|             resnet152             |  32  | 1.0001 |  0.7573   |  1.196   |         1.0415         |
|              yolov3               |  16  |  1.0   |  0.8069   |  1.1956  |         1.197          |
|             resnet50              |  32  | 0.9994 |  0.7691   |  1.1755  |         1.0689         |
|         soft_actor_critic         | 256  | 0.8561 |  0.6314   |  1.1606  |         0.8239         |
|            hf_Reformer            |  4   | 0.9857 |  0.9681   |  1.1497  |         1.0725         |
|              alexnet              | 128  | 0.9989 |  0.9966   |  1.0893  |         1.1359         |
|              demucs               |  4   | 0.9988 |  1.0021   |  1.036   |         1.0389         |
|            timm_regnet            |  32  | 0.9148 |  0.7668   |  1.011   |         0.9575         |
|            tts_angular            |  64  | 0.9124 |  0.8868   |  0.9558  |         0.9621         |
|            timm_vovnet            |  32  | 0.8255 |  0.6948   |  0.9051  |         0.9069         |
|      nvidia_deeprecommender       | 256  | 0.9988 |  0.9985   |  0.8719  |         1.0181         |
|   timm_vision_transformer_large   |  32  | 1.0001 |    0.0    |   0.0    |         1.0835         |
|           hf_Longformer           |  2   | 1.0186 |  0.6901   |   0.0    |          0.0           |
|               moco                |  32  | 0.977  |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|          vision_maskrcnn          |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 25.9463 |  53.154   | 172.7875 |        172.6101        |
|         phlippe_densenet          | 128  |  3.265  |  7.0164   | 167.5088 |        167.556         |
|            hf_BigBird             |  2   | 12.5935 |  36.1014  | 152.3193 |        129.709         |
|         timm_efficientnet         |  32  | 5.2957  |  10.3142  | 145.1222 |        142.4561        |
|            densenet121            |  4   | 7.8054  |  17.9374  | 138.4377 |        135.0357        |
|        mobilenet_v3_large         |  32  | 3.4285  |  7.6075   | 134.8303 |        138.2815        |
|           mobilenet_v2            |  96  | 3.1554  |  6.9457   | 132.6283 |        131.4654        |
|              yolov3               |  16  | 4.8715  |  10.6588  | 121.1123 |        117.8476        |
|            mnasnet1_0             |  32  | 3.1536  |   6.792   | 107.0279 |        111.8789        |
|             resnet152             |  32  |  9.102  |  19.7779  | 105.3864 |        104.8387        |
|           hf_GPT2_large           |  4   | 14.3323 |  28.9845  | 104.3024 |        104.5714        |
|           timm_resnest            |  32  | 1.7988  |  3.8558   | 100.3741 |        100.9767        |
|        shufflenet_v2_x1_0         | 128  | 3.4981  |  7.6857   | 82.8824  |        81.7942         |
|        speech_transformer         |  32  |  5.89   |  13.6058  | 77.6017  |        77.5188         |
| attention_is_all_you_need_pytorch | 256  |  4.33   |  10.8089  | 74.8773  |        73.9756         |
|            timm_regnet            |  32  | 6.9701  |  12.5069  | 73.6615  |        72.6773         |
|        Background_Matting         |  4   | 3.1143  |  11.2955  | 72.3503  |        71.1221         |
|            timm_nfnet             | 128  | 5.7291  |  10.9371  |  72.237  |        72.2364         |
|           BERT_pytorch            |  16  | 4.8772  |  11.4222  | 69.2245  |         70.663         |
|             resnet50              |  32  | 3.1984  |  6.9712   | 67.2773  |        66.1567         |
|           hf_Bert_large           |  4   | 10.3053 |  21.0585  | 64.0271  |        63.4161         |
|            timm_vovnet            |  32  | 3.7546  |  6.6845   | 63.4038  |        61.6769         |
|           pytorch_unet            |  1   | 1.5135  |  4.3478   | 59.4011  |        59.5166         |
|       functorch_dp_cifar10        |  64  |  1.201  |  2.3804   | 56.9658  |        54.7266         |
|          resnext50_32x4d          |  8   |  3.196  |  6.9466   | 54.2688  |        53.4172         |
|           fastNLP_Bert            |  6   |  5.105  |  11.1377  | 52.0669  |        50.9107         |
|      timm_vision_transformer      |  32  | 3.2176  |   7.185   | 50.7134  |        49.6215         |
|              hf_Bart              |  4   |  6.251  |  13.4099  | 49.3824  |        50.0051         |
|               hf_T5               |  8   | 5.4275  |  12.2363  | 48.8081  |        48.1973         |
|          pytorch_stargan          |  16  | 1.2169  |   3.224   | 46.6197  |        46.8092         |
|             resnet18              |  16  |  1.345  |   2.753   | 45.5654  |        43.6735         |
|          LearningToPaint          |  96  | 1.4157  |  2.8923   | 44.9473  |         42.578         |
|            Super_SloMo            |  6   | 2.8698  |  9.6363   | 44.9174  |        43.0645         |
|            hf_Reformer            |  4   | 3.9811  |  5.8395   |  43.434  |        38.9483         |
|              hf_GPT2              |  4   | 4.4826  |  9.4417   | 43.0145  |        42.6021         |
|             hf_Albert             |  8   | 2.3826  |  8.2393   | 40.4799  |        38.1524         |
|              hf_Bert              |  4   | 5.1688  |  10.5666  | 38.4654  |        38.2089         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2277  |  2.9203   | 38.0783  |        35.3305         |
|          phlippe_resnet           | 128  | 1.3324  |   2.838   | 31.0755  |         32.852         |
|              demucs               |  4   | 1.4281  |  2.1409   | 31.0169  |        29.6396         |
|           hf_DistilBert           |  8   | 2.4733  |  5.1167   | 29.5731  |        29.7269         |
|           squeezenet1_1           |  32  | 1.0327  |  1.7548   | 25.9472  |        23.4227         |
|          pytorch_struct           | 200  | 0.7335  |  1.3242   | 21.1488  |        21.4178         |
|              alexnet              | 128  |  0.488  |  0.7695   | 16.5693  |        15.7423         |
|               vgg16               |  64  | 0.6279  |  1.1149   |  15.631  |        15.6345         |
|      nvidia_deeprecommender       | 256  | 0.4789  |  0.7538   | 10.4201  |         9.9907         |
|                drq                |  1   |  0.655  |  0.9879   |  9.4026  |        11.3608         |
|               dcgan               |  32  | 0.4276  |  0.7137   |  8.393   |         7.7578         |
|         soft_actor_critic         | 256  | 0.4219  |  0.5985   |  7.9325  |         7.2457         |
|               dlrm                | 1024 | 0.3659  |  0.7679   |  7.7438  |         7.627          |
|           lennard_jones           | 1000 | 0.3867  |  0.5879   |  6.8165  |         7.5705         |
|            tts_angular            |  64  | 0.4441  |  0.5047   |  6.1092  |         5.9665         |
|   timm_vision_transformer_large   |  32  | 9.3781  |    nan    |   nan    |        123.9497        |
|           hf_Longformer           |  2   | 9.3109  |  30.0282  |   nan    |          nan           |
|               moco                |  32  |  34.08  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+
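
The compilation latency table above is easier to compare across backends once parsed. Below is a minimal sketch, not part of the dashboard tooling, that reads a table in this `+---+` / `| a | b |` layout and computes the ratio of the `inductor` column to the `eager` column, skipping rows with `nan` entries. The helper names (`parse_ascii_table`, `to_float`, `compile_overhead`) are hypothetical, and the sample rows are copied from the table above.

```python
import math


def parse_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style table into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip '+----+' separator lines
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def to_float(cell):
    """Convert a cell to float; anything unparsable becomes nan."""
    try:
        return float(cell)
    except ValueError:
        return math.nan


def compile_overhead(rows, backend="inductor", baseline="eager"):
    """Return {model name: backend / baseline}, skipping nan or zero entries."""
    ratios = {}
    for row in rows:
        b, e = to_float(row[backend]), to_float(row[baseline])
        if math.isnan(b) or math.isnan(e) or e == 0:
            continue
        ratios[row["name"]] = b / e
    return ratios


# Three rows copied from the compilation latency table above.
sample = """
+---------------+----+---------+-----------+----------+------------------------+
|     name      | bs |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------+----+---------+-----------+----------+------------------------+
|  hf_T5_large  | 2  | 25.9463 |  53.154   | 172.7875 |        172.6101        |
|   resnet18    | 16 |  1.345  |   2.753   | 45.5654  |        43.6735         |
| hf_Longformer | 2  | 9.3109  |  30.0282  |   nan    |          nan           |
+---------------+----+---------+-----------+----------+------------------------+
"""

rows = parse_ascii_table(sample)
overhead = compile_overhead(rows)
for name, ratio in sorted(overhead.items(), key=lambda kv: -kv[1]):
    print(f"{name}: inductor compile time is {ratio:.1f}x the eager column")
```

Rows such as `hf_Longformer`, where the backend column is `nan`, are dropped rather than treated as zero, so failed runs do not distort the comparison.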

Peak memory compression ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0378  |         1.2557         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.0163  |         1.1719         |
|           mobilenet_v2            |  96  | 0.9861 |  0.7653   |  1.0109  |         1.1027         |
|            tts_angular            |  64  | 0.9983 |  0.9983   |  0.9895  |         0.9983         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9689  |         1.1266         |
|            timm_nfnet             | 128  | 0.9071 |  0.8753   |  0.9685  |         1.0727         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  0.9575  |         1.1593         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |  0.952   |         1.0009         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         1.1717         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9321  |         1.0713         |
|              yolov3               |  16  | 0.9882 |  0.8288   |  0.8922  |         1.0155         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  0.8904  |         1.128          |
|         timm_efficientnet         |  32  | 0.9881 |  0.7663   |  0.8699  |         1.006          |
|        speech_transformer         |  32  | 0.9914 |   0.901   |  0.8651  |         0.8682         |
|           timm_resnest            |  32  | 0.9891 |  0.8984   |  0.8621  |         0.9523         |
|        shufflenet_v2_x1_0         | 128  | 0.9553 |  0.8382   |  0.8614  |         0.958          |
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  0.8614  |         1.208          |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8593  |         0.8835         |
|             resnet152             |  32  | 0.9939 |  0.8942   |  0.8489  |         0.9396         |
|            timm_regnet            |  32  | 0.991  |  0.8496   |  0.8486  |         0.9514         |
|        Background_Matting         |  4   | 1.0127 |  0.6489   |  0.8485  |         1.0406         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8476  |         0.9945         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  0.8411  |         1.0258         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.8302  |         1.0725         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         1.168          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.8134  |         0.9308         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8058  |         0.8659         |
|              hf_Bart              |  4   | 0.9078 |  0.7516   |  0.793   |         0.9166         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.7821  |         0.9645         |
|             resnet50              |  32  | 0.9902 |  0.8619   |  0.7821  |         0.8851         |
|              demucs               |  4   | 0.9658 |  0.9666   |  0.7731  |         0.9655         |
|           squeezenet1_1           |  32  | 0.9695 |  0.9321   |  0.7722  |         0.9074         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.8893         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7529  |         0.8869         |
|            mnasnet1_0             |  32  | 0.9757 |  0.8971   |  0.7448  |         0.7758         |
|        mobilenet_v3_large         |  32  | 0.9803 |  0.8396   |  0.7274  |         0.7757         |
|               vgg16               |  64  | 0.9923 |  0.7245   |  0.7227  |         0.9808         |
|            densenet121            |  4   | 0.994  |  0.9818   |  0.7096  |         0.8059         |
|              alexnet              | 128  | 0.9454 |  0.7939   |  0.7091  |         0.939          |
|          pytorch_struct           | 200  | 0.9992 |  0.5106   |  0.697   |         0.7362         |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |  0.6947  |         1.1191         |
|          resnext50_32x4d          |  8   | 0.9962 |  0.8461   |  0.666   |         0.7738         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.8931         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.6379  |         0.9573         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  0.6066  |         0.9973         |
|          LearningToPaint          |  96  | 0.9192 |  0.7116   |  0.5925  |         0.7463         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9966 |  0.8594   |  0.5904  |         0.6004         |
|             resnet18              |  16  | 0.9753 |  0.7786   |  0.5395  |         0.6097         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.5317  |         0.9997         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.4538  |         0.8022         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3169  |         0.3395         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.9511 |  0.8932   |   nan    |          nan           |
|               moco                |  32  |  1.0   |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_GPT2_large           |  4   | 209.3249 | 211.4389  | 120.8867 |        116.7843        |
|        Background_Matting         |  4   | 125.6574 |  918.896  | 103.5841 |        104.5493        |
|            hf_T5_large            |  2   | 219.1319 | 263.1176  | 93.9258  |        111.4011        |
|               hf_T5               |  8   | 180.2521 |  209.283  |  91.891  |        88.7354         |
|            hf_BigBird             |  2   | 196.1605 | 250.0798  | 75.8767  |        114.6216        |
|            timm_nfnet             | 128  | 118.524  | 118.5059  | 75.4465  |        78.4533         |
|            hf_Reformer            |  4   | 82.0158  |  83.7234  | 70.4701  |         75.533         |
|            Super_SloMo            |  6   | 79.4621  | 443.8308  | 63.3414  |        64.2655         |
|              yolov3               |  16  | 68.6862  |  84.7296  |  57.292  |        57.3326         |
|            timm_regnet            |  32  | 61.5683  |   72.49   | 55.4775  |        57.9953         |
|               vgg16               |  64  | 66.2083  |  66.2347  | 53.4173  |        52.8239         |
|             resnet152             |  32  | 64.3139  |  83.9883  | 52.6848  |        63.6932         |
|              demucs               |  4   | 53.6209  |  53.7761  | 51.6864  |        51.4884         |
|           hf_Bert_large           |  4   | 80.4343  |  91.0669  | 50.7201  |         50.128         |
| attention_is_all_you_need_pytorch | 256  | 54.7484  |  59.421   | 35.4072  |        35.5664         |
|           fastNLP_Bert            |  6   | 51.7386  |  71.6588  | 34.1793  |        33.9661         |
|              hf_Bart              |  4   | 55.1964  |  65.1793  | 34.0717  |         37.278         |
|        speech_transformer         |  32  |  63.245  |  74.8508  | 33.9012  |        33.5625         |
|           mobilenet_v2            |  96  | 46.9563  |  60.3982  | 30.8559  |        31.5191         |
|           pytorch_unet            |  1   | 39.8634  |  194.023  | 29.3627  |        29.3148         |
|             hf_Albert             |  8   |  68.445  |  72.428   | 29.0579  |         29.662         |
|            timm_vovnet            |  32  |  29.528  |  35.6378  | 26.9574  |        27.2383         |
|              hf_GPT2              |  4   | 47.5806  |  49.7688  |  25.964  |        25.6693         |
|         timm_efficientnet         |  32  | 34.7141  |  52.0455  | 23.1773  |        30.3247         |
|             resnet50              |  32  | 26.2785  |  34.1875  |  22.218  |        24.3671         |
|              hf_Bert              |  4   |  39.779  |  46.597   |  21.716  |        23.3221         |
|           hf_DistilBert           |  8   | 31.6842  |  32.4612  | 21.0409  |        20.8771         |
|            densenet121            |  4   | 54.2773  |  89.4171  | 19.8993  |        51.7179         |
|        shufflenet_v2_x1_0         | 128  | 31.8489  |  40.7839  | 18.7007  |        25.4649         |
|      timm_vision_transformer      |  32  |  27.641  |  39.5384  |  17.926  |        19.8341         |
|           BERT_pytorch            |  16  | 54.2927  |  66.6315  | 17.0045  |        26.1298         |
|           timm_resnest            |  32  | 24.1265  |  28.1557  |  15.309  |        15.7671         |
|            mnasnet1_0             |  32  | 22.3975  |  30.5315  | 14.1937  |         19.735         |
|        mobilenet_v3_large         |  32  | 26.5711  |  34.3428  |  12.801  |         21.652         |
|          resnext50_32x4d          |  8   | 20.5796  |  28.176   | 11.9147  |        21.0385         |
|          pytorch_stargan          |  16  | 15.3719  |  18.6159  |  11.721  |        11.8593         |
|      nvidia_deeprecommender       | 256  | 10.2335  |  10.2274  |  11.702  |        10.0291         |
|         phlippe_densenet          | 128  | 23.2716  |  29.9093  | 11.4237  |         23.003         |
|              alexnet              | 128  |  9.8206  |  9.8468   |   9.02   |         8.6277         |
|          LearningToPaint          |  96  | 11.1943  |  14.6946  |  8.5715  |        10.5665         |
|            tts_angular            |  64  |  6.8506  |  7.0346   |  6.4979  |         6.494          |
|             resnet18              |  16  |  9.2408  |  12.0879  |  5.8413  |        10.0399         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 15.4739  |  15.5814  |  5.7036  |         8.6486         |
|           squeezenet1_1           |  32  | 10.3933  |  12.9939  |  5.4228  |         7.4575         |
|          phlippe_resnet           | 128  |  8.9327  |  11.6171  |  4.9345  |         8.9538         |
|       functorch_dp_cifar10        |  64  | 10.3072  |  11.0141  |  2.7617  |         7.1939         |
|               dcgan               |  32  |  2.3919  |  3.0484   |  2.4878  |         2.5626         |
|          pytorch_struct           | 200  |  4.989   |  6.0406   |  2.4188  |         4.1867         |
|               dlrm                | 1024 |  4.881   |  4.8991   |  2.1422  |         3.4454         |
|                drq                |  1   |  3.5038  |  4.2707   |  2.142   |         3.7562         |
|         soft_actor_critic         | 256  |  1.993   |  2.4874   |  1.3575  |         1.9303         |
|           lennard_jones           | 1000 |  1.8183  |  2.1103   |  1.1104  |         2.9801         |
|   timm_vision_transformer_large   |  32  | 464.9349 |    nan    |   nan    |        427.5826        |
|           hf_Longformer           |  2   | 110.3092 | 164.0933  |   nan    |          nan           |
|               moco                |  32  | 51.2652  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 1.0124 |  0.8562   |  2.8018  |         1.1885         |
|     MobileBertForQuestionAnswering      | 128 | 1.015  |  0.8787   |  2.5779  |         1.1849         |
|      GPT2ForSequenceClassification      |  4  | 0.9884 |  0.9641   |  2.3388  |         2.3578         |
|             OPTForCausalLM              |  2  | 0.9928 |   0.931   |  2.2852  |         2.3108         |
|       MT5ForConditionalGeneration       | 16  | 1.0152 |  0.8489   |  2.2276  |         1.9733         |
|       ElectraForQuestionAnswering       | 64  | 0.9976 |  0.9917   |  2.1788  |         2.1633         |
|           ElectraForCausalLM            | 32  | 0.9962 |  0.9498   |  1.8426  |         1.8715         |
|    LayoutLMForSequenceClassification    | 16  | 0.9967 |  0.9837   |  1.8314  |         1.8342         |
|            XLNetLMHeadModel             |  8  | 0.9973 |  0.9731   |  1.8228  |         1.8272         |
|        BertForQuestionAnswering         | 16  | 0.9976 |  0.9824   |  1.8054  |         1.8043         |
|       RobertaForQuestionAnswering       | 16  | 0.9976 |  0.9819   |  1.8002  |         1.8089         |
|           RobertaForCausalLM            | 16  | 0.9977 |  0.9729   |  1.6833  |         1.6995         |
|               DistillGPT2               | 16  | 0.9931 |  0.9603   |  1.6751  |         1.7187         |
|                 T5Small                 |  4  | 0.993  |  0.8692   |  1.6659  |         1.7709         |
|       T5ForConditionalGeneration        |  4  | 0.9942 |  0.8667   |  1.6604  |         1.7728         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9971 |  0.9783   |  1.6529  |         1.6785         |
|       AlbertForQuestionAnswering        |  4  | 1.0003 |  0.8856   |  1.6468  |         1.6488         |
|            AlbertForMaskedLM            |  4  | 1.0002 |  0.8849   |  1.6365  |         1.6423         |
|           LayoutLMForMaskedLM           | 16  | 0.9976 |  0.9732   |  1.6111  |         1.6163         |
|             XGLMForCausalLM             |  8  | 1.0057 |  0.8388   |  1.6094  |         1.5117         |
|     PLBartForConditionalGeneration      |  4  | 0.9921 |  0.9572   |  1.5982  |         1.6228         |
|             BertForMaskedLM             | 16  | 0.9975 |  0.9728   |  1.5955  |         1.6169         |
|                CamemBert                | 16  | 0.998  |  0.9735   |  1.5485  |         1.5616         |
|         MegatronBertForCausalLM         |  4  | 1.0091 |  0.9424   |  1.5342  |         1.5775         |
|            YituTechConvBert             | 16  | 0.9981 |  0.9697   |  1.523   |         1.5225         |
|            PLBartForCausalLM            |  8  | 0.9923 |  0.9596   |  1.4739  |         1.5041         |
|             BartForCausalLM             |  4  | 0.9896 |  0.9601   |  1.4672  |         1.5001         |
|            MBartForCausalLM             |  4  | 0.991  |  0.9617   |  1.4614  |         1.4932         |
|     DistilBertForQuestionAnswering      | 256 | 0.9972 |  0.9883   |  1.4547  |         1.4617         |
|      BartForConditionalGeneration       |  2  | 1.0024 |  0.9836   |  1.4515  |         1.4745         |
|      MBartForConditionalGeneration      |  2  | 1.0016 |  0.9846   |  1.4388  |         1.4642         |
|     M2M100ForConditionalGeneration      | 16  | 0.9975 |  0.8423   |  1.4235  |         1.5283         |
|         Speech2Text2ForCausalLM         | 256 | 0.9852 |  0.9282   |  1.422   |         1.4566         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0045 |  0.9269   |  1.3459  |         1.4611         |
|     PegasusForConditionalGeneration     | 32  | 1.0012 |  0.9305   |  1.2491  |         1.282          |
|            TrOCRForCausalLM             | 32  | 0.9918 |  0.9586   |  1.2449  |         1.273          |
|       BlenderbotSmallForCausalLM        | 64  | 0.9862 |  0.9188   |  1.2156  |         1.2765         |
|          DistilBertForMaskedLM          | 128 | 0.9964 |  0.9528   |  1.2151  |         1.2404         |
|       DebertaForQuestionAnswering       |  8  | 0.8439 |  0.7341   |  1.2017  |         1.0283         |
|           PegasusForCausalLM            | 32  | 0.9919 |  0.9316   |  1.1748  |         1.2089         |
|           DebertaForMaskedLM            |  4  | 0.7465 |  0.5928   |  1.1071  |         0.9155         |
|          DebertaV2ForMaskedLM           |  1  | 0.7424 |   0.548   |  0.9915  |         0.7633         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.7276 |  0.5606   |  0.9365  |         0.7649         |
|          BlenderbotForCausalLM          |  4  | 1.0128 |  0.8789   |   0.0    |         1.1113         |
|          AllenaiLongformerBase          |  4  | 1.0088 |  0.6698   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 17.5807 |  41.4381  | 144.6304 |        143.5481        |
|      DebertaV2ForQuestionAnswering      |  2  | 15.151  |  28.132   | 143.9355 |        75.0211         |
|          DebertaV2ForMaskedLM           |  1  | 15.4266 |  26.9838  | 142.4583 |        72.6957         |
|     MobileBertForQuestionAnswering      | 128 | 17.501  |  41.3492  | 140.5316 |        137.7172        |
|     M2M100ForConditionalGeneration      | 16  | 12.5394 |  26.0907  | 135.7955 |        137.0814        |
|       MT5ForConditionalGeneration       | 16  | 8.0883  |  18.9056  | 131.704  |        131.0958        |
|             XGLMForCausalLM             |  8  | 9.1761  |  20.3883  | 121.2938 |        120.5425        |
|            XLNetLMHeadModel             |  8  | 10.4474 |  27.1093  | 92.3332  |        93.1655         |
|           DebertaForMaskedLM            |  4  | 7.3549  |  13.2319  | 89.3859  |        55.2328         |
|       DebertaForQuestionAnswering       |  8  | 7.1188  |  13.4679  | 83.8818  |        58.8526         |
|      MBartForConditionalGeneration      |  2  | 11.8479 |  25.7167  | 79.8837  |        79.2895         |
|      BartForConditionalGeneration       |  2  | 11.8071 |  25.7102  | 75.1983  |        74.7071         |
|     PegasusForConditionalGeneration     | 32  | 5.2282  |  18.9429  |  68.579  |        66.7094         |
|    MegatronBertForQuestionAnswering     |  8  | 10.4718 |  20.9846  | 68.1261  |        66.4133         |
|            YituTechConvBert             | 16  | 7.1301  |  16.3699  | 67.5943  |        67.2106         |
|         MegatronBertForCausalLM         |  4  | 10.5614 |  21.0139  | 65.9607  |        66.3116         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.7716  |  16.6903  | 55.7454  |        55.8838         |
|           ElectraForCausalLM            | 32  | 5.4646  |  11.2625  | 52.3974  |        52.7041         |
|       T5ForConditionalGeneration        |  4  | 5.4555  |  13.064   | 49.7481  |        49.7064         |
|                 T5Small                 |  4  | 5.4194  |  13.3473  | 49.7369  |        49.5099         |
|     PLBartForConditionalGeneration      |  4  | 6.1029  |  13.3983  |  49.096  |         47.843         |
|    LayoutLMForSequenceClassification    | 16  | 5.7143  |  10.9165  | 45.5621  |        45.5169         |
|       ElectraForQuestionAnswering       | 64  | 5.4218  |  11.2869  | 43.4151  |        45.9675         |
|        BertForQuestionAnswering         | 16  | 5.1198  |  10.5429  |  40.26   |        39.6086         |
|           LayoutLMForMaskedLM           | 16  | 5.7766  |  11.6045  | 40.2494  |        40.3316         |
|            MBartForCausalLM             |  4  | 5.6768  |  10.9204  | 39.2479  |        39.1769         |
|             BertForMaskedLM             | 16  |  5.135  |  10.6208  | 39.0153  |        40.2094         |
|             OPTForCausalLM              |  2  | 4.8841  |  10.509   | 38.9629  |        37.0772         |
|            AlbertForMaskedLM            |  4  | 2.3234  |   7.895   | 37.7563  |        37.9077         |
|             BartForCausalLM             |  4  | 5.5447  |  10.6577  | 37.7408  |        38.2766         |
|      GPT2ForSequenceClassification      |  4  | 4.7042  |  9.5765   | 37.3641  |        36.5546         |
|                CamemBert                | 16  |  5.186  |  11.1653  | 36.7729  |        38.4354         |
|           PegasusForCausalLM            | 32  | 5.7773  |  10.779   | 36.7311  |        36.9207         |
|           RobertaForCausalLM            | 16  | 5.1995  |  11.2112  | 36.7023  |        36.5579         |
|            TrOCRForCausalLM             | 32  | 5.5379  |  10.8431  |  36.39   |        36.7102         |
|       RobertaForQuestionAnswering       | 16  | 5.1839  |  11.1271  | 35.8238  |        35.4157         |
|     DistilBertForQuestionAnswering      | 256 | 2.4335  |  5.6206   | 34.8894  |        35.9846         |
|       AlbertForQuestionAnswering        |  4  | 2.1511  |  7.8276   | 33.5896  |        34.4737         |
|          DistilBertForMaskedLM          | 128 | 2.4756  |  5.6246   | 33.1256  |        34.3366         |
|       BlenderbotSmallForCausalLM        | 64  | 3.7929  |  7.2109   | 30.0429  |        29.6925         |
|               DistillGPT2               | 16  | 2.4664  |   4.973   | 29.5739  |        27.7673         |
|            PLBartForCausalLM            |  8  | 3.1262  |  6.0776   | 27.0922  |        24.8802         |
|         Speech2Text2ForCausalLM         | 256 | 2.8833  |  5.5883   | 25.0506  |        24.2392         |
|          BlenderbotForCausalLM          |  4  | 10.8105 |  22.3565  |   nan    |        70.0582         |
|          AllenaiLongformerBase          |  4  |  9.461  |  30.477   |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak memory compression ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1376  |         1.195          |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1139  |         1.2307         |
|             OPTForCausalLM              |  2  |  1.0   |  0.9164   |  1.094   |         1.1343         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0607  |         1.1729         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0603  |         1.1724         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0583  |         1.1368         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.1479         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0078  |         1.0518         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0077  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.0075  |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0035  |         1.0491         |
|            YituTechConvBert             | 16  |  1.0   |  0.9143   |  0.9911  |         1.0411         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9729  |         1.3147         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         1.0642         |
|     PLBartForConditionalGeneration      |  4  | 1.0001 |   0.93    |  0.9649  |         1.0521         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         1.1152         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9501  |         1.268          |
|            MBartForCausalLM             |  4  |  1.0   |  0.8937   |  0.9281  |         0.9912         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8679   |  0.914   |         0.9887         |
|             BartForCausalLM             |  4  |  1.0   |  0.8936   |  0.9137  |         0.9749         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9089  |         1.0018         |
|           ElectraForCausalLM            | 32  |  1.0   |  0.8475   |  0.8941  |         0.9739         |
|           PegasusForCausalLM            | 32  |  1.0   |  0.8822   |  0.893   |         0.9864         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8849  |         0.9624         |
|            TrOCRForCausalLM             | 32  |  1.0   |   0.873   |  0.8836  |         0.9583         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8729  |         0.9803         |
|     PegasusForConditionalGeneration     | 32  |  1.0   |  0.9101   |  0.8689  |         1.0689         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8988   |  0.8574  |         1.0307         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         1.0139         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         1.0962         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8137   |  0.8184  |         0.9119         |
|         Speech2Text2ForCausalLM         | 256 |  1.0   |  0.8183   |  0.789   |         0.8779         |
|     M2M100ForConditionalGeneration      | 16  |  1.0   |  0.8084   |  0.7651  |         0.9908         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7473  |         1.016          |
|             XGLMForCausalLM             |  8  |  1.0   |  0.7834   |  0.7117  |         0.9792         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6569  |         0.8392         |
|           DebertaForMaskedLM            |  4  | 0.9316 |  0.9143   |  0.5646  |         0.9988         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.5187  |         0.9664         |
|       DebertaForQuestionAnswering       |  8  | 0.9524 |  1.0516   |  0.4867  |         1.1525         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9762 |  0.9764   |  0.4855  |         0.9799         |
|          BlenderbotForCausalLM          |  4  | 0.9978 |  0.9099   |   nan    |         0.999          |
|          AllenaiLongformerBase          |  4  | 0.9508 |  0.8694   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
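
A minimal sketch for reading the table above. It assumes the compression ratio is the baseline (native PyTorch) peak memory divided by the backend's peak memory, so a value below 1 means the backend used more memory than the baseline. That definition is an assumption here, not something stated in this comment, and `relative_peak_memory` is a hypothetical helper name.

```python
def relative_peak_memory(compression_ratio: float) -> float:
    """Backend peak memory as a multiple of the baseline peak memory.

    Assumption: compression_ratio = baseline_peak / backend_peak.
    """
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 / compression_ratio


# (inductor, inductor_no_cudagraphs) ratios copied from the table above.
ratios = {
    "DebertaV2ForQuestionAnswering": (0.4855, 0.9799),
    "MobileBertForQuestionAnswering": (0.6569, 0.8392),
    "ElectraForQuestionAnswering": (1.1376, 1.195),
}

for name, (inductor, no_cudagraphs) in ratios.items():
    print(
        f"{name}: inductor {relative_peak_memory(inductor):.2f}x, "
        f"no cudagraphs {relative_peak_memory(no_cudagraphs):.2f}x baseline peak"
    )
```

Under that assumption, DebertaV2ForQuestionAnswering with inductor peaks at roughly twice the baseline memory, while the same model without cudagraphs stays close to parity.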

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 265.7808 | 300.5748  | 162.828  |        162.0204        |
|       AlbertForQuestionAnswering        |  4  | 263.6952 | 297.8401  | 160.5282 |        160.376         |
|            XLNetLMHeadModel             |  8  | 280.2064 | 288.0741  | 152.7673 |        154.4239        |
|      DebertaV2ForQuestionAnswering      |  2  | 145.9074 | 208.1777  | 113.191  |        153.797         |
|     PegasusForConditionalGeneration     | 32  | 137.1914 |  148.46   | 109.4894 |        106.8435        |
|            TrOCRForCausalLM             | 32  | 135.9798 | 140.6164  | 108.3982 |        105.8805        |
|          DebertaV2ForMaskedLM           |  1  | 156.986  | 188.8558  | 104.5793 |        152.9482        |
|      MBartForConditionalGeneration      |  2  | 134.9609 |  137.305  | 94.0046  |        92.2154         |
|      BartForConditionalGeneration       |  2  | 135.2007 | 137.1407  | 93.1315  |         91.432         |
|    MegatronBertForQuestionAnswering     |  8  | 142.0484 | 145.1134  | 85.8382  |        84.4761         |
|            YituTechConvBert             | 16  | 125.6051 | 129.3039  | 82.2194  |        82.1266         |
| BlenderbotSmallForConditionalGeneration | 64  | 108.3804 | 116.1221  | 80.0639  |        78.5302         |
|                CamemBert                | 16  | 118.4959 | 121.9079  | 76.3875  |        75.7299         |
|            MBartForCausalLM             |  4  | 109.8924 |  112.974  | 74.3129  |        72.7823         |
|     M2M100ForConditionalGeneration      | 16  | 117.2325 | 125.7384  | 74.1911  |        80.0359         |
|             BartForCausalLM             |  4  | 109.8719 | 113.2797  | 73.9888  |        72.4537         |
|     PLBartForConditionalGeneration      |  4  | 115.0913 | 119.2931  | 71.7503  |        70.3124         |
|     DistilBertForQuestionAnswering      | 256 | 103.3651 | 105.0726  | 71.2645  |        71.3808         |
|           LayoutLMForMaskedLM           | 16  | 113.0013 | 115.7561  | 69.7598  |        69.6012         |
|            PLBartForCausalLM            |  8  | 103.7819 |  107.231  | 69.6466  |        68.1501         |
|          DistilBertForMaskedLM          | 128 | 84.8034  |  89.495   | 69.5775  |        68.2835         |
|             BertForMaskedLM             | 16  | 110.247  | 112.8866  | 68.8882  |         68.058         |
|     MobileBertForQuestionAnswering      | 128 | 187.1383 | 229.1862  | 68.8746  |        140.5742        |
|           RobertaForCausalLM            | 16  | 115.4274 | 118.4557  | 68.3175  |        67.5849         |
|             OPTForCausalLM              |  2  | 157.3231 | 167.7063  | 68.2493  |        67.4247         |
|       DebertaForQuestionAnswering       |  8  | 89.8244  | 103.3227  | 63.0905  |        75.4324         |
|               DistillGPT2               | 16  | 106.3553 | 109.9289  | 63.0446  |        61.4645         |
|       T5ForConditionalGeneration        |  4  | 104.7576 | 122.7646  |  62.683  |        58.8875         |
|                 T5Small                 |  4  | 105.3804 | 123.2788  | 62.6384  |        58.7676         |
|          MobileBertForMaskedLM          | 64  | 191.7133 | 228.5332  | 60.7053  |        141.9753        |
|           PegasusForCausalLM            | 32  | 71.8295  |  73.3607  | 58.1036  |        56.5494         |
|           DebertaForMaskedLM            |  4  | 81.6686  | 101.4215  | 57.8015  |        66.7985         |
|         MegatronBertForCausalLM         |  4  | 86.0454  |  91.5058  | 56.6654  |        55.5669         |
|    LayoutLMForSequenceClassification    | 16  | 98.2202  |  99.3662  | 53.2697  |        53.2283         |
|             XGLMForCausalLM             |  8  | 84.7461  | 102.0089  | 53.2653  |        55.6737         |
|       RobertaForQuestionAnswering       | 16  |  95.868  |  97.4631  | 53.0532  |        52.7439         |
|        BertForQuestionAnswering         | 16  | 95.5178  |  96.9234  | 52.8649  |        52.7825         |
|       ElectraForQuestionAnswering       | 64  | 117.0991 | 117.5921  | 52.5893  |        54.0107         |
|           ElectraForCausalLM            | 32  | 88.8813  |  92.8561  | 47.7366  |        47.2803         |
|       BlenderbotSmallForCausalLM        | 64  | 57.2553  |  61.5071  | 46.5256  |         45.542         |
|       MT5ForConditionalGeneration       | 16  | 101.3757 | 120.6354  | 41.6699  |         47.073         |
|      GPT2ForSequenceClassification      |  4  | 92.6425  |  94.7901  | 39.5521  |        38.8663         |
|         Speech2Text2ForCausalLM         | 256 | 50.0929  |  53.1414  | 34.6434  |        33.7988         |
|          BlenderbotForCausalLM          |  4  | 89.9689  | 111.8168  |   nan    |        81.8176         |
|          AllenaiLongformerBase          |  4  | 179.8729 | 270.1123  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
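
To compare backends from the latency table above, a short sketch that divides the `eager` column by each inductor column. This ratio is relative to the `eager` column of this table only; it is an assumption that this is a useful reading, and it is not necessarily how the dashboard's own speedup numbers are computed. `latency_ratio` is a hypothetical helper name.

```python
def latency_ratio(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# name -> (eager, inductor, inductor_no_cudagraphs) in ms, copied from the table above.
latency_ms = {
    "AlbertForMaskedLM": (265.7808, 162.828, 162.0204),
    "MobileBertForMaskedLM": (191.7133, 60.7053, 141.9753),
    "DebertaV2ForMaskedLM": (156.986, 104.5793, 152.9482),
}

for name, (eager, inductor, no_cudagraphs) in latency_ms.items():
    print(
        f"{name}: inductor {latency_ratio(eager, inductor):.2f}x, "
        f"no cudagraphs {latency_ratio(eager, no_cudagraphs):.2f}x vs eager column"
    )
```

For MobileBertForMaskedLM and DebertaV2ForMaskedLM the two inductor columns differ widely, which suggests that most of the gain on those models comes from cudagraphs rather than from the generated kernels.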

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 1.0004 |  0.9987   |  3.0253  |         2.9735         |
|      xcit_large_24_p8_224       |  5  | 0.9932 |  0.8669   |  2.0458  |         1.6353         |
|        twins_pcpvt_base         | 64  | 1.0044 |   0.925   |  1.9538  |         1.7437         |
|         coat_lite_mini          | 128 | 0.9996 |   0.998   |  1.9519  |         1.9247         |
|          gmlp_s16_224           | 128 | 1.0002 |  1.0898   |  1.8702  |         1.8504         |
|          ghostnet_100           | 128 | 0.9983 |  0.7523   |  1.8591  |         1.6416         |
|          gmixer_24_224          | 128 | 0.9998 |  0.8931   |  1.7797  |         1.7654         |
|           volo_d1_224           | 64  | 0.9995 |   0.978   |  1.7067  |         1.6828         |
|         crossvit_9_240          | 128 | 1.0001 |  0.7887   |  1.669   |         1.6416         |
|            lcnet_050            | 128 | 0.9282 |  0.7235   |  1.6404  |         1.4145         |
|  swin_base_patch4_window7_224   | 64  | 0.9995 |  0.9606   |  1.6397  |         1.6302         |
|           convit_base           | 64  |  1.0   |  0.9997   |  1.617   |         1.6171         |
|       gluon_inception_v3        | 128 | 0.9998 |  0.8656   |  1.536   |         1.5253         |
|        adv_inception_v3         | 128 | 0.9995 |  0.8606   |  1.5336  |         1.5242         |
|             dla102              | 128 | 0.9996 |  0.8164   |  1.5322  |         1.5285         |
|          inception_v3           | 128 | 0.9997 |  0.8654   |  1.5314  |         1.5221         |
|          convnext_base          | 64  | 0.9999 |  1.0008   |  1.5248  |         1.507          |
|            nfnet_l0             | 128 | 0.9986 |  0.8206   |  1.5069  |         1.4579         |
|           dm_nfnet_f0           | 128 | 0.9997 |  0.9979   |  1.5042  |         1.4543         |
|        sebotnet33ts_256         | 64  | 0.9535 |  0.7607   |  1.4949  |         1.5207         |
|            pit_b_224            | 64  | 0.9994 |  0.9977   |  1.4451  |         1.4389         |
|       eca_botnext26ts_256       | 128 | 0.9703 |  0.7173   |  1.4316  |         1.4178         |
|           resnest101e           | 64  | 1.0001 |  0.8697   |  1.4286  |         1.358          |
|           mobilevit_s           | 64  | 0.9622 |  0.7298   |  1.4215  |         1.4387         |
|           selecsls42b           | 128 | 0.9991 |  0.8103   |  1.4089  |         1.4093         |
|          botnet26t_256          | 128 | 0.9697 |  0.8466   |  1.3974  |         1.4103         |
|          jx_nest_base           | 32  | 0.9997 |  0.9977   |  1.3915  |         1.3834         |
|      mobilenetv3_large_100      | 128 | 0.9367 |  0.7508   |  1.3904  |         1.4177         |
|           mnasnet_100           | 128 | 0.9327 |  0.7299   |  1.3902  |         1.4531         |
|           regnety_002           | 128 | 0.9451 |  0.7058   |  1.3898  |         1.2229         |
|        res2net50_14w_8s         | 128 | 0.9997 |  0.7879   |  1.3771  |         1.3504         |
|           res2next50            | 128 | 0.9995 |  0.8243   |  1.3677  |         1.3608         |
|          mixer_b16_224          | 128 | 0.9995 |  1.0208   |  1.3658  |         1.3657         |
|            hrnet_w18            | 128 | 0.9981 |  0.6437   |  1.3618  |         1.3464         |
|          cait_m36_384           |  4  | 1.0001 |  0.9987   |  1.3586  |         1.3566         |
|      beit_base_patch16_224      | 64  | 0.9994 |  0.9681   |  1.3568  |         1.3572         |
|         poolformer_m36          | 64  | 1.0002 |  0.9964   |  1.3533  |         1.3426         |
|         mobilenetv2_100         | 128 | 0.9349 |   0.727   |  1.3494  |         1.4077         |
|        ese_vovnet19b_dw         | 128 | 0.9537 |  0.8271   |  1.343   |         1.3591         |
|       tf_efficientnet_b0        | 128 | 0.9516 |  0.6755   |  1.3306  |         1.3598         |
|          spnasnet_100           | 128 | 0.9248 |  0.7264   |  1.3147  |         1.3764         |
|           fbnetc_100            | 128 | 0.9349 |  0.7286   |  1.2865  |         1.3681         |
|           rexnet_100            | 128 | 0.9476 |  0.6995   |  1.2839  |         1.3186         |
|            fbnetv3_b            | 128 | 0.936  |  0.7592   |  1.2826  |         1.2963         |
|          resmlp_12_224          | 128 |  1.0   |  0.8949   |  1.2743  |         1.268          |
| deit_base_distilled_patch16_224 | 64  | 0.9997 |  0.9969   |  1.2626  |         1.2614         |
|      vit_base_patch16_224       | 64  | 0.9995 |  0.9971   |  1.241   |         1.2396         |
|          cspdarknet53           | 64  | 0.9244 |  0.7779   |  1.2084  |         1.2421         |
|            tinynet_a            | 128 | 0.9351 |  0.6708   |  1.2059  |         1.187          |
|           tf_mixnet_l           | 128 | 0.9746 |  0.8242   |  1.1827  |         1.1853         |
|         visformer_small         | 128 | 0.9993 |  0.9462   |  1.1759  |         1.1688         |
|            mixnet_l             | 128 | 0.9734 |   0.818   |  1.1675  |         1.1765         |
|        res2net101_26w_4s        | 64  | 0.9997 |   0.793   |  1.1395  |         1.0918         |
|          pnasnet5large          | 16  | 0.9969 |  0.9146   |  1.1234  |         1.1393         |
|        gluon_xception65         | 32  | 0.9997 |  0.8439   |  1.0793  |         1.082          |
|            repvgg_a2            | 128 | 0.924  |  0.7468   |  1.0704  |         1.1021         |
|             dpn107              | 32  | 0.9223 |  0.7962   |  1.0669  |         1.1149         |
|     swsl_resnext101_32x16d      | 32  | 0.9993 |  0.8392   |  1.0574  |         1.0216         |
|            gernet_l             | 128 | 0.9245 |  0.7846   |  1.0241  |         1.0514         |
|        convmixer_768_32         | 32  | 0.9996 |  0.9649   |  1.0025  |         1.002          |
+---------------------------------+-----+--------+-----------+----------+------------------------+
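
For anyone who wants to post-process these tables, a rough sketch that parses the ASCII layout used in this comment and summarizes one column with a geometric mean. The geometric mean is a common way to aggregate speedup ratios; whether the dashboard aggregates this way is an assumption. `parse_table` and `column_geomean` are hypothetical helper names, and the embedded rows are a three-row excerpt of the table above.

```python
import math
from statistics import geometric_mean

TABLE = """\
+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 1.0004 |  0.9987   |  3.0253  |         2.9735         |
|      xcit_large_24_p8_224       |  5  | 0.9932 |  0.8669   |  2.0458  |         1.6353         |
|        convmixer_768_32         | 32  | 0.9996 |  0.9649   |  1.0025  |         1.002          |
+---------------------------------+-----+--------+-----------+----------+------------------------+
"""


def parse_table(text):
    """Turn a '|'-delimited ASCII table into a list of {header: cell} dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+---+' border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' line is the header row
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def column_geomean(rows, column):
    """Geometric mean of a numeric column, ignoring 'nan' entries."""
    values = [float(row[column]) for row in rows]
    return geometric_mean(v for v in values if not math.isnan(v))


rows = parse_table(TABLE)
for backend in ("aot_eager", "inductor", "inductor_no_cudagraphs"):
    print(f"{backend}: geomean {column_geomean(rows, backend):.3f}")
```

Dropping `nan` rows keeps failed runs from poisoning the average, but it also hides them, so the count of skipped rows is worth reporting alongside the mean.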

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 5.7352  |  11.2813  | 277.1841 |        277.7929        |
|            hrnet_w18            | 128 | 9.8806  |  36.0362  | 252.9675 |        242.5264        |
|          ghostnet_100           | 128 | 7.7875  |  15.0715  | 240.8263 |        239.6408        |
|            fbnetv3_b            | 128 | 8.8988  |  17.1943  | 173.1145 |        169.5089        |
|           mobilevit_s           | 64  | 5.4023  |  11.2676  | 169.4187 |        166.9193        |
|           resnest101e           | 64  | 10.9378 |  24.1698  | 166.0831 |        167.3397        |
|          pnasnet5large          | 16  | 8.2641  |  25.8835  | 163.7173 |        159.0844        |
|            tinynet_a            | 128 | 6.1992  |  12.398   | 160.5645 |        158.4462        |
|           tf_mixnet_l           | 128 | 9.5498  |  17.375   | 157.562  |        160.187         |
|            mixnet_l             | 128 | 9.0469  |  16.6753  | 156.9406 |        161.5003        |
|        adv_inception_v3         | 128 | 5.6818  |  12.5144  | 156.8093 |        157.2056        |
|      mobilenetv3_large_100      | 128 | 4.5535  |  8.5073   | 156.4856 |        162.1072        |
|       gluon_inception_v3        | 128 | 5.5716  |  12.4151  | 155.3081 |        159.6783        |
|        res2net101_26w_4s        | 64  | 10.7273 |  24.9753  | 153.4042 |         152.37         |
|          inception_v3           | 128 | 5.6541  |  12.4132  | 153.2347 |        160.432         |
|        twins_pcpvt_base         | 64  | 10.6884 |  23.0734  | 148.8185 |        147.3283        |
|       tf_efficientnet_b0        | 128 | 5.2798  |  10.4728  | 148.5953 |        149.7758        |
|           fbnetc_100            | 128 | 5.2947  |   9.789   | 136.2882 |        138.7556        |
|      xcit_large_24_p8_224       |  5  | 12.4502 |  27.8751  | 134.8512 |        132.4333        |
|          spnasnet_100           | 128 | 5.2924  |  9.6419   | 133.8117 |        135.3223        |
|         mobilenetv2_100         | 128 | 4.2772  |  7.9108   | 129.7049 |        129.8349        |
|           mnasnet_100           | 128 | 4.2783  |  7.6572   | 122.5684 |        120.9557        |
|        res2net50_14w_8s         | 128 | 8.9464  |  22.1449  | 122.2357 |        123.1389        |
|          cait_m36_384           |  4  | 13.6351 |  30.1142  | 115.5077 |        114.2012        |
|        sebotnet33ts_256         | 64  | 4.4197  |   9.035   | 107.611  |        108.578         |
|  swin_base_patch4_window7_224   | 64  | 8.4401  |  19.1094  | 107.5922 |        107.8043        |
|           regnety_002           | 128 | 5.0521  |  8.9941   | 106.0563 |        110.2139        |
|          cspdarknet53           | 64  | 6.0057  |  10.956   | 103.5339 |        100.8063        |
|         poolformer_m36          | 64  | 7.4969  |  13.5172  | 101.9719 |        100.0921        |
|       eca_botnext26ts_256       | 128 |  3.215  |    6.7    | 100.3052 |        95.3126         |
|             dpn107              | 32  | 10.588  |  20.0657  | 98.6899  |        97.8907         |
|        gluon_xception65         | 32  | 7.6447  |  16.966   |  97.573  |        95.9385         |
|             dla102              | 128 | 6.4223  |  13.9593  | 97.2737  |        97.7229         |
|            lcnet_050            | 128 | 2.5628  |  5.0639   | 95.4099  |        95.8891         |
|           selecsls42b           | 128 | 2.4767  |  5.4512   | 91.0029  |        89.7694         |
|          botnet26t_256          | 128 | 2.9475  |  5.9842   | 90.9388  |        92.3254         |
|         coat_lite_mini          | 128 |  3.291  |  7.7417   | 89.9134  |        89.9957         |
|         crossvit_9_240          | 128 | 5.7395  |  13.1852  | 88.0831  |        86.1793         |
|           res2next50            | 128 | 5.0511  |  11.9653  | 87.1474  |        86.2614         |
|            gernet_l             | 128 | 5.2275  |  9.2022   | 84.4485  |        80.0645         |
|          jx_nest_base           | 32  | 6.5241  |  14.5048  | 83.9076  |        83.0802         |
|            nfnet_l0             | 128 | 5.1652  |  10.5471  | 77.9805  |        78.3288         |
|        ese_vovnet19b_dw         | 128 | 2.6577  |  4.7654   | 76.7799  |        78.9475         |
|           volo_d1_224           | 64  |  5.002  |  11.4475  |  73.682  |        73.2934         |
|           dm_nfnet_f0           | 128 | 5.6836  |  11.3071  | 73.5957  |        73.0252         |
|        tnt_s_patch16_224        | 128 | 6.5011  |  15.9023  | 68.8747  |        71.4402         |
|         visformer_small         | 128 | 2.5787  |  5.9282   | 67.8744  |        66.9578         |
|            repvgg_a2            | 128 |  5.053  |   9.062   | 62.1269  |        61.8699         |
|     swsl_resnext101_32x16d      | 32  |  6.204  |  13.7837  | 62.0442  |        61.3435         |
|          gmlp_s16_224           | 128 | 5.5996  |  11.7266  | 59.7478  |        60.4449         |
|          convnext_base          | 64  | 6.7641  |  12.3524  | 59.2529  |        60.1011         |
|          gmixer_24_224          | 128 | 5.6444  |  12.6371  |  52.294  |        51.4179         |
|           convit_base           | 64  | 3.4472  |  8.3508   | 48.4399  |        48.7407         |
|            pit_b_224            | 64  | 3.3415  |  7.7997   | 44.2652  |        45.1831         |
|          resmlp_12_224          | 128 | 2.6922  |  5.3108   | 39.5546  |        39.2687         |
| deit_base_distilled_patch16_224 | 64  | 3.0566  |  6.9335   | 39.5143  |        42.8399         |
|      vit_base_patch16_224       | 64  | 3.0166  |  6.7644   | 39.0292  |        38.7852         |
|        convmixer_768_32         | 32  | 1.6487  |  6.8708   | 37.6778  |        36.7744         |
|      beit_base_patch16_224      | 64  | 3.9233  |  8.5374   | 37.4855  |         34.248         |
|          mixer_b16_224          | 128 | 2.6546  |  5.7593   | 33.2376  |        32.4935         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.1848  |         1.2049         |
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.1712  |         1.2836         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1117  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.0948  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.0266  |         1.1179         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.013   |         1.0845         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0079  |         1.0351         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  0.9984  |         1.0721         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  0.9972  |         1.0876         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  0.9872  |         1.0728         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  0.9834  |         1.0506         |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9762  |         0.9854         |
|           rexnet_100            | 128 | 0.9899 |  0.7866   |  0.9747  |         1.0457         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  0.9729  |         1.0539         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  0.9557  |         1.0057         |
|             dla102              | 128 | 0.9635 |  0.9155   |  0.9536  |         1.0326         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.9501  |         0.9948         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9362  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9353  |         0.9528         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  0.9348  |         1.0194         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  0.9346  |         1.0675         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9285  |         0.989          |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9228  |         0.9793         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9215  |         0.9952         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9131  |         0.9727         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  0.9097  |         1.0328         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9047  |         0.9903         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.8918  |          0.99          |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  0.891   |         1.1115         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.8833  |         0.9642         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.8831  |         0.9705         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.8807  |         0.977          |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.8786  |         0.9451         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.877   |         0.9362         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  0.8768  |         1.1871         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  0.8738  |         1.0072         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.8712  |         0.9607         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.871   |         0.9483         |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.8687  |         0.9902         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.8683  |         0.9403         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.866   |         0.9547         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8632  |         0.989          |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.8596  |         0.9535         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  0.8578  |         1.0242         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.8576  |         0.9664         |
|          convnext_base          | 64  | 1.001  |   0.924   |  0.8505  |         1.0338         |
|            gernet_l             | 128 | 0.9782 |  0.8499   |  0.8499  |         0.9706         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.8461  |         0.9786         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  0.8402  |         1.0202         |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.8239  |         0.9779         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.8225  |         0.9732         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.805   |         0.884          |
|            repvgg_a2            | 128 | 0.9761 |  0.7779   |  0.7738  |         0.9611         |
|           regnety_002           | 128 | 0.9523 |  0.8277   |  0.7602  |         0.8966         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.7526  |         0.9898         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.7214  |         0.9045         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.6693  |         0.9604         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
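A note on reading these ratios, assuming the ratio is defined as eager peak memory divided by the backend's peak memory (the post does not state the definition, so treat it as an assumption): values above 1 mean the backend peaked lower than eager, values below 1 mean it peaked higher. A minimal sketch with a hypothetical helper, `relative_peak_memory`, applied to three inductor values copied from the table:

```python
# (model name, inductor ratio) copied from the table above
rows = [
    ("gmlp_s16_224", 1.1848),
    ("mixer_b16_224", 0.9501),
    ("jx_nest_base", 0.6693),
]


def relative_peak_memory(ratio: float) -> float:
    """Backend peak memory as a multiple of eager peak memory.

    ASSUMPTION: ratio = eager_peak / backend_peak, so the inverse gives
    how much memory the backend used relative to eager.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


for name, ratio in rows:
    print(f"{name}: {relative_peak_memory(ratio):.2f}x eager peak memory")
```

Under that assumption, jx_nest_base with inductor peaks at roughly one and a half times the eager footprint, while gmlp_s16_224 peaks below it.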

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.3626 | 310.9328  | 299.3762 |        299.8827        |
|            hrnet_w18            | 128 | 280.0898 | 432.9165  | 204.6061 |        207.567         |
|          pnasnet5large          | 16  | 196.3375 | 213.8233  | 174.3661 |        172.3967        |
|           tf_mixnet_l           | 128 | 194.1831 | 229.9697  | 160.205  |        159.9924        |
|            mixnet_l             | 128 | 185.9436 | 221.6163  | 154.8339 |        153.6508        |
|          cait_m36_384           |  4  | 166.9447 | 166.6553  | 122.592  |        123.115         |
|           resnest101e           | 64  | 164.5782 | 187.8362  | 114.7287 |        120.7624        |
|             dla102              | 128 | 171.6912 | 210.1931  | 112.1715 |        112.2878        |
|     swsl_resnext101_32x16d      | 32  | 118.5139 | 141.3195  | 112.0546 |        115.8863        |
|         poolformer_m36          | 64  | 144.8525 | 145.5564  | 107.1212 |        107.7999        |
|        tnt_s_patch16_224        | 128 | 322.8493 | 323.1132  | 106.5805 |        108.753         |
|          inception_v3           | 128 | 160.3704 | 184.7079  | 104.6501 |        105.1609        |
|       gluon_inception_v3        | 128 | 160.0537 | 185.0892  | 104.4408 |        104.9205        |
|        adv_inception_v3         | 128 | 160.0536 | 185.9455  | 104.3819 |        104.9535        |
|        res2net50_14w_8s         | 128 | 140.3924 | 178.5765  | 102.2778 |        104.1685        |
|           convit_base           | 64  | 162.8577 | 162.7243  | 100.661  |        100.5891        |
|             dpn107              | 32  | 114.9983 | 132.9861  | 99.2536  |         95.269         |
|           res2next50            | 128 | 126.0932 | 152.7903  |  91.987  |        92.3562         |
|        gluon_xception65         | 32  | 98.8015  | 117.3475  | 91.7116  |        91.4222         |
|  swin_base_patch4_window7_224   | 64  | 145.8864 |  151.92   | 89.1586  |        89.6291         |
|        res2net101_26w_4s        | 64  | 99.7373  | 126.2205  | 85.5107  |        90.3655         |
|            fbnetv3_b            | 128 | 116.6947 | 143.8025  | 85.2756  |        84.4169         |
|          mixer_b16_224          | 128 | 116.3405 | 113.8076  | 85.1781  |        85.5064         |
|           dm_nfnet_f0           | 128 | 126.9163 |  127.153  | 84.0476  |        86.9863         |
|            pit_b_224            | 64  | 118.1604 | 118.3015  | 81.8061  |        82.0435         |
|          convnext_base          | 64  | 122.3644 |  122.136  |  80.341  |        81.2337         |
|         visformer_small         | 128 | 90.9741  |  96.1877  | 77.2929  |        77.6852         |
|      beit_base_patch16_224      | 64  | 101.2169 | 104.3406  | 74.5633  |        74.3804         |
|            nfnet_l0             | 128 | 112.1575 | 136.0985  | 74.1089  |        76.9059         |
|       eca_botnext26ts_256       | 128 | 109.1123 | 147.7252  | 74.0773  |         74.612         |
|          cspdarknet53           | 64  | 95.6035  | 113.6973  | 73.3155  |        71.2928         |
|          gmlp_s16_224           | 128 | 136.8148 | 125.5279  |  73.21   |        73.9143         |
|          jx_nest_base           | 32  | 100.4914 |  100.579  | 71.8847  |        72.4915         |
|            gernet_l             | 128 |  78.595  |  92.663   | 71.0692  |        69.2428         |
|          botnet26t_256          | 128 | 102.1593 |  117.133  | 70.9124  |        70.2758         |
|           volo_d1_224           | 64  | 120.214  | 122.9313  | 70.5404  |        71.3166         |
|      vit_base_patch16_224       | 64  | 86.5147  |  86.8027  | 69.7947  |        69.7396         |
|            repvgg_a2            | 128 | 78.5995  |  97.1157  | 67.9383  |         65.825         |
| deit_base_distilled_patch16_224 | 64  | 84.4846  |  84.9274  | 66.9879  |        67.0427         |
|          gmixer_24_224          | 128 | 117.3712 | 131.7531  | 66.2274  |        66.4664         |
|       tf_efficientnet_b0        | 128 |  85.38   | 120.4567  | 61.2364  |         59.916         |
|           fbnetc_100            | 128 | 84.0878  | 107.8893  | 61.1549  |        57.4955         |
|      xcit_large_24_p8_224       |  5  | 122.7342 | 140.5676  | 60.9183  |         73.461         |
|           rexnet_100            | 128 | 80.4141  | 108.9174  | 59.2789  |        57.7954         |
|        twins_pcpvt_base         | 64  | 118.3042 | 129.9154  | 59.0304  |        65.5546         |
|         coat_lite_mini          | 128 | 112.5434 | 113.0213  | 57.6713  |        58.6503         |
|            tinynet_a            | 128 | 74.6148  | 103.5743  | 57.6258  |        58.8207         |
|           mobilevit_s           | 64  | 84.4852  |  111.38   | 57.2397  |        56.5056         |
|        sebotnet33ts_256         | 64  | 80.8522  | 101.3005  | 51.4625  |        50.6561         |
|          spnasnet_100           | 128 | 71.7569  |  91.4033  | 50.4387  |        48.1364         |
|         crossvit_9_240          | 128 | 81.6236  | 103.3609  |  48.936  |        49.7689         |
|          ghostnet_100           | 128 | 90.0808  | 119.5541  | 48.2194  |        54.6812         |
|        ese_vovnet19b_dw         | 128 | 64.8542  |  74.8381  | 46.1467  |        45.5557         |
|         mobilenetv2_100         | 128 | 66.3986  |  85.4064  |  46.088  |        44.1399         |
|           mnasnet_100           | 128 | 65.2385  |  83.3539  | 43.7378  |        41.8766         |
|           selecsls42b           | 128 | 59.9741  |  73.9238  | 42.5513  |        42.5353         |
|      mobilenetv3_large_100      | 128 | 62.1646  |  77.4694  | 41.8558  |         41.075         |
|          resmlp_12_224          | 128 | 53.0066  |  59.4581  | 41.6711  |        41.8262         |
|           regnety_002           | 128 | 41.0124  |  53.1931  | 27.0101  |        29.9905         |
|            lcnet_050            | 128 | 32.0968  |  41.2566  | 18.1331  |        21.0731         |
+---------------------------------+-----+----------+-----------+----------+------------------------+
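The absolute latencies can be turned into a speedup relative to the `eager` column by dividing one latency by the other. A small sketch using three rows copied from the table; the helper name `speedup` is made up for illustration, and the result is only relative to the `eager` column shown here, not to any other baseline:

```python
# (model name, eager ms, inductor ms) copied from the table above
latencies_ms = [
    ("tnt_s_patch16_224", 322.8493, 106.5805),
    ("convmixer_768_32", 300.3626, 299.3762),
    ("lcnet_050", 32.0968, 18.1331),
]


def speedup(eager_ms: float, backend_ms: float) -> float:
    """Speedup of a backend over the eager column (higher is better)."""
    if backend_ms <= 0:
        raise ValueError("latency must be positive")
    return eager_ms / backend_ms


for name, eager_ms, inductor_ms in latencies_ms:
    print(f"{name}: {speedup(eager_ms, inductor_ms):.2f}x")
```

For these rows the ratio ranges from roughly 1.0x (convmixer_768_32) to roughly 3.0x (tnt_s_patch16_224).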

Performance graphs


bench_logs/timm_models_amp.png :

bench_logs/torchbench_amp.png :

bench_logs/huggingface_amp.png :

Build Summary


Run name

day_103_13_04_23_performance_amp_153

Commit hashes

pytorch commit: 979c5b4
pytorch commit date: 2023-04-14 02:15:53+00:00
torchbench commit: cd89d490ecbcca7d8ca50324522b31a1a198c753
torchbench commit date: 2023-04-13 11:05:33-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git979c5b4

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.7
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8500
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@EikanWang
Collaborator

@williamwen42 , may I know if the data is training performance or inference performance?

@desertfire
Contributor

> @williamwen42 , may I know if the data is training performance or inference performance?

This was for training.

@anijain2305
Contributor Author

The new dashboard is at https://hud.pytorch.org/benchmark/compilers - Closing the issue.

@andreigh
Contributor

Quick question: was nvprims_nvfuser removed from the backends? It's not in print(torchdynamo.list_backends()) or in the dashboard above, but it is in the documentation.
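For anyone who wants to run the same check on a recent build, here is a minimal sketch. It assumes PyTorch 2.x, where the standalone `torchdynamo` package lives at `torch._dynamo`; the helper name `available_backends` is made up for illustration, and it returns `None` when that module cannot be imported:

```python
def available_backends():
    """Return the sorted backend names known to TorchDynamo.

    ASSUMPTION: PyTorch 2.x, where the module is importable as torch._dynamo.
    Returns None if it is not installed.
    """
    try:
        import torch._dynamo as dynamo
    except ImportError:
        return None
    return sorted(dynamo.list_backends())


backends = available_backends()
if backends is None:
    print("torch._dynamo is not available in this environment")
else:
    print(backends)
    print("nvprims_nvfuser listed:", "nvprims_nvfuser" in backends)
```

The list depends on the installed version and on which optional packages are present, so the output will differ between machines.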

@msaroufim
Member

@andreigh yes, it was. It was also removed from the docs at https://pytorch.org/docs/main/torch.compiler.html, which is where you should go for the most up-to-date info.

A short rationale was discussed here: https://dev-discuss.pytorch.org/t/question-about-nvfuser-being-removed/1453/2?u=msaroufim

@yinrun

yinrun commented Dec 16, 2023

@williamwen42
Excuse me, is there any documentation on how the data in the Performance Dashboard was produced? I am trying to reproduce the experiment locally.

@williamwen42
Member

@yinrun this is a legacy dashboard - current performance metrics can be seen at https://hud.pytorch.org/benchmark/compilers. The entrypoints can be found at benchmarks/dynamo/[torchbench/huggingface/timm_models].py
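As a rough starting point for a local run, the sketch below only builds and prints a command line for one of those entrypoints; it does not run anything. The flag selection (`--performance`, `--training`, `--amp`, `--backend`, `--only`, `-d`) is an assumption based on the run name and the training reply above, not the dashboard's exact invocation, and options can differ between PyTorch versions, so check the script's `--help` first. The helper name `benchmark_command` is made up for illustration:

```python
import shlex
import sys

# Entrypoint suites named in the comment above
SUITES = ("torchbench", "huggingface", "timm_models")


def benchmark_command(suite: str, model: str, backend: str = "inductor") -> list:
    """Build an argv list for a single-model AMP training performance run.

    ASSUMPTION: the flags below match the benchmark scripts in your checkout;
    the path is relative to the root of a PyTorch source tree.
    """
    if suite not in SUITES:
        raise ValueError(f"unknown suite: {suite!r}")
    return [
        sys.executable,
        f"benchmarks/dynamo/{suite}.py",
        "--performance",
        "--training",
        "--amp",
        "--backend", backend,
        "-d", "cuda",
        "--only", model,
    ]


# Print the command so it can be inspected before running it by hand
print(shlex.join(benchmark_command("timm_models", "lcnet_050")))
```

Numbers from a local run will not match the tables above unless the hardware, commit and batch sizes are the same.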
