I’m trying to use the CUDA Decoder API to develop a multi-stream H.264 decoder.
However, I found that my program can create at most 4 decoder instances. The 5th call to cuvidCreateDecoder() returned CUDA_ERROR_NO_DEVICE. (The previous 4 decoders work perfectly simultaneously.)
My card: GT 130M
Is this a hardware limitation of CUDA VP2?
Is VP2 a hardware component, or just an abstraction layer built on the compute capability of the CUDA stream processors? If it is just dedicated hardware on the card, why is it called the CUDA Decoder API?
Each stream needs around 100MB (depending on the number of decode and output surfaces you create it with), and that is the only limiting factor. On a 1GB card, I can create 10 decoders, each instance taking 80MB for 1440x1080; however, all streams drop frames at that point.
Can you map more shared memory to your integrated GPU, or reduce the number of decode surfaces to the bare minimum required by the complexity of your streams?
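To see how the surface counts drive the memory figure, here is a rough lower-bound estimate in C. It is only a sketch under stated assumptions: surfaces are 8-bit NV12 (4:2:0), the coded size is rounded up to whole 16x16 macroblocks, and output surfaces are the same size as decode surfaces. The driver's real allocation will be larger (pitch alignment, internal buffers), so treat the result as a floor, not as what the driver reports. The function names are made up for illustration; the two counts correspond to ulNumDecodeSurfaces and ulNumOutputSurfaces in CUVIDDECODECREATEINFO.

```c
#include <stddef.h>

/* Round v up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t v, size_t align)
{
    return (v + align - 1) & ~(align - 1);
}

/* Lower-bound size in bytes of one 8-bit NV12 surface:
 * a full-size luma plane plus a half-height interleaved chroma plane.
 * Assumption: width and height are rounded up to 16-pixel macroblocks. */
static size_t nv12_surface_bytes(size_t width, size_t height)
{
    size_t w = align_up(width, 16);
    size_t h = align_up(height, 16);
    return w * h * 3 / 2;
}

/* Lower-bound surface memory for one decoder instance.
 * Assumption: output surfaces have the same size as decode surfaces. */
static size_t decoder_surface_bytes(size_t width, size_t height,
                                    unsigned num_decode_surfaces,
                                    unsigned num_output_surfaces)
{
    return nv12_surface_bytes(width, height)
           * ((size_t)num_decode_surfaces + num_output_surfaces);
}
```

For example, decoder_surface_bytes(1440, 1080, 20, 2) gives 51,701,760 bytes (about 49 MiB) for surfaces alone, so cutting the decode surface count is the main knob if memory really is the limit.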
Thanks! You gave me useful info about CUDA; at least now I know that more DRAM means more streams.
However, no matter what the resolution of my video clips is, my card can only create 4 decoders for 4 streams (e.g. I can decode only four 1080p streams, and likewise only four 720p streams). Reducing the number of surfaces does not increase the number of decoders either.
So it does not seem to be related to DRAM usage on my card. (I limited the surface memory usage to 24MB, but it made no difference.)
Any other suggestions on how to decode more streams simultaneously?
Hello there,
I have run into the same problem. The only difference is that I can create up to 8 decoders. After that, one more call to cuvidCreateDecoder returns error code 100.
Have you found any solution for this yet?
BTW, I am coding on my laptop, which is equipped with an NVIDIA GeForce 8600M GT card.
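For reference, error code 100 in the CUDA driver API is CUDA_ERROR_NO_DEVICE, i.e. the same result reported in the first post. A minimal lookup sketch in C is below. The numeric values are the ones defined for CUresult in cuda.h; the function name cu_result_name is hypothetical and only a handful of codes are listed, so anything else returns NULL.

```c
#include <stddef.h>

/* Map a few CUresult values (as defined in cuda.h) to their names.
 * Hypothetical helper for logging; returns NULL for codes not listed. */
static const char *cu_result_name(int code)
{
    switch (code) {
    case 0:   return "CUDA_SUCCESS";
    case 1:   return "CUDA_ERROR_INVALID_VALUE";
    case 2:   return "CUDA_ERROR_OUT_OF_MEMORY";
    case 3:   return "CUDA_ERROR_NOT_INITIALIZED";
    case 100: return "CUDA_ERROR_NO_DEVICE";
    case 101: return "CUDA_ERROR_INVALID_DEVICE";
    default:  return NULL;
    }
}
```

Logging the name next to the raw number when cuvidCreateDecoder fails makes it easier to tell an out-of-memory failure (code 2) apart from the NO_DEVICE result seen in this thread.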