Is it possible to run multiple LLM instances in parallel using multithreading to handle multiple queries simultaneously on Jetson Orin AGX?

I’m exploring the possibility of running multiple LLM (Large Language Model) instances in parallel on the Jetson Orin AGX using multithreading or multiprocessing. The goal is to handle multiple queries simultaneously to improve performance and responsiveness in real-time applications. I’d like to know if this is feasible given the Orin AGX’s GPU and CPU architecture, and what would be the recommended approach — whether using threading, multiprocessing, or containerization. Any insights on resource allocation, performance optimization, or example implementations would be highly appreciated.
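
For concreteness, here is a rough multiprocessing sketch of the kind of setup I have in mind. I'm assuming llama-cpp-python purely as an example backend, and the model path and prompts are placeholders:

```python
# Sketch: one model copy per worker process, each serving queries from a queue.
# Assumes llama-cpp-python is installed; MODEL_PATH is a placeholder.
import multiprocessing as mp

MODEL_PATH = "/models/llama-2-7b.Q4_K_M.gguf"  # placeholder path

def worker(query_queue, result_queue):
    from llama_cpp import Llama  # import inside the child process
    # Each process loads its own copy of the weights (memory cost doubles per worker)
    llm = Llama(model_path=MODEL_PATH, n_gpu_layers=-1)
    while True:
        prompt = query_queue.get()
        if prompt is None:  # poison pill -> shut down
            break
        out = llm(prompt, max_tokens=128)
        result_queue.put(out["choices"][0]["text"])

if __name__ == "__main__":
    queries, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(queries, results)) for _ in range(2)]
    for p in procs:
        p.start()
    for q in ["What is Jetson Orin?", "Explain CUDA streams."]:  # placeholder queries
        queries.put(q)
    print(results.get())
    print(results.get())
    for _ in procs:
        queries.put(None)
    for p in procs:
        p.join()
```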

Hi,

Please check our tutorial below for running LLMs:

Thanks.

Dear Moderator @AastaLLL,
I'm aware of how to run LLMs on the device; that wasn't my question.
My question was: is it possible to run multiple LLaMA instances at the same time?
If there is a way to do this, sharing it would be really helpful.

Hi,

Is your goal to handle multiple queries simultaneously?
If so, you can try the sample above, as it can handle multiple queries at the same time.

Loading the same model multiple times will use much more memory, which might not be the optimal approach.
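
As a rough illustration only (simplified, and not taken from the tutorial), a single loaded model can serve requests from several client threads through a queue, so the weights are resident in memory only once. The backend and model path below are placeholders:

```python
# Sketch: one loaded model behind a dedicated inference thread; multiple
# callers submit prompts via a queue. Requests are answered one at a time,
# but only one copy of the weights is in memory.
# Assumes llama-cpp-python; MODEL_PATH is a placeholder.
import threading
import queue
from llama_cpp import Llama

MODEL_PATH = "/models/llama-2-7b.Q4_K_M.gguf"  # placeholder path

requests = queue.Queue()

def inference_worker():
    llm = Llama(model_path=MODEL_PATH, n_gpu_layers=-1)  # loaded once
    while True:
        prompt, reply_box, done = requests.get()
        out = llm(prompt, max_tokens=128)
        reply_box.append(out["choices"][0]["text"])
        done.set()  # wake the waiting caller

threading.Thread(target=inference_worker, daemon=True).start()

def ask(prompt):
    # Any thread can call this; it blocks until its answer is ready.
    reply_box, done = [], threading.Event()
    requests.put((prompt, reply_box, done))
    done.wait()
    return reply_box[0]

print(ask("What is TensorRT?"))  # placeholder query
```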

Thanks.

