LLMs for robotics
LLMs are large, deep-learning models trained on vast amounts of data. Their core component is the transformer neural network architecture [1].
A transformer is a neural network that learns the context and meaning of a sentence by tracking relationships between elements of sequential data, such as words, using a mathematical technique called attention. In simple terms, transformers can learn and generate human-like text by analyzing patterns in large bodies of text data.
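The attention technique described above can be sketched as scaled dot-product attention, the form used inside transformers. The following minimal NumPy version is illustrative only (the function name, token count, and dimensions are assumptions, not from the source): each token's query is compared against every key, the resulting scores become softmax weights, and the output is a weighted average of the value vectors.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Similarity score between each query and every key,
    # scaled by sqrt of the key dimension for numerical stability.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax turns each row of scores into weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted average of the value vectors.
    return weights @ V

# Toy example: 3 tokens, each represented by a 4-dimensional vector.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one context-aware vector per token
```

In a real transformer this operation runs in parallel across many attention heads, and Q, K, and V are learned linear projections of the token embeddings rather than raw vectors.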
Google researchers introduced the transformer architecture in the 2017 paper Attention Is All You Need [2], and it is now the foundation of well-known LLMs such as ChatGPT from OpenAI, Llama from Meta, and Gemini from Google. These LLMs excel at understanding natural language, reasoning, and decision-making. Along with LLMs, there are:
- Vision language models (VLMs), which combine an LLM with a vision encoder, enabling the model to understand images and videos, which can be used for robot perception...