Touchapon Kraisingkorn
June 6, 2024

2024 LLMs: Smaller, Faster, Smarter, More Collaborative


As 2024 unfolds, Large Language Models (LLMs) continue to reshape the AI landscape. Models are becoming smaller, faster, and smarter, and new workflows are improving their accuracy. This article explores the key expectations for LLMs in 2024: greater efficiency, the rise of agentic workflows, support for larger context sizes, and the evolving role of prompt engineers.

LLMs Will Get Smaller, Faster, and Smarter

The trend towards more efficient and compact models is gaining momentum. Leading AI companies such as OpenAI, Anthropic, Meta, and Microsoft are at the forefront with models such as GPT-3.5-Turbo, Claude 3 Haiku, LLaMA 3 8B, and Phi-3.

Published benchmark results suggest that these smaller language models can match, and on some tasks outperform, larger and more expensive models. For instance, Microsoft's compact Phi-3-mini, with only 3.8 billion parameters, is reported to rival models with significantly more parameters, such as Mixtral 8x7B and GPT-3.5. Similarly, Meta's 8-billion-parameter LLaMA 3 8B has been found to be comparable to, or even better than, the larger GPT-3.5 on certain tasks (Microsoft, 2024).

These advancements in model efficiency are anticipated to further reduce costs, making powerful LLMs more practical and widely available for diverse applications, from customer service chatbots to advanced data analytics. 

The ability of these smaller and more efficient models to match or exceed the performance of their larger counterparts is a significant development that is expected to democratize access to advanced AI capabilities. Businesses and startups will be able to leverage these cost-effective and energy-efficient LLMs to unlock new use cases and drive innovation.

Agentic Workflow

One of the most groundbreaking advancements in AI is the shift towards agentic workflows. Andrew Ng, a renowned AI expert, has underscored the significance of these workflows in driving substantial progress. He identifies four key design patterns for AI agentic workflows:

Reflection: LLMs can improve their results by examining their own work. For example, an LLM used in a customer service application can draft a reply, review that draft for errors or gaps, and revise it before responding; it can also analyze past interactions to identify patterns in customer queries. This self-reflective capability helps the model catch its mistakes and adapt to new scenarios, thereby increasing its accuracy and reliability.

Tool Use: LLMs can act as agents by utilizing external tools for tasks such as search, code execution, and data manipulation. For instance, an LLM integrated with a financial analysis tool can automatically fetch real-time market data, perform complex calculations, and generate investment recommendations. This ability to leverage external tools extends the functionality of LLMs beyond text generation, making them versatile agents capable of handling a wide range of tasks.

Planning: LLMs can autonomously decide on the sequence of steps to execute for complex tasks. Consider an LLM used in project management software. It can break down a project into smaller tasks, assign deadlines, and monitor progress, all while adjusting the plan based on real-time updates. This planning capability enables LLMs to manage intricate workflows efficiently, reducing the need for human intervention.

Multi-Agent Collaboration: Prompting an LLM to play different roles for different parts of a complex task can summon a team of AI agents to perform the job more effectively. For example, in a medical diagnosis application, one LLM agent could focus on analyzing patient history, another on interpreting lab results, and a third on suggesting treatment options. By collaborating, these agents can provide a comprehensive diagnosis, improving the overall accuracy and reliability of the system.
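
Two of these patterns, Reflection and Tool Use, can be made concrete with a small sketch. The Python below is illustrative only: `reflect_and_revise`, `run_tool_call`, the `TOOLS` registry, and `fake_llm` are hypothetical names, and `fake_llm` is a toy stand-in for a real model call so that the example runs offline. It assumes Reflection is implemented as a draft, critique, revise loop and that a tool request arrives as a small JSON object; real agent frameworks differ on both points.

```python
import json
from typing import Callable, Dict

# Hypothetical stand-in for a real model call: takes a prompt, returns text.
LLM = Callable[[str], str]


def reflect_and_revise(llm: LLM, task: str, max_rounds: int = 2) -> str:
    """Reflection: draft an answer, ask for a critique, revise until it passes."""
    draft = llm(f"Task: {task}\nWrite an answer.")
    for _ in range(max_rounds):
        critique = llm(f"Task: {task}\nAnswer: {draft}\nList any problems, or reply OK.")
        if critique.strip() == "OK":
            break  # the critique found nothing to fix
        draft = llm(f"Task: {task}\nAnswer: {draft}\nProblems: {critique}\nRewrite the answer.")
    return draft


# Tool use: the model emits a structured request and the host program runs it.
TOOLS: Dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),  # toy tool; real ones might search or run code
}


def run_tool_call(model_output: str) -> str:
    """Parse a JSON request like {"tool": "add", "args": {...}} and execute it."""
    call = json.loads(model_output)
    return TOOLS[call["tool"]](**call["args"])


def fake_llm(prompt: str) -> str:
    """Toy model (assumption): wrong first draft, then a useful critique and fix."""
    if "Rewrite the answer" in prompt:
        return "Paris is the capital of France."
    if "List any problems" in prompt:
        return "OK" if "Paris" in prompt else "The answer names the wrong city."
    return "Lyon is the capital of France."


print(reflect_and_revise(fake_llm, "What is the capital of France?"))
print(run_tool_call('{"tool": "add", "args": {"a": 2, "b": 3}}'))
```

With the toy model, the first draft is wrong, the critique flags it, and the revised answer passes on the second round. Capping the loop with `max_rounds` is a design choice that keeps cost bounded when the critique never returns OK.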

Ng asserts that these agentic workflows will drive significant AI progress, potentially surpassing the advancements of the next generation of foundation models. This structured and interactive problem-solving process will help improve accuracy and reduce hallucinations in LLM implementations.

Larger Context Size Support

Advancements in context size support are poised to revolutionize the capabilities of LLMs. Models like Google's Gemini 1.5 now support context sizes of up to a million tokens, moving towards near-infinite memory capacities. This increased context size will enable more complex, multimodal use cases, such as video analytics and batch data processing.

For instance, in video analytics, an LLM with a larger context size can analyze entire video streams rather than just individual frames. This holistic approach allows the model to understand context, detect anomalies, and generate insights more accurately. Similarly, in batch data processing, an LLM can handle large datasets in a single pass, making it possible to perform complex analyses and generate comprehensive reports without the need for multiple iterations.

However, very large context windows are not necessarily a good replacement for Retrieval-Augmented Generation (RAG) in Q&A use cases, because of their impact on cost and response time. Processing more tokens per request requires more computation, which raises operational costs and slows responses, making this approach less practical for applications that need real-time interaction.
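
A back-of-the-envelope calculation shows why the cost gap matters. The sketch below assumes input is billed at a flat per-token rate; the price of $1 per million input tokens and both token counts are made-up round numbers chosen for illustration, not figures from any provider.

```python
def input_cost_usd(prompt_tokens: int, usd_per_million_tokens: float) -> float:
    """Input cost of one request, assuming a flat per-token price."""
    return prompt_tokens / 1_000_000 * usd_per_million_tokens


ASSUMED_PRICE = 1.00             # USD per million input tokens (made-up round number)
FULL_CONTEXT_TOKENS = 1_000_000  # assumption: whole corpus sent with every question
RAG_TOKENS = 4_000               # assumption: question plus a few retrieved passages

full_cost = input_cost_usd(FULL_CONTEXT_TOKENS, ASSUMED_PRICE)
rag_cost = input_cost_usd(RAG_TOKENS, ASSUMED_PRICE)
ratio = FULL_CONTEXT_TOKENS / RAG_TOKENS

print(f"full context: ${full_cost:.3f} per question")
print(f"RAG:          ${rag_cost:.3f} per question")
print(f"ratio:        {ratio:.0f}x")
```

Under these assumed numbers, sending the full context costs 250 times as much per question as sending only the retrieved passages, before accounting for the extra latency of processing the longer prompt.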

Prompt Engineers Become Agents Architects

With the rise of agentic workflows, the role of prompt engineers is undergoing a transformation. According to Andrew Ng, the future will see prompt engineers transitioning into "Agents Architects." This new role involves designing how groups of AI agents interact to perform complex tasks effectively. Instead of just crafting individual prompts, Agents Architects will focus on orchestrating multiple agents to collaborate and achieve the best results.

For example, in a content creation application, an Agents Architect might design a workflow where one LLM agent generates an initial draft, another agent reviews and edits the content, and a third agent optimizes it for SEO. By coordinating these agents, the Agents Architect ensures that the final output is of high quality and meets all requirements. This shift underscores the need for AI professionals to continuously adapt and innovate in response to emerging technologies and methodologies.
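
The content creation workflow above can be sketched as a simple sequential pipeline. This is a minimal illustration, not a real framework: `Agent`, `run_pipeline`, and `toy_llm` are hypothetical names, and `toy_llm` only tags the text with the role that handled it so the flow of work is visible without calling a model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    """One role in the workflow: a name, its instructions, and a model to call."""
    name: str
    instructions: str
    llm: Callable[[str], str]  # hypothetical stand-in for a real model call

    def run(self, text: str) -> str:
        # Each agent sees its own instructions followed by the previous agent's output.
        return self.llm(f"{self.instructions}\n\n{text}")


def run_pipeline(agents: List[Agent], brief: str) -> str:
    """Pass the work through each agent in order and return the final output."""
    result = brief
    for agent in agents:
        result = agent.run(result)
    return result


def toy_llm(prompt: str) -> str:
    """Toy model (assumption): appends the instruction line to the text it received."""
    instructions, _, body = prompt.partition("\n\n")
    return f"{body} [{instructions}]"


agents = [
    Agent("writer", "draft", toy_llm),
    Agent("editor", "edit", toy_llm),
    Agent("seo", "seo", toy_llm),
]
print(run_pipeline(agents, "LLM trends"))
```

In a real system, each agent would carry a much richer prompt and possibly its own model, and the orchestration might branch or loop rather than run strictly in sequence. Deciding that structure is the design work the article assigns to the Agents Architect.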


As we look ahead to 2024, the expectations for LLMs are both exciting and transformative. From becoming smaller, faster, and smarter to leveraging agentic workflows and larger context sizes, LLMs are set to revolutionize various industries. The evolving role of prompt engineers into Agents Architects further highlights the dynamic nature of the AI field. These advancements promise to enhance the accuracy, efficiency, and applicability of LLMs, paving the way for a new era of AI-driven innovation.
