In The Frame – Agentic Frameworks

Content Copyright © 2025 Bloor. All Rights Reserved.
Also posted on: Bloor blogs


Large language models (LLMs) have taken the world by storm since OpenAI’s ChatGPT arrived on the scene on 30 November 2022. A torrent of investor money has been ploughed into AI start-ups, and it is hard to find a technology press release these days that does not reference AI. Even IBM’s latest mainframe, the z17, is being pitched as an AI accelerator.

LLMs are neural networks with millions of artificial neurons and billions of weighted connections between them, the weights being set as the model is trained on large volumes of data such as text or images. The result is a model that can produce convincing conversational text in response to prompts such as “write me an essay on the Industrial Revolution”, “give me a list of possible names for my new kitten” or “write me a poem about springtime”. Modern LLMs like ChatGPT and its rivals (Claude, Gemini, Llama, Perplexity and others) produce fluent and convincing text in a wide variety of languages. Indeed, they can also write output in programming languages like Python or C, which has been one of the more convincing use cases for the technology, although even that has issues. It is important to note that the output they produce is almost always plausible but not necessarily correct, an issue known as AI hallucination. Estimates vary, but at least one LLM answer in four typically contains a hallucination.
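
To make the prompting pattern concrete, here is a minimal sketch of how such a prompt might be sent to an LLM programmatically, using the OpenAI Python client as one example; the model name and prompt are purely illustrative, and any comparable provider SDK follows the same request/response pattern.

    # Minimal sketch of prompting an LLM via an API.
    # Uses the OpenAI Python client as one example; the model name is illustrative.
    from openai import OpenAI

    client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "user", "content": "Give me a list of possible names for my new kitten"},
        ],
    )

    print(response.choices[0].message.content)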

LLMs are heavily dependent on their training data. There has been much hype about their ability to solve maths problems, but it turns out that much of this was due to similar problems appearing in their training data. A test of seven LLMs on the US Math Olympiad questions in March 2025, conducted just hours after the questions were released, produced a best score of just 5%, very different from what had been touted previously. It is also clear that for corporate purposes an LLM will need some business-specific context for many tasks: a customer service chatbot will need access to customer order history, price lists, technical manuals and so on. This has led to retrieval augmented generation (RAG), in which an LLM is supplemented with domain-specific documents that have been converted into embeddings and stored in a vector database, so that passages relevant to a query can be retrieved and supplied to the model as context.
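
As an illustration of the RAG pattern, the simplified sketch below retrieves the most relevant document for a question and prepends it to the prompt. The embed() function is a toy stand-in for a real embedding model, and the document list stands in for a vector database; in practice both would come from dedicated libraries or services.

    # Simplified sketch of retrieval augmented generation (RAG).
    import math
    from collections import Counter

    def embed(text: str) -> Counter:
        # Toy "embedding": a bag-of-words term count. A real system would call
        # an embedding model and store dense vectors in a vector database.
        return Counter(text.lower().split())

    def similarity(a: Counter, b: Counter) -> float:
        # Cosine similarity between two sparse term-count vectors.
        dot = sum(a[t] * b[t] for t in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    documents = [
        "Order 1234 was shipped on 2 May and is due for delivery on 9 May.",
        "The standard price list gives a 10% discount on orders over 100 units.",
        "The technical manual recommends restarting the device after a firmware update.",
    ]

    question = "When will order 1234 be delivered?"
    q_vec = embed(question)

    # Retrieve the most relevant document and supply it to the model as context.
    best = max(documents, key=lambda d: similarity(q_vec, embed(d)))
    prompt = f"Using the following context, answer the question.\nContext: {best}\nQuestion: {question}"
    print(prompt)  # this augmented prompt would then be sent to the LLM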

So far we have assumed that a single LLM is being used. Recently, however, there has been a lot of interest in the idea of agentic AI. The idea here is to link multiple LLMs (and possibly other resources, such as API calls to other systems) together and give them a goal. For example, you might want a travel agent to research and book a holiday based on your preferences and budget, another agent to automate a back-office process, or one to schedule predictive maintenance in a factory. The key is that agentic AI is focused on a goal, not merely on outputting text, and has autonomy or “agency”. There are a number of issues and challenges with such an approach, hallucinations being just one, but this is an area that is being actively explored by many companies.
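
The highly simplified sketch below illustrates the basic agentic pattern: a goal, a set of tools, and a loop in which the model decides which tool to call next until the goal is met. The call_llm() and search_flights() functions are hypothetical stand-ins for a real model API and a real booking service.

    # Minimal sketch of the agentic pattern: goal, tools and a decision loop.
    import json

    def search_flights(destination: str) -> str:
        # Hypothetical tool: in a real agent this would call an external API.
        return json.dumps({"destination": destination, "cheapest_fare": 240})

    TOOLS = {"search_flights": search_flights}

    def call_llm(goal: str, history: list) -> dict:
        # Stand-in for the model: returns either a tool call or a final answer.
        if not history:
            return {"action": "search_flights", "argument": "Lisbon"}
        return {"action": "finish", "argument": f"Cheapest fare found: {history[-1]}"}

    def run_agent(goal: str, max_steps: int = 5) -> str:
        history = []
        for _ in range(max_steps):  # cap the steps so the agent cannot loop forever
            decision = call_llm(goal, history)
            if decision["action"] == "finish":
                return decision["argument"]
            result = TOOLS[decision["action"]](decision["argument"])
            history.append(result)
        return "Gave up: step limit reached"

    print(run_agent("Find the cheapest flight to Lisbon within my budget"))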

One immediate barrier to implementing agentic AI is the lack of standards and tools for actually designing and building such a web of linked models and other tools. To build such a network you will need to design and plan the workflow, integrate tools, manage memory, and orchestrate, test and monitor the separate components. This has led to a series of “agentic AI frameworks” being built to help with these tasks. Examples are LangChain (open source), CrewAI (from CrewAI), AutoGen (from Microsoft), Swarm (from OpenAI), Promptflow (also Microsoft), smolagents (open source) and Phidata (from Phidata), as well as Transformers Agents from Hugging Face. Other related initiatives include the Model Context Protocol (MCP, from Anthropic), an open standard for connecting models to external tools and data sources.
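
To give a flavour of the plumbing such frameworks typically provide, the sketch below shows declarative agents, shared memory and a simple orchestrator in generic Python. It is deliberately not the API of LangChain, CrewAI or any other named framework, each of which has its own abstractions.

    # Illustrative sketch of framework-style building blocks: agents, shared
    # memory and an orchestrator that chains them. Names are hypothetical.
    class Agent:
        def __init__(self, name, handler):
            self.name = name
            self.handler = handler  # the function (usually an LLM call) that does the work

        def run(self, task, memory):
            result = self.handler(task, memory)
            memory[self.name] = result  # memory management: keep each agent's output for later steps
            return result

    def researcher(task, memory):
        return f"Research notes for: {task}"

    def writer(task, memory):
        return f"Report based on: {memory['researcher']}"

    # Orchestration: run the agents in a planned order over shared memory.
    memory = {}
    pipeline = [Agent("researcher", researcher), Agent("writer", writer)]
    for agent in pipeline:
        output = agent.run("agentic AI frameworks", memory)
    print(output)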

These approaches certainly give developers a head start, but issues remain. Tool integration can be difficult, especially with legacy systems that may be needed as components of an agent, and performance may be a problem for applications needing low latency. Any errors made by the various components will cascade through the interconnected systems. Since LLMs are by nature unpredictable and are typically black boxes unable to explain their choices, debugging and auditing will be challenging. This is especially an issue in heavily regulated industries like healthcare and finance, where trust is key. Agents will need access to many other systems, requiring credentials for each and perhaps the authority to spend money on a credit card or similar, making them a tempting target for hackers. The web of agents also needs to react effectively to unexpected or unusual responses or events; the more agents or components involved in a chain, the harder this becomes to guarantee.
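
One common mitigation for cascading errors, sketched below, is to validate each component’s output against an expected schema before it is passed downstream, so that a malformed result fails fast (or triggers a retry) rather than silently corrupting later steps. The field names here are illustrative.

    # Sketch of guarding against error cascades by validating a step's output.
    import json

    def validate_step(raw_output: str, required_fields: set) -> dict:
        try:
            data = json.loads(raw_output)
        except json.JSONDecodeError as exc:
            raise ValueError(f"Step produced non-JSON output: {exc}")
        missing = required_fields - data.keys()
        if missing:
            raise ValueError(f"Step output missing fields: {missing}")
        return data

    # A downstream agent only receives the result once it has passed validation.
    step_output = '{"customer_id": "C-77", "refund_amount": 42.50}'
    validated = validate_step(step_output, {"customer_id", "refund_amount"})
    print(validated)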

Agentic AI is very much in its infancy at the time of writing, with considerable industry hype but few verifiable case studies of it being used in production. Building a vendor sales demonstration of an agentic flow with two or three components, including an LLM, is one thing; deploying a more complex web of agents into a production environment is an altogether more challenging task. The emerging AI frameworks promise at least to help in this regard, providing structure and improving productivity by handling some of the tedious but necessary plumbing. At present the frameworks themselves are quite new (LangChain emerged in October 2022) and are still maturing, so the entire industry is at quite a formative stage. However, the more these frameworks develop and become road-tested, the easier it will be to deploy agentic AI systems and to go from demonstrations to production applications with measurable returns on investment.
