How to reduce almost 80% of LLM costs¶

If you are an AI developer or building a SAAS wrapped around AI foundational models, its really necessary to think about spending a lot of money on LLMs without any prior knowledge on how to reduce these costs..

routerbench

Change the Model¶

By replacing the LLM like GPT-4 with a small language models like phi-3 or Mistral for specific tasks that doesn't need more precise and optimized responses. this way you can have major cost cuttings.

insights

LLM Router¶

Use different models for different tasks, use the SLMs (Small Language Models) for interacting and intial tasks. If the SLMs couldn't provide a better response for the user prompt, In this case send the user prompt to LLMs to generate a more consice and efficient response.

as mentioned in the latest paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System

routerbench

Multi-Agent Setup¶

As been a contributor to the AutoGen framework (Microsoft), i believe the multi agent setup can solve much more broader use cases than using a single model like GPT4, etc..

AutoGen

Every agent can be powered with a different LLM
Can add multiple agents into a groupchat making it much more diverse
You can also use the function calling to call you external APIs
Making it cost effective

This allows us to only allow agents which need lot of understanding and solve critical problems be backed by the powerful LLMs(GPT4)

LLM Lingua¶

LLM Lingua is a method introduced by Microsoft that focuses on optimizing the input and output of large language models. By removing unnecessary tokens and words from the input, you can significantly reduce the cost of running the model. This method is particularly effective for tasks such as summarization or answering specific questions based on a transcript.

LLMLingua

Reduces the cost on input tokens
Uses the small model to compress your prompt and then pass it to the LLM
This adds more meaning and value to your prompt

As a Lead GenAI Consultant at VEnableAI, we provide a comprehensive array of services, including AI Agents Workflow and chatbots, designed to optimize performance and maximize results while minimizing costs.