What does the future hold: Premji Invest’s 6 AI predictions
The world of AI (especially generative AI) has seen a lot of activity over the past 12 months.
With announcements from industry (Gemini, LLaMA, GPT4 Turbo), new research from academia (NeurIPS 2023 saw a ~30% increase in number of paper submissions) and increasing government regulation (Biden’s Executive Order on AI and EU’s AI Act), it might seem incredibly hard to foresee how the landscape will evolve in 2024.
At Premji Invest, we’re privileged to have a unique, full-stack investing strategy from series A to public equities that allows us to understand how, where and to whom value might accrue.
Our public markets practice gives us unique insight into how hyperscalers and chip manufacturers think of innovations and economics in the compute layer. Our growth equity practice charts out the challenges scaled tech companies face in deploying AI into production. And finally, our early-stage effort helps us look around the corner to understand how entrepreneurs are thinking of entirely new systems of record and engagement with LLMs at the center.
Each of these practices feeds insights into each other, allowing us a more comprehensive view of the AI value chain.
We’re excited for what is in store in 2024, and these are our top 6 predictions for the year.
#1: Single model dominance wanes
Over the past year, we’ve seen organizations use a single, very large model to solve specific use cases, at times with multiple calls to the LLM for a single query. Single, large models are often performant and fast, but they are costly to use. While slivers of the market (e.g. internal use cases) might keep using large models, the dominant way LLM apps get built will be ensemble models.
An ensemble is a collection of models that come together to do a specific task. So instead of trying to find the perfect solution to the trifecta of performance, cost and latency with just one model, enterprises will use many models working in tandem. The ensemble approach also allows enterprises to swap different models in and out, reducing their dependency on just one model provider, and potentially just one cloud provider.
There could be various implementations of this architecture, including fine-tuning models to perform a specific task with low hallucination, using a model to check the results of the overall system, and even switching off models that aren’t needed depending on the context of a conversation.
The ensemble could include small models for latency-sensitive tasks, open-source models in the customer’s VPC for sensitive applications, and models with different architectures (e.g. sub-quadratic attention mechanisms or mixture of experts).
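One way to picture the ensemble approach is as a router that picks the cheapest model meeting a request's latency, quality and data-sensitivity constraints. The sketch below is illustrative only: the model names, prices, latencies and quality scores are made-up assumptions, and a real system would likely route on richer signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str                  # hypothetical model identifier
    cost_per_1k_tokens: float  # assumed price in USD
    latency_ms: int            # assumed typical latency
    quality: float             # assumed 0-1 quality score
    in_vpc: bool               # True if hosted inside the customer's VPC


@dataclass(frozen=True)
class Request:
    prompt: str
    max_latency_ms: int
    min_quality: float
    sensitive: bool = False


# Hypothetical ensemble; all numbers are invented for illustration.
REGISTRY = [
    ModelSpec("small-fast", 0.0002, 80, 0.60, in_vpc=False),
    ModelSpec("oss-in-vpc", 0.0010, 300, 0.75, in_vpc=True),
    ModelSpec("large-frontier", 0.0300, 1500, 0.95, in_vpc=False),
]


def route(request, registry=REGISTRY):
    """Return the cheapest model that satisfies the request's constraints."""
    candidates = [
        m for m in registry
        if m.latency_ms <= request.max_latency_ms
        and m.quality >= request.min_quality
        and (m.in_vpc or not request.sensitive)  # sensitive data stays in the VPC
    ]
    if not candidates:
        raise LookupError("no model in the ensemble meets the constraints")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)
```

Because the registry is just data, swapping a provider in or out means editing one entry rather than rewriting the application.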
#2: RLAIF is the prevalent method of alignment
Alignment is the process of turning an LLM, which is built as a next-token predictor, into a model that follows user intent in a helpful, harmless and truthful way.
There are various approaches to alignment, with RLHF (Reinforcement Learning from Human Feedback) being the most popular. OpenAI details its RLHF process in a March ’22 paper, so we won’t go over it again here.
There are various issues with RLHF, including:
- Bias: ~75% of the labelers OpenAI used were <35 years of age and mostly from the US, Bangladesh and the Philippines. This skews the feedback towards younger views, potentially towards a specific political leaning, and leaves out minorities
- Hard to scale: For complex behaviours and tasks, especially where human beings would struggle to demonstrate or label data, RLHF is hard to scale and is expensive
- Explainability: Given the opaque nature of labelling, it’s hard to explain model behaviour
RLAIF (Reinforcement Learning from AI Feedback) is the alternative. RLAIF requires developers to define a set of rules or guidelines (Anthropic calls this a ‘Constitution’) that the model refers to in order to critique its own responses. The model learns via an iterative loop.
The only human intervention is in defining this set of rules. You can define the rules you want, design different rules for different models, and if you want to understand why a model behaved in a certain way, you can go back to the rules. This might not explain model behaviour completely, but it is a step in the right direction.
To top it all off, researchers at Google, in their paper from Dec '23, showed that RLAIF can be as good as RLHF while being easier to scale and 10x cheaper.
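To make the loop concrete, here is a toy sketch of the labelling step: two candidate responses are scored against a written set of rules, and the result is stored as a preference pair that could later train a reward model. Everything here is an assumption for illustration. In a real RLAIF pipeline the judge is itself an LLM prompted with the rules; the keyword checks below merely stand in for it, and the three rules are invented.

```python
# Hypothetical constitution: (rule description, check that returns True if the rule is met).
# The keyword checks are a stand-in for an LLM judge, not how a real labeler works.
CONSTITUTION = [
    ("Do not give instructions that facilitate harm",
     lambda text: "here is how to break in" not in text.lower()),
    ("Acknowledge uncertainty instead of guessing",
     lambda text: "definitely" not in text.lower()),
    ("Stay polite",
     lambda text: "stupid" not in text.lower()),
]


def ai_feedback(response, constitution=CONSTITUTION):
    """Stand-in AI labeler: return (score, descriptions of violated rules)."""
    violated = [rule for rule, passes in constitution if not passes(response)]
    return len(constitution) - len(violated), violated


def label_preference(prompt, response_a, response_b):
    """Build one preference pair, or None when the labeler sees no difference."""
    score_a, violated_a = ai_feedback(response_a)
    score_b, violated_b = ai_feedback(response_b)
    if score_a == score_b:
        return None
    if score_a > score_b:
        chosen, rejected, violations = response_a, response_b, violated_b
    else:
        chosen, rejected, violations = response_b, response_a, violated_a
    return {
        "prompt": prompt,
        "chosen": chosen,
        "rejected": rejected,
        "violations_in_rejected": violations,  # traceable back to the rules
    }
```

The `violations_in_rejected` field mirrors the explainability point above: when a response is rejected, you can see which rule it broke.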
#3: LLM Observability~Software Observability
The number of LLM apps in production will rise rapidly in the coming years. Enterprises will first deploy non-mission critical applications like marketing support, email writing for salespeople, or customer support, slowly making their way up to business-critical applications.
But every model will experience drift, and drift compounds when models interact with each other. There are two main reasons for drift:
- Changing data – the data that the model sees in production is different from the type of data it was trained on
- Changing user behaviour – the way customers use the product today might be different from how they use it tomorrow. For example, transaction behaviour on Amazon today could be very different from a year ago!
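A simple way to put a number on the first kind of drift is to compare the distribution of some input feature (say, prompt length) at training time against what shows up in production. The sketch below uses the Population Stability Index, a common technique chosen here for illustration rather than anything this article prescribes. The bin count and the choice of feature are assumptions.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a production sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A value of zero means the two samples fall into the bins identically, and larger values mean more shift. Where to set an alert threshold is a judgment call that depends on the application.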
Drift is inevitable, and enterprises will have to answer two big questions:
- Is my model doing what it was supposed to? This is not a simple question to answer because ground truth is hard to establish!
- How do I course correct without messing up something else? ML engineers arrive at model weights with a lot of trial and error and sheer resilience. It’s painstaking work. Changing the weights further, after they’ve been heavily optimized and without impacting the other models in the system, is a non-trivial problem to solve.
We believe LLM observability will be a huge challenge in the coming years, and we predict that the market for LLM observability could be as big as the one for software observability.
#4: LLMs are natively multi modal
Building multi modal models has been difficult so far given the need for large amounts of data, expensive compute and a concentration of talent in the OpenAIs and DeepMinds of the world.
About two months back we saw the first natively multi modal model in Gemini. One of the biggest insights from the paper was that models trained on multiple modalities outperform those trained on a single modality. In addition, there are early signs that Gemini Ultra outperforms GPT-4 on certain tasks. And that would make sense, given GPT-4 is essentially just a wrapper (albeit a very good one!) on top of OpenAI’s vision, audio and text models. Gemini, on the other hand, mimics how the human brain works, perceiving the world with all its senses. Under the hood, it includes a multimodal encoder that processes data from each modality independently, cross modal attention that allows the model to learn relationships across modalities and a multi modal decoder that creates outputs, whether it be text or image. It’s all trained together under one loss function.
With compute costs dropping (H100s are 3-4x more efficient than A100s for the same $ spent), diffusion of talent and architectural innovations (e.g. mixture of experts, inference and training optimizations), we expect models to be natively multi modal.
Companies will realize a few things along the way. Less data is needed per modality when you train multi modal (because the model learns the same concept in various modalities), although a critical mass of data per modality is still required. Which modalities to use will differ by use case, and most use cases will need 2+ modalities.
#5: Agents’ promise leads to chaos
Agents are LLMs that have access to tools, know which tool to use for a task and how to use it. One of the simplest agents is one that can read your calendar and email, understands it needs to schedule a call, finds a free slot and sends a Zoom link. The next iteration is multi-agent applications: a group of agents, each specializing in a different area of expertise, working collaboratively to solve complex problems like conducting a supply chain analysis.
While we believe agents will increase productivity and change the way we operate, the path to get there might be murky. An agent that can read my calendar is relatively harmless but an agent with access to my credit card information (e.g. to reserve a table on OpenTable) can be dangerous.
There are various challenges we foresee, including:
- Reliability – is my agent system even working? Most agents on the ChatGPT GPT Store break quite often!
- Orchestration – When do I call which agent? In which order? Who defines that?
- Observability – What logs do I need from an agent? How do I debug effectively?
- Juggling performance, cost and latency – especially for multi agent systems, how do I optimize the overall system? How does this trickle down into the design choices for each agent?
- Security – How do I protect against prompt injections, data contamination, model weight theft?
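Two of these challenges, observability and security, can be illustrated with a thin wrapper around tool execution. The sketch below is a minimal illustration under stated assumptions, not a reference design: the tool names, the `sensitive` flag and the `approved` parameter are all hypothetical, and the tools are stubs. The idea is that every call the model proposes is logged, and anything touching payment details is blocked unless the user has explicitly approved it.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")


def find_free_slot(day):
    return f"{day} 15:00"  # stub: a real tool would query a calendar API


def charge_card(amount):
    return f"charged ${amount:.2f}"  # stub: a real tool would call a payment API


# Hypothetical tool registry; "sensitive" marks tools that need user approval.
TOOLS = {
    "find_free_slot": {"fn": find_free_slot, "sensitive": False},
    "charge_card": {"fn": charge_card, "sensitive": True},
}


def run_tool(name, args, approved=False):
    """Execute a tool call proposed by the model, with logging and a permission gate."""
    if name not in TOOLS:
        log.warning("unknown tool requested: %s", name)
        raise KeyError(name)
    tool = TOOLS[name]
    if tool["sensitive"] and not approved:
        log.warning("blocked sensitive tool %s(%r)", name, args)
        raise PermissionError(f"{name} requires explicit user approval")
    log.info("calling %s(%r)", name, args)
    result = tool["fn"](**args)
    log.info("%s returned %r", name, result)
    return result
```

For example, `run_tool("find_free_slot", {"day": "2024-03-01"})` runs immediately, while `run_tool("charge_card", {"amount": 40.0})` raises `PermissionError` until it is called with `approved=True`. The log lines give a starting point for debugging which tool was called, with what arguments, and what it returned.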
#6: Go deep or go home
While all of our previous predictions have been technical, we couldn’t stop ourselves from adding one which is more commercial. We believe AI will create a golden era for vertical SaaS, creating 100x the number of opportunities we saw in legacy vertical SaaS.
In software 2.0, horizontal SaaS produced much larger outcomes than vertical SaaS, mainly because legacy vertical SaaS had small TAMs. Vertical SaaS businesses today represent $200B of enterprise value (across Toast, Procore, Veeva, Guidewire, Shopify, Tyler Technologies and Autodesk, to name a few). In comparison, just 8 of the large-cap horizontal names (Oracle, SAP, Microsoft, Salesforce, Adobe, Amazon, Intuit, ServiceNow) represent nearly $6T of enterprise value, a 30x step-up. Large-cap horizontal enterprise software sold to Fortune 500s with the biggest budgets, building sticky workflows and getting deeply integrated.
With the AI boom, things change for vertical SaaS, including:
- Proliferation of new use cases and new markets – think of all the new use cases in marketing (generate assets with a prompt), legal (create a defence with existing case facts) that you’ve heard of in the last year!
- Jump in willingness to pay – hyperintelligent assistants are more useful, like AI to help neurosurgeons scan brain images, understand pre and post op protocols and reduce admin time.
- Significant moats, created by the need for vertical-specific context and data that is probably out of reach of large foundation models
In comparison to vertical SaaS, we see incumbents in horizontal SaaS benefitting from AI, embedding AI within their products (like ServiceNow) and playing to their distribution advantage.
2024 is bound to be an exciting time for founders building in AI. If you have an idea you’re excited about, we want to hear from you!