What to Watch in AI

Investors from Sequoia, USV, Kleiner Perkins, and more select the AI startups you should keep an eye on.

Artwork by Ricardo Tomás

You can listen to an audio version of The Generalist on Spotify or Apple Podcasts.

If you only have a few minutes to spare, here’s what investors, operators, and founders should know about AI’s most exciting startups.

  • Augmenting human health. Startups are using AI to improve healthcare outcomes and devise new treatments. Alife, for example, uses the technology to improve IVF treatment, giving patients a better chance of conceiving. The firm’s approach could fundamentally disrupt the industry’s cost structure in time. NewLimit is another healthcare startup leveraging AI. Its team is looking to discover better ways of treating previously intractable diseases.
  • Serving the enterprise. Much of the generative AI trend has played out in front of a consumer audience. The average internet user can play around with complex models, creating text and images. Several promising companies are more directly chasing enterprises, building products incorporating internal data that adhere to corporate guidelines. Glean, Lamini, Dust, and Lance are examples of this trend.
  • Using AI to stop AI. The AI revolution may unlock many new opportunities, but it introduces plenty of threats along with it. In particular, generative AI makes creating realistic written messages trivial, increasing the volume and sophistication of "spear-phishing" scams designed to extract personal information from a recipient. Players like Abnormal Security have emerged to protect against such attacks, using AI to detect malicious AI-generated messages.
  • Beyond America. Though the US has many of the industry’s big players, like OpenAI and Google, promising startups are developing beyond its borders. Mistral is building open-source large language models from a Paris HQ – one contributor expects them to challenge OpenAI’s influence. Sereact, a German business, has developed an impressive AI-powered robotics product, securing contracts with industrial giants.


Artificial intelligence is the technology story of the year; it may prove to be the defining narrative of our decade. Since our last edition of this series, the sector has continued to attract capital, talent, and attention. Not all attention has been positive, of course. Though there’s broad excitement about the capabilities of the technology, the past four months have seen industry heavyweights express their concern and regulators begin to devise some palisades. The following months and years look set to determine how fully AI impacts our lives, creating new winners and losers on a global scale.

Our “What to Watch” series aims to help readers prepare for the coming age and see the future more clearly. It is a starting point for those who wish to understand the technologies bubbling up at the AI frontier and capitalize on the change occurring. To do so, we ask a selection of AI’s most impressive investors and founders to surface the startups they consider most promising. In this latest edition, our most comprehensive yet, you’ll learn how AI startups are assisting with human fertility, helping in our factories, accelerating corporate processes, and much more.

These are our contributors’ picks.

Note: Long-time readers will know we intentionally don’t preclude investors from mentioning companies they’ve backed. The benefits of increased knowledge and “skin in the game” outweigh the risk of facile book-talking. Across contributors, we do our utmost to select for expertise, originality, and thoughtfulness.


Improving IVF outcomes with AI

In any fertility procedure, there are designated moments of human decision-making. Two of the most relevant of these in IVF are “ovarian stimulation” and “embryo selection.”

“Ovarian stimulation” refers to determining the dosage of medication patients receive to stimulate the growth of follicles in the ovaries, and when to deliver the trigger shot that stimulates the release of the eggs from those follicles. Timing the trigger shot is critical – too early and you can get immature eggs; too late and you can get post-mature eggs or fewer eggs than possible.

“Embryo selection” means choosing which fertilized egg to implant. Currently, as in most of medicine, clinicians and embryologists decide using a combination of their own experience and training, morphological grading systems, and trial and error. If the dosage or timing is off on one cycle, they adjust it on the next. Many doctors are fantastic at this, but skill varies widely and has a large effect on outcomes. In fertility, a significantly supply-constrained market, this means sky-high prices, particularly to see the best of the best, and widely varying results across the field.

Alife builds AI-powered tools to improve in vitro fertilization (IVF) outcomes. The company gives practitioners superpowers, augmenting their decision-making with AI tools that leverage large datasets of inputs and outcomes. Through a simple interface, doctors can input a patient’s characteristics and get precise recommendations at critical moments in the fertility process, drawn from the outcomes of thousands of previous cycles. The datasets come from large existing collections of patient outcomes and, in turn, get better with each patient who uses Alife’s products.
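To make "recommendations drawn from the outcomes of thousands of previous cycles" more concrete, here is a minimal sketch of one simple way such a tool could work: a k-nearest-neighbor lookup over historical cycles. It is not Alife's method, which isn't described here; the `Cycle` fields, the feature scaling, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Cycle:
    """One IVF cycle; every field here is hypothetical, for illustration only."""
    age: float
    follicle_count: float
    estradiol: float          # hormone level (illustrative units)
    trigger_day: int = 0      # stimulation day the trigger shot was given
    mature_eggs: int = 0      # outcome: mature eggs retrieved


def distance(a: Cycle, b: Cycle) -> float:
    # Crude per-feature scaling (assumed) so no single feature dominates.
    return sqrt(((a.age - b.age) / 5) ** 2
                + ((a.follicle_count - b.follicle_count) / 5) ** 2
                + ((a.estradiol - b.estradiol) / 500) ** 2)


def recommend_trigger_day(patient: Cycle, history: list[Cycle], k: int = 3) -> int:
    """Among the k most similar past cycles, return the trigger day with the
    highest average number of mature eggs retrieved."""
    neighbors = sorted(history, key=lambda c: distance(patient, c))[:k]
    eggs_by_day: dict[int, list[int]] = {}
    for c in neighbors:
        eggs_by_day.setdefault(c.trigger_day, []).append(c.mature_eggs)
    return max(eggs_by_day, key=lambda d: sum(eggs_by_day[d]) / len(eggs_by_day[d]))


history = [
    Cycle(32, 12, 1800, trigger_day=10, mature_eggs=9),
    Cycle(33, 11, 1700, trigger_day=11, mature_eggs=12),
    Cycle(31, 13, 1900, trigger_day=11, mature_eggs=11),
    Cycle(40, 5, 800, trigger_day=9, mature_eggs=3),
]
print(recommend_trigger_day(Cycle(32, 12, 1750), history))  # prints 11
```

Each new cycle appended to `history` refines future recommendations, which mirrors the article's point that the data gets better with each patient.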

These tools will change the nature of the fertility industry. Alife’s studies indicate that its machine learning model can help doctors optimize the trigger shot for the 50% of patients triggered too early or too late, helping retrieve up to three more mature eggs, two more fertilized eggs, and one more embryo on average. Alife’s products can significantly broaden access to fertility treatments, bringing down the cost per patient by lowering required drug dosages and improving the success rate of each expensive IVF cycle. They will also level the playing field among doctors, allowing those with less firsthand experience to access broader knowledge and inputs.

Eventually, you can imagine Alife’s tools providing all input for judgment moments in a process and allowing practitioners other than doctors to perform cycles, significantly changing the sector’s cost structure and availability. More importantly, data-driven precision medicine that augments – or eventually replaces – a person’s judgment with personalized recommendations is not unique to IVF. Across medicine, there are thousands and thousands of moments like this and an opportunity to leverage data to dramatically transform outcomes and access to critical procedures and treatments.

Rebecca Kaden, General Partner at Union Square Ventures


Enterprise search and beyond

Finding the exact information you need at work, right when you need it, should be fast and easy. With the endless number of applications each person uses to get their job done, and the amount of data and documents generated as a result, this isn’t always the case. The exponential rise in “knowledge” and the increasingly distributed nature of work have increased the time needed to find existing knowledge. In other words, “searching for stuff” at work is fairly broken.

To help employers solve this problem, Arvind Jain and his team built Glean, an AI-powered unified search platform for the workplace. It equips employees with an intuitive work assistant that helps them find exactly what they need, when they need it, and proactively discover the things they should know.

The company’s mission from the beginning was simple: to help people find all the answers to workplace questions faster, with less frustration and wasted time. But what has resulted since goes well beyond the realm of search. For example, Glean doesn’t just search across every single one of your workplace apps and knowledge bases (Slack, Teams, Google Drive, Figma, Dropbox, Coda, etc.); it also understands natural language and context, personalizing each of its user interactions based on people’s roles and inter/intra-company relationships. It intelligently surfaces your company’s most popular and verified information to help you discover what your team knows and stay on the same page – all in a permissions-aware fashion.
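Glean's internals aren't described here, but the core of "permissions-aware" search can be sketched in a few lines: access control is applied before results are ranked, so a user never sees a document they couldn't open in its source app. The `Doc` type, the group names, and the naive term-count scoring are hypothetical stand-ins for real connectors and a real ranking model.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    source: str                      # e.g. "Slack" or "Google Drive"
    text: str
    allowed_groups: set[str] = field(default_factory=set)


def search(query: str, docs: list[Doc], user_groups: set[str], limit: int = 5) -> list[Doc]:
    """Keyword search that drops documents the user cannot see *before* ranking."""
    terms = query.lower().split()
    visible = [d for d in docs if d.allowed_groups & user_groups]
    scored = []
    for d in visible:
        words = d.text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            scored.append((score, d))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:limit]]


docs = [
    Doc("1", "Slack", "Q3 roadmap draft roadmap", {"eng"}),
    Doc("2", "Google Drive", "Board roadmap and compensation", {"exec"}),
]
print([d.doc_id for d in search("roadmap", docs, {"eng"})])  # prints ['1']
```

Filtering first, rather than hiding results after ranking, also keeps restricted documents from leaking into anything built on top of the search results, such as a generated answer.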

As organizations become more distributed and knowledge becomes more fragmented, an intuitive work assistant like Glean is no longer a nice-to-have but a critical tool in driving employee productivity. What the company has developed will break down the silos that slow progress and create more positive and productive work experiences.

Additionally, Glean’s search technology positions it to bring generative AI to the workplace while adhering to enterprises’ strict permissions and data governance requirements. Today, one of the key obstacles preventing enterprises from shipping AI applications to production is their inability to enforce appropriate governance controls (e.g., “Does my application understand what the end user is allowed to see and not see?”; “Is the inference done on my servers or OpenAI’s servers?”; “What source data led to a given model output and who owns it?”). By being plugged into an enterprise’s internal environment with real-time data permissions, Glean has emerged as an ideal solution to help enterprises solve governance at scale and confidently leverage their internal data for both model training and inference – serving the role of an enterprise-grade AI data platform/vector store.

Over the fullness of time, we believe every company will have its own AI-enabled copilot personalized to understand the nuances of the business and its employees. And we believe Glean is well on its way to capturing this exact opportunity.

Josh Coyne, Partner at Kleiner Perkins


Storage and management for multi-modal data

We’ve all played with Midjourney, and most of us have seen the GPT-4 napkin-sketch-to-code demo. Midjourney (text-to-image) and GPT-4 (image-to-text/code) illustrate what’s possible when models become multi-modal, bridging the gap across different forms of media like text, images, and audio. While most of the current wave of AI hype has been centered around text-based models, multi-modal models are the key to building more accurate representations of the world as we know it.

As we unlock the next wave of AI applications in industries like robotics, healthcare, manufacturing, entertainment, and advertising, more and more companies will build on top of multi-modal models. Players like Runway are good examples of emerging leaders that have seen massive user demand for their products, while incumbents like Google have started releasing similar multi-modal capabilities.

But working with multi-modal models presents a challenge: how do you store and manage the data? Legacy storage formats like Parquet aren’t optimized for unstructured data, so ML teams struggle with slow performance for data loading, analytics, evals, and debugging. In addition, the lack of a single source of truth makes ML workflows much more error-prone in subtle ways. Lance is one company that has recently emerged to tackle this challenge. Companies like Midjourney and WeRide are in the process of converting petabyte-scale datasets to the Lance format and have seen meaningful improvements in performance versus legacy formats like Parquet and TFRecords, as well as an order of magnitude reduction in incremental storage costs.

Lance isn’t stopping at storage – they’ve recognized the need to rebuild the entire data management stack to better fit the world we are moving toward, a world in which unstructured, multi-modal data becomes an organization’s most valuable asset. Their first platform offering, LanceDB (now in private beta), provides a seamless embedded experience for developers who want to build multi-modal capabilities into their applications.
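Rather than guess at LanceDB's actual API, here is a stdlib-only sketch of the core operation an embedded vector database performs: storing embeddings next to row metadata (for example, a pointer to an image) inside the application process and returning the rows nearest to a query vector. The class name `TinyVectorTable`, the brute-force cosine scan, and the three-dimensional vectors are illustrative assumptions; production systems use approximate indexes and columnar on-disk formats.

```python
from math import sqrt


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorTable:
    """In-process table of (vector, metadata) rows with brute-force search."""

    def __init__(self) -> None:
        self.rows: list[tuple[list[float], dict]] = []

    def add(self, vector: list[float], **metadata) -> None:
        self.rows.append((vector, metadata))

    def search(self, query: list[float], k: int = 2) -> list[dict]:
        # Score every row; fine for a toy, far too slow at petabyte scale.
        ranked = sorted(self.rows, key=lambda row: cosine(query, row[0]), reverse=True)
        return [meta for _, meta in ranked[:k]]


table = TinyVectorTable()
table.add([1.0, 0.0, 0.0], uri="img/cat.jpg", caption="a cat")
table.add([0.0, 1.0, 0.0], uri="img/car.jpg", caption="a red car")
table.add([0.9, 0.1, 0.0], uri="img/kitten.jpg", caption="a kitten")
print([m["uri"] for m in table.search([1.0, 0.0, 0.0], k=2)])
```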

Lance is just one example of a company bringing developers into the multi-modal future – I couldn’t be more excited to see what other technologies emerge to push the boundaries of multi-modal applications. With the pace at which AI is advancing, it won’t be long before that future becomes a reality.

Saar Gur, General Partner at CRV

Stemming the tide of AI-enhanced cyber attacks

I am an unabashed optimist about generative AI, but not a naive one. For example, I’m concerned about a huge spike in “social engineering” attacks such as spear-phishing, which typically uses email to extract sensitive information. Incidents have increased dramatically since ChatGPT exploded onto the scene last year.

According to Abnormal Security, the number of attacks per 1,000 people has jumped from below 500 to more than 2,500 in the past year. And the sophistication of attacks is skyrocketing. Just as any student can use ChatGPT to write a perfectly good essay, it can also be used to churn out fraudulent messages that are grammatically perfect and dangerously personalized, without so much as a Google search.

According to the FBI, such targeted “business email compromise” attacks have caused more than $50 billion in losses since 2013. And it’s going to get worse. Every day, untold numbers of cyber-criminals and other bad actors get their hands on blackhat tools like “WormGPT,” a chatbot designed to mine malware data to craft the most convincing and scalable fraud campaigns.

Fortunately, Abnormal co-founders Evan Reiser and Sanjay Jeyakumar are hard at work using AI to combat this threat. Think of it as using AI to defend against AI. Historically, email security systems scanned for signatures of known-bad behavior, like a particular IP address or attempts to access employees’ personally identifiable information (PII).

Using the power of AI, Abnormal flips this on its head. Since AI-enhanced attacks are designed to seem legitimate, Abnormal’s approach is to understand known-good behavior so well that even subtle departures become visible. The company uses large language models to build a detailed representation of each customer organization’s internal and external digital workings, such as which people typically talk to each other and what they usually discuss. If my partner Reid Hoffman sent me an email that said, “Hey, please send me the latest deck for Inflection.AI,” Abnormal’s AI engine would quickly notice that Reid seldom starts sentences with “Hey” and rarely sends one-sentence notes – and that he has never once asked me to send him documents about Inflection. (As a co-founder and board member of the company, he would have more access to decks than I would!)
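Abnormal's models are far richer than anything that fits here, but the "known-good baseline" idea behind the Reid Hoffman example can be sketched in a few lines: learn each sender's habits from past mail, then count how many ways a new message departs from them. The three features (opening word, sentence count, topic) and the thresholds are hypothetical choices that mirror that example, not Abnormal's actual signals.

```python
from collections import Counter
from dataclasses import dataclass, field


def count_sentences(body: str) -> int:
    return max(1, sum(body.count(p) for p in ".?!"))


@dataclass
class SenderProfile:
    """Known-good habits of one sender, learned from historical email."""
    openings: Counter = field(default_factory=Counter)
    sentence_counts: list[int] = field(default_factory=list)
    topics: set[str] = field(default_factory=set)

    def learn(self, body: str, topic: str) -> None:
        self.openings[body.split()[0].strip(",")] += 1
        self.sentence_counts.append(count_sentences(body))
        self.topics.add(topic)


def anomaly_score(profile: SenderProfile, body: str, topic: str) -> int:
    """Count departures from the sender's baseline (0 means it looks normal)."""
    score = 0
    total = sum(profile.openings.values())
    opening = body.split()[0].strip(",")
    if total and profile.openings[opening] / total < 0.1:
        score += 1  # greeting this sender almost never uses
    if count_sentences(body) < min(profile.sentence_counts, default=0):
        score += 1  # shorter than anything this sender has written
    if topic not in profile.topics:
        score += 1  # a request this sender has never made
    return score


reid = SenderProfile()
reid.learn("Saam, great meeting today. Let's regroup on the pipeline next week. Thanks.", "pipeline")
reid.learn("Saam, quick update on the portfolio review. Notes attached. Talk soon.", "portfolio")
print(anomaly_score(reid, "Hey, please send me the latest Inflection deck.", "decks"))  # prints 3
```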

Not surprisingly, Abnormal has seen accelerating enterprise customer demand as security concerns around generative AI have risen. I find Abnormal’s success particularly gratifying, given how quickly it has harnessed AI to counter a problem accelerated by AI. Bad actors often enjoy a lengthy first-mover advantage in times of disruptive technological change. After all, they can exploit innovations without worrying about product quality, security, or regulators, who have yet to lay down new laws. (The history of spam and ransomware provides interesting case studies.)

At the same time, technology startups are understandably focused on developing powerful new use cases for their innovations rather than stopping illegal or destructive ones. But like everything to do with AI, the potential cyber damage from its misuse is staggering. Thanks to the foresight of the Abnormal team, the new normal for cybercriminals may prove to be at least a little less accommodating.

Saam Motamedi, Partner at Greylock


Augmenting knowledge workers

It’s obvious that Large Language Models (LLMs) will increase the productivity of knowledge workers. But it’s still unclear exactly how. Dust is on a mission to figure that out. Since LLMs won’t be of much help in the enterprise if they don’t have access to internal data, Dust has built a platform that indexes and embeds companies’ internal data (Notion, Slack, Drive, GitHub), keeps it updated in real time, and exposes it to LLM-backed products.
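Dust's implementation isn't described in detail here; the sketch below only illustrates the general "index, embed, keep updated" loop under simple assumptions. Documents are split into fixed-size word chunks, a toy bag-of-words vector stands in for a real embedding model, and every re-sync replaces a document's old chunks so stale text never reaches the LLM. The names `SyncedIndex`, `upsert`, and `retrieve` are hypothetical.

```python
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag of lowercase words.
    return Counter(text.lower().split())


def chunk(text: str, size: int = 50) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


class SyncedIndex:
    """Keeps chunk embeddings in sync with source documents (Notion, Slack, ...)."""

    def __init__(self) -> None:
        self.chunks: dict[str, list[tuple[str, Counter]]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Replacing the whole entry drops chunks from the previous version.
        self.chunks[doc_id] = [(c, embed(c)) for c in chunk(text)]

    def delete(self, doc_id: str) -> None:
        self.chunks.pop(doc_id, None)

    def retrieve(self, query: str, k: int = 3) -> list[tuple[str, str]]:
        """Return (doc_id, chunk) pairs with the most word overlap with the query."""
        q = embed(query)
        scored = [
            (sum((q & vec).values()), doc_id, text)
            for doc_id, entries in self.chunks.items()
            for text, vec in entries
        ]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(doc_id, text) for _, doc_id, text in scored[:k]]


index = SyncedIndex()
index.upsert("notion:onboarding", "New hires get laptops on day one")
index.upsert("notion:onboarding", "New hires get laptops during week one")  # doc edited
print(index.retrieve("when do new hires get laptops"))
```

The retrieved chunks, along with their `doc_id`s, are what an LLM-backed product would receive as context.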

Dust co-founders Gabriel Hubert and Stanislas Polu sold a company to Stripe and worked there for five years. They witnessed firsthand how fast-growing companies can struggle with scale. They’ve seen what they call “information debt” creep in, and they’re now focused on applying LLMs to solve some of the major pain points associated with that. They’re currently exploring the following applications on top of their platform:

  1. Answer engine. The focus is on factuality, as it’s a key to broad adoption.
  2. Compositional assistant. Templated assistance at the time of content creation. For example, generate the paragraph you’re missing based on internal data.
  3. Documents that update themselves. Document owners receive notifications and a pre-crafted recommendation each time a piece of information that should update their document floats through the company.
  4. Structured event extraction. Users can generate structured events from unstructured data (e.g., a Slack thread) based on predefined templates.
  5. Internal data monitoring. Monitor enterprise data with intelligent rules. For example, receive alerts if personally identifiable information (PII) inadvertently ends up where it should not be.
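As a rough illustration of item 5, here is a sketch of rule-based PII monitoring: each rule is a named regular expression, and any match in a location where that data type isn't allowed raises an alert. The patterns (US Social Security number format, email address, 16-digit card-like number) and the `scan` helper are illustrative assumptions; the "intelligent rules" described above would need far more than fixed patterns, such as context about who can see a channel.

```python
import re

# Hypothetical rule set: name -> compiled pattern.
PII_RULES = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}


def scan(location: str, text: str, allowed: frozenset[str] = frozenset()) -> list[str]:
    """Return alerts for PII types found in `text` that aren't allowed in `location`."""
    alerts = []
    for name, pattern in PII_RULES.items():
        if name not in allowed and pattern.search(text):
            alerts.append(f"{location}: possible {name}")
    return alerts


print(scan("#general", "Her SSN is 123-45-6789, ping jane@example.com"))
```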

It’s a lot, but Dust’s founders believe most of these streams will ultimately contribute to one coherent product. They’re still in the early days of their exploration and are forming the final focused picture of what Dust will be. Based on their initial iterations, they believe they’ve confirmed their core hypothesis: that knowledge workers can be augmented (not replaced) with LLM applications that have access to company data, and a new kind of “team operating system” can be built for that.

Konstantine Buhler, Partner at Sequoia


Unlocking business data

The “rise of big data” has been happening for over 20 years, and although companies continue to ingest more data than ever, many still struggle to use it to generate insights from their AI models. Data processing and annotation remain the most tedious and expensive part of the AI process but also the most important for quality outcomes. Even with the rise in pre-trained large language models, enterprises need to focus on using their proprietary data (across multiple modalities) to create production AI that leads to differentiated services, insights, and increased operational efficiencies.

Labelbox solves this challenge by simplifying how companies feed their datasets into AI models. It helps data and ML teams find the correct data, process and annotate it, push models into production, and continuously measure and improve performance.

Labelbox’s new platform takes advantage of the generative AI movement. Model Foundry allows teams to rapidly experiment with AI foundation models from all major closed- and open-source providers, enabling them to pre-label data in just a few clicks. In doing so, they can learn which model performs best on their data. Model Foundry auto-generates detailed performance metrics for every experiment run while versioning and snapshotting outcomes.
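The underlying idea of "learning which model performs best on your data" can be sketched independently of Labelbox's product: run several candidate pre-labelers over a small human-reviewed sample and compare how often each agrees with the human label. The model names and the lambda "models" below are hypothetical placeholders for calls to real foundation-model APIs.

```python
from typing import Callable

Labeler = Callable[[str], str]


def evaluate(models: dict[str, Labeler], gold: list[tuple[str, str]]) -> dict[str, float]:
    """Accuracy of each candidate pre-labeler against human-reviewed (text, label) pairs."""
    return {
        name: sum(model(text) == label for text, label in gold) / len(gold)
        for name, model in models.items()
    }


def best_model(models: dict[str, Labeler], gold: list[tuple[str, str]]) -> str:
    scores = evaluate(models, gold)
    return max(scores, key=scores.get)


gold = [
    ("Wireless noise-cancelling headphones", "electronics"),
    ("Organic cotton crew-neck t-shirt", "apparel"),
    ("Stainless steel chef's knife", "kitchen"),
]
models = {
    "model_a": lambda t: "electronics" if "wireless" in t.lower() else "apparel",
    "model_b": lambda t: ("electronics" if "headphones" in t.lower()
                          else "kitchen" if "knife" in t.lower() else "apparel"),
}
print(evaluate(models, gold), best_model(models, gold))
```

Once a winner is chosen on the reviewed sample, it can pre-label the rest of the dataset while humans spot-check its output.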

Its impact can be profound. Traditionally, humans take days to complete a straightforward but time-consuming task like classifying e-commerce listings with multiple paragraphs of text. With GPT-4, however, that task can be performed within hours. Model Foundry allows businesses to discover these efficiencies themselves.

This is far from the only example. Early results show that over 88% of labeling tasks can be meaningfully accelerated by one or more foundation models. Instead of coding and building pipelines to feed your data to models, Labelbox enables anyone to pre-label data with a few clicks. It is built to empower teams to work collaboratively and draw in cross-functional expertise to maintain human supervision for data quality assurance. This functionality democratizes access to AI by allowing ML experts and business SMEs to easily evaluate models, enrich datasets, and collaborate to build intelligent applications.

Labelbox has proven to significantly reduce costs and increase model quality for many of the world’s largest enterprises, including Walmart, Procter & Gamble, Genentech, and Adobe.

For enterprises, the race is now on to unleash the power of these foundation models on their proprietary data to solve business problems. We are excited to see how Labelbox will help businesses unlock their data and deliver better products at much higher efficiencies.

Robert Kaplan, Partner at SoftBank


A new creative suite

AI is everywhere and increasingly becoming a commodity. In most cases, companies have incorporated AI as chatbots to enrich existing applications. It’s rarer to find AI applications that reinvent product experiences, leveraging the technology to fundamentally change how we interact with products – analogous to how Google’s search engine changed how we browse the internet or how Instagram transformed how we share photography from mobile phones. These AI applications require a deep understanding of existing user experiences, visionary product thinking, and cutting-edge technology.

Runway is a leading example of a company doing this – leveraging applied AI research to reimagine the creative experience and build a new creative suite. (Lux has been lucky to partner with the company from its earliest stages.)

  1. Deep understanding of UX. Founders Cristobal Valenzuela, Anastasis Germanidis, and Alejandro Matamala-Ortiz were researchers at NYU’s Interactive Telecommunications Program with years of design experience. Runway’s team understood the creative tooling ecosystem from direct experience and the barriers to democratization. For example, creative film production often requires expensive machines, software resources, and a high level of training. As a result, it has historically been concentrated in major production studios. Runway saw an opportunity to broaden and improve accessibility to the creative tools needed.
  2. Visionary product thinking. Runway was early to recognize that an inflection point in AI could dramatically improve the user experience, going beyond augmenting existing creative tools to fundamentally changing how those tools work. For example, users can leverage simple text prompts to create brand-new video content from scratch. Critically, this video is professional-grade and can be shared from a desktop or mobile device. Regardless of skill level, background, or resources, Runway can save hours or days of editing labor. It is a visionary product capable of turning simple prompts into vivid, moving life.
  3. Leading AI technologists. Runway did not just solve a problem with a visionary product – they also reimagined the underlying research and tech infrastructure. Runway’s in-house research organization is at the forefront of advancements in deep neural networks for image and video synthesis. The company developed Gen-2, a multi-modal AI video model more powerful and capable than anything on the market today. It was the first publicly available model capable of turning text into video. Before that, Runway released Gen-1, a model leading a paradigm shift in video generation tools that produce high-quality outputs. Runway’s researchers also pioneered the text-to-image model, Stable Diffusion.

Since October 2022, Runway has developed more than 30 AI “Magic Tools” across video, image, 3D, and text that serve every aspect of the creative process, from pre-production to post-production. Their customer base includes Fortune 500 and Global 2000 companies like CBS’s The Late Show with Stephen Colbert, New Balance, Harbor Picture Video, Publicis, and Google. The platform has also been used to edit Oscar-nominated films like Hollywood hit Everything Everywhere All at Once.

The most exciting AI applications transform existing product experiences, rethinking how users interact with products. With Runway, users can spin up new video creations in seconds, regardless of whether they’re a first-time videographer or a professional production studio. It’s a transformative shift and an example of how AI reimagines different industries.

Grace Isford, Partner at Lux Capital


Rewiring cellular fate

Cells are the most complex computer systems on the planet. Like computer chips, DNA is composed of elementary units working in superposition to produce complex functionality. Unlike bit-based code, atom-based code is stochastic and hierarchical. Systems depend on systems that depend on other physical systems – each one affected by heat, acidity, and the molecules in a cell’s microenvironment.

Despite these interdependencies, cellular machine code (DNA) can run different programs efficiently. Even though your liver cells and skin cells contain the same genome, these cell types look, feel, and function differently. Why? They’re executing different epigenetic programs constituted by which genes are dialed up and which are dialed down and to what degree.

In 2006, Takahashi et al. used a combination of four transcription factor (TF) proteins to reprogram a mature cell back into a stem cell, seeding the field of epigenetic reprogramming. Using the earlier analogy, TFs are proteins that turn a gene’s dial up or down, essentially changing the “program” being run. Takahashi and Yamanaka’s discovery led to induced pluripotent stem cells (iPSCs) and garnered a Nobel Prize. Since then, many research groups have applied unique TF combinations to alter cell state, rejuvenate damaged cells, and restore youthful cellular phenotypes.

While epigenetic reprogramming has become more tractable, it’s certainly not trivial. Groups must discern what combination of TFs will efficiently drive a cell from State A to a desired State B. Future TF cocktails may allow us to transform a diseased cell into a healthy cell, for example, enabling a new class of medicines. Ultra-large reprogramming screens are required since the exact TF combinations are still unknown for many applications. With more than 1,500 native human TFs, even five-member cocktails would yield an experimentally infeasible number of combinations (more than 6 × 10^13) – necessitating a more efficient search method. We believe NewLimit is engineering such a method.
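The size of that search space is easy to check: choosing 5 distinct factors from roughly 1,500 is the binomial coefficient C(1500, 5). Because the text says "more than 1,500" TFs, using exactly 1,500 gives a lower bound. The screening throughput in the second half is a hypothetical figure, chosen only to show why brute force is out of reach.

```python
from math import comb

n_tfs = 1500        # "more than 1,500" native human TFs, so this is a lower bound
cocktail_size = 5

n_cocktails = comb(n_tfs, cocktail_size)
print(f"{n_cocktails:.2e}")  # prints 6.29e+13

# Hypothetical throughput: screening one million cocktails every day.
years = n_cocktails / 1_000_000 / 365
print(f"{years:,.0f} years")  # prints 172,220 years
```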

Spurred by advances in single-cell sequencing and machine learning (ML), NewLimit is transforming a previously artisanal discipline into a data-driven science. The company has a healthy split between molecular and computational biologists, laying the cultural foundation necessary to construct an increasingly efficient closed-loop platform. Combining expert know-how and multi-modal readouts (scRNA-Seq, scATAC-Seq, etc.), NewLimit aims to discover therapeutic reprogramming factors to treat previously intractable diseases.

With each round of experimentation, NewLimit employs ML techniques to:

  1. Combine and compress multiple assay readouts into a lower-dimensional optimization space inclusive of a cell’s current State A and its desired State B.
  2. Enumerate new TF combinations likely to drive a cell toward its desired state along that optimization space.
  3. Suggest what types of data would help improve the model and when/where to apply more expensive, lower throughput experimental methods.
  4. Nominate what changes should be made to the platform to maximize the amount of useful information generated per dollar.
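NewLimit hasn't published its models, and the steps above are described only at a high level. As a toy illustration of step 2, once step 1 has produced a compressed state space, the sketch below assumes each TF's effect is already summarized as a 2-D vector in that space and that effects add linearly. It then enumerates small cocktails and ranks them by how close they would move a cell from State A to State B. The additive model, the TF names, and the effect vectors are all hypothetical simplifications; real transcriptional effects interact nonlinearly.

```python
from itertools import combinations
from math import dist


def rank_cocktails(
    state_a: tuple[float, float],
    state_b: tuple[float, float],
    tf_effects: dict[str, tuple[float, float]],
    size: int,
    top: int = 3,
) -> list[tuple[tuple[str, ...], float]]:
    """Rank TF combinations by predicted distance to the target state (lower is better)."""
    ranked = []
    for combo in combinations(sorted(tf_effects), size):
        # Additive assumption: predicted state = start + sum of TF effect vectors.
        predicted = tuple(
            a + sum(tf_effects[tf][i] for tf in combo) for i, a in enumerate(state_a)
        )
        ranked.append((combo, dist(predicted, state_b)))
    ranked.sort(key=lambda item: item[1])
    return ranked[:top]


effects = {"TF1": (1.0, 0.0), "TF2": (0.0, 1.0), "TF3": (-1.0, 0.5), "TF4": (0.5, 0.5)}
for combo, gap in rank_cocktails((0.0, 0.0), (1.0, 1.0), effects, size=2):
    print(combo, round(gap, 3))
```

In a real closed loop, the top-ranked cocktails would be tested in the lab, and the measured outcomes would update the effect estimates before the next round, which is where steps 3 and 4 come in.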

Beyond its stellar team, technological prowess, and ambitious vision, we admire NewLimit’s pragmatism. While the company hasn’t publicly shared the details of its initial commercial strategy, we believe the approach is creative, reasonably de-risked, and potentially transformative for humanity. The founding team is united in the understanding that platform biotechs may be likened to expensive science projects without near-term asset generation. To that end, NewLimit is transparent, having cataloged its technical progress since inception.

We should be humbled by nature’s complexity. To be sure, biology is harder to program than silicon devices of our own design. Dimension aims to empower pioneering entrepreneurs, like NewLimit’s, that seek to test the boundary of what’s possible at the interface of technology and biology.

Simon Barnett, Research Director at Dimension


Foundational AI for software development

While OpenAI has been focused on general-purpose AI and DeepMind has been focused on scientific discovery, a third fundamental use-case of AI is understanding and creating software.

GPT-4 is ingraining itself in experienced and novice developers’ workflows. But this paradigm shift is still in its infancy. Extrapolating from the last few months, AI-assisted programming will soon become ubiquitous. Taking this trend further into the future, natural language will become the abstraction upon which software is built.

While other companies have released large code-only models like StarCoder, no approach has yet achieved performance close to that of GPT-4. I suspect this is because training a model only on code cannot produce great software development ability. This view led me to poolside. The company was founded by Jason Warner, former CTO of GitHub, and Eiso Kant, former founder of source{d} – the first company in the world to pursue AI for code.

What’s unique about poolside is that they’re taking the OpenAI foundation-model approach but focusing on only one capability: code generation. Their technical strategy hinges on the fact that code can be executed, allowing for immediate and automatic feedback during the learning process. This allows for reinforcement learning via code execution – a compelling alternative to reinforcement learning via human feedback (RLHF) – something Eiso was exploring back in 2017.
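poolside's training setup isn't described beyond this, but the core idea (executing code yields an automatic reward signal, with no human labeler in the loop) can be sketched in a few lines: run a candidate program against test cases and use the fraction passed as the reward. The function name `execution_reward`, the `solution` entry-point convention, and the test format are assumptions. A real system would sandbox execution in separate processes with timeouts and resource limits rather than calling `exec` in-process as this toy does.

```python
from typing import Any


def execution_reward(candidate_src: str, tests: list[tuple[tuple, Any]]) -> float:
    """Fraction of test cases the candidate's `solution` function passes.

    WARNING: exec() runs untrusted code in-process; for illustration only.
    """
    namespace: dict[str, Any] = {}
    try:
        exec(candidate_src, namespace)
        solution = namespace["solution"]
    except Exception:
        return 0.0  # code that fails to compile or define `solution` earns nothing
    passed = 0
    for args, expected in tests:
        try:
            if solution(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on one test counts as a failure
    return passed / len(tests)


tests = [((2, 3), 5), ((-1, 1), 0), ((0, 0), 0)]
good = "def solution(a, b):\n    return a + b\n"
buggy = "def solution(a, b):\n    return a * b\n"
broken = "def solution(a, b) return a"
print(execution_reward(good, tests), execution_reward(buggy, tests), execution_reward(broken, tests))
```

In a reinforcement learning loop, a score like this would stand in for (or complement) the learned reward model that RLHF relies on.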

While the potential of artificial general intelligence (AGI) to greatly benefit humanity is undeniable, its realization remains distant. But why wait for AGI? By focusing on advancing specific areas of AI, such as software development, we can tear down more barriers to creation. I’m excited to watch the poolside team make good on this vision of building dedicated software foundation models.

Matan Grinberg, co-founder and CEO of Factory


France’s OpenAI rival

An explosion of projects in the generative AI space has recently illuminated Paris. Why, you ask? My hypothesis is that it’s home to the largest pool of world-class talent in generative AI that is still outside OpenAI’s event horizon. Among these projects, the most audacious one is undeniably Mistral. Founded by Guillaume Lample, Arthur Mensch, and Timothée Lacroix, Mistral is on a mission to build the best open-source language models with the goal of fostering a thriving ecosystem around them.

I’ve known Guillaume for four years, and we’ve both been deeply involved in applying Large Language Models (LLMs) to mathematics, especially formal mathematics. We developed a friendly rivalry while working at OpenAI and Meta. Guillaume is one of the most talented researchers I’ve had the chance to work with, and I had the pleasure of witnessing his journey from conducting research at Meta to founding Mistral. During that process, I also got to know Arthur Mensch. I’ve always been impressed by his work, particularly with Chinchilla, which redefined what it meant to efficiently train a large language model, and RETRO, an approach to retrieval-augmented language modeling that’s still grossly under-explored if you ask me.

Now, let’s delve into what makes Mistral, Mistral. The startup’s vision is to build an ecosystem built on first-class open-source models. This ecosystem will be a launchpad for projects, teams, and eventually companies, quickening the pace of innovation and creative usage of LLMs.

Take reinforcement learning from human feedback (RLHF) as an example. Typically, conducting RLHF is time-consuming and, as a result, costly. It involves manually “labeling” actions an AI takes, which can require significant work. This effort is only worthwhile if an AI model is promising enough to justify it. For large businesses like OpenAI, it makes sense to invest in the process – the company has the resources to make it work. But traditional open-source communities usually need a “leader” to step forward and take on that mantle.

Mistral has the opportunity to do exactly that, investing in conducting RLHF on open-source models. By doing so, Mistral would open the door for a Cambrian explosion of innovation. Open-source developers would have access to models already refined with high-quality human feedback, which they could tweak and tailor to suit different needs. The ultimate winner would be the broader market, which would have access to many more specific, compelling use cases than could be generated by a closed company alone.

Whoever has the best open-source models is well-positioned to attract interest and value. My money is on Mistral because the team is aggressively pushing the efficiency/performance frontier. The talent aboard is also by far the best worldwide when it comes to cracking this.

Mistral has secured the team and the resources to execute against this initial vision and develop better models that can be hot-swapped in for other open-source models. It has also secured partners to evaluate these models in enterprise-grade use cases. Keep your eyes on Mistral – they’re poised to give OpenAI a run for its money.

Stanislas Polu, co-founder of Dust


Smarter industrial robots

We’ve often heard predictions that AI and robotics will augment or automate manual tasks in the long term. Today, this is increasingly becoming an urgent commercial imperative.

By 2030, Europe’s working-age population is forecast to drop by 13.5 million, and labor costs are rising at their fastest rate in over two decades. With the rise of e-commerce, the strain on warehouses is greater than ever, and it’s becoming challenging for businesses to remain competitive.

Fifty-five percent of the expenses in operating a warehouse come from order picking, but the options are bleak for companies looking to move to an automated system. The universe of slick applications we associate with AI-first SaaS, or the proliferation of open-source offerings we see in other parts of the ecosystem, hasn’t reached robotics yet.

Instead, businesses looking to automate pick-and-pack are faced with choosing expensive, inflexible robotics solutions. They must navigate a universe of proprietary interfaces that require significant programming time and expertise. The systems also struggle to cope with changing product portfolios, require regular human intervention, and perform poorly with corner cases.

Sereact solves these challenges. Its software is underpinned by a powerful simulation environment that trains the robot arm to understand the spatial and physical nuances of any potential real-world setting. The system is then optimized via continuous learning from real-world data after deployment. This approach also lets Sereact’s robots grip items that have traditionally been hard to handle, such as electronic devices, textiles, soft fruit, tiles, and wood.

Most excitingly, their robotics stack uses large language models (LLMs) to enable intuitive natural language control of robots. They have developed a transformer model called PickGPT that allows users to give instructions and feedback to robots through speech or text. This makes asking the robot to execute a desired task accessible to anyone, regardless of their level of technical knowledge.

Sereact combines its co-founders’ twin areas of expertise. CEO Ralf Gulde has conducted research at the intersection of AI and robotics, while CTO Marc Tuscher has specialized in deep learning. The two developed a track record of peer-reviewed research in these subjects at the University of Stuttgart, one of Germany’s most prestigious universities for automated and industrial manufacturing.

Despite being a young business, Sereact has already attracted an impressive range of partners, including Daimler Truck, Schmalz, Zenfulfillment, Zimmer Group, and Material Bank. This points to the huge potential market opportunity in the pick-and-pack industry.

As well as the obvious applications in warehouses for e-commerce – whether picking orders or depalletizing boxes – there is a range of other use cases. For example, in traditional manufacturing, there is a time-consuming process called kitting, which involves the painstaking collection of the fine components required for assembly. Historically, robotic arms have struggled to grip small components or to pick out individual parts in a jumbled environment. Sereact’s software can identify these components and choose the right gripper to pick them out.

The team at Sereact combines technical brilliance with a razor-sharp understanding of their customers’ operating context and a real desire to help overcome labor shortages and operate efficiently and continuously. Because they are the first to turn the combination of LLMs and pick-and-pack from an academic possibility into real-world impact, I’m confident in their ability to execute and scale a real robotics challenger.

Nathan Benaich, General Partner at Air Street Capital


The tailored LLM engine

Every enterprise is trying to build AI into their business right now. The largest companies in the world recognize its potential, with 20% of CEOs in the S&P 500 mentioning AI in their Q1 earnings calls. Large Language Models (LLMs) can dramatically increase business efficiency by accelerating core functions like customer support, outbound sales, and coding. LLMs can also improve core product experiences with AI-based assistants to answer customer questions or create entirely new generative AI workflows that delight customers.

Given that large companies often lag in adopting new technologies, we’ve been surprised at how quickly enterprises have started building with AI. What hasn’t been surprising is that many enterprises want to build their own AI models and solutions in-house. Every enterprise has a treasure trove of proprietary customer data, often part of its core business’s moat. These enterprises perceive risk in sending their most valuable data to a foundation model API or a new startup with uncertain reliability. Even if data privacy weren’t a concern, public LLMs, such as GPT-4 or Claude, are trained entirely on open data, so they lack customization to an enterprise’s specific use cases and customer base.

Some tech companies, like Shopify and Canva, have spun up “AI tiger teams” internally to build AI into every part of the business where it fits using off-the-shelf open-source models. However, most companies do not have the resources or experienced AI researchers to build and deploy private LLMs on their own data. They recognize this AI wave could be a transformational moment for the future of their business, yet until now, they couldn’t capitalize on or control their own AI development.

That’s why we’re incredibly excited by what Sharon Zhou, Greg Diamos, and their team are building at Lamini. Lamini is an LLM engine that makes it easy for developers to rapidly train, fine-tune, deploy, and improve their LLMs with human feedback. It offers a delightful developer experience that abstracts away the complexities of working with AI models, and, more importantly, it allows enterprises to build AI solutions on top of their own data without hiring AI researchers or risking data leaving their private clouds. We first partnered with Sharon and Greg last fall. Since then, we’ve had the chance to support this incredibly technical and customer-obsessed founding team executing an ambitious vision to transform how enterprises adopt AI.

Concretely, deploying a private LLM with Lamini offers a wide range of benefits versus using public solutions. Having the in-house engineering team handle the build process guarantees data privacy and better flexibility in terms of LLM selection and the overall compute and data stack. Using Lamini also produces models with reduced hallucinations, lower latency, reliable uptime, and lower costs than off-the-shelf APIs. These performance enhancements come from core technical insights the Lamini team has built into the product based on decades of research and industry experience around AI models and GPU optimization.

Well-known startups and large enterprises have already started using Lamini to deploy LLMs internally and to their customers, and they have been thrilled by the speed of setup, performance, and reliability. In the future, we believe that every enterprise will have AI in their business and products, but only a few will have dedicated AI teams. Lamini is the startup leveling the playing field and helping all companies harness this transformational technology. And thanks to its recent Databricks partnership, it is now easier than ever for companies to get their AI solutions up and running by setting up Lamini directly on top of their existing Databricks data lake and compute cluster.

James Wu, Investor at First Round Capital; Todd Jackson, Partner at First Round Capital


Your coding “droid”

If you want a computer to do something for you today, you have to translate your thoughts into “computer language” – hyper-literal code that a compiler can understand. To become an engineer is to contort your brain to think like a machine. But we are reaching a turning point where AI can translate human language into code. That transition—from human engineers to digital ones—will, in all likelihood, be one of the most important technological inflection points in our lifetime.

We are still in the infancy of this transition. AI agents like BabyAGI and AutoGPT have sparked the public imagination. But while coding assistants like GitHub Copilot represent a step forward, they are still very limited, mostly acting as autocomplete for ideas already realized in code.

Factory is different. The company was founded in 2023 by Matan Grinberg, a former string theorist, and Eno Reyes, an ML engineer. When I met Matan, I was immediately compelled by his vision of a future where engineers can delegate annoying tasks and focus on tough, thorny questions that make building things fun. To do so, Matan and Eno have created autonomous coding “droids.”

Droids are AI engineers that can handle routine tasks like code review, debugging, and refactoring. Unlike existing products, Factory’s droids are hands-off: they can review code, address bugs, and answer questions independently. You can also work with a droid as you would with a junior developer, using it to brainstorm and offload feature work. Droids have strong guardrails in place: they target their intelligence towards users’ needs and are less prone to “hallucinate” wrong answers.

Code generation is set to be one of the most transformative areas of the AI revolution. And Factory has all the necessary tools to succeed.

  1. Team. Matan, Factory’s CEO, cut his teeth as a string theorist at Princeton, where he imagined the singularity of a black hole. Eno spent his career as an ML engineer at Hugging Face, dealing with tedious engineering processes himself. This is a one-of-a-kind team.
  2. Practicality. While droids can’t perform as well as human engineers quite yet, they are uniquely suited to the tasks engineers hate. Give them your boring, repetitive work.
  3. Speed. Factory has built something wonderful in just a few months. While others imagined AI engineers, Matan and Eno got to developing them. They are rapidly improving what is already an exceptional product.

The human story is one of offloading repetitive work, allowing us to move on to more complicated tasks. When humans invented agriculture, it freed our energy to build cities. After the Industrial Revolution, we built rockets that took us to the moon. The next generation of labor-saving will be digital – freeing humanity from online drudgery to further push the technological frontier.

What will we build next, when the only limit is our imagination?

Markie Wagner, founder and CEO of Delphi Labs

The Generalist’s work is provided for informational purposes only and should not be construed as legal, business, investment, or tax advice. You should always do your own research and consult advisors on these subjects. Our work may feature entities in which Generalist Capital, LLC or the author has invested.