- OpenAI’s ChatGPT, which creates lifelike responses to text prompts, is taking the internet by storm.
- Beneath the buzz, the next-generation developer framework Ray was key in the viral model’s training.
- Ray comes from the $1 billion startup Anyscale and is also the likely framework behind GPT-4.
Another new artificial-intelligence tool has created a firestorm on the internet: a chatbot, called ChatGPT, that provides immensely detailed, near-lifelike responses to almost any question you can imagine. But while ChatGPT and other viral tools like Prisma Labs’ Lensa catch all the buzz, a little-known distributed framework powering this new generative-AI revolution is flying under the radar.
Ray, the framework from the A16z-backed startup Anyscale, was key in enabling OpenAI to beef up its ability to train ChatGPT and models like it. Ray operates under the hood for all of OpenAI’s recent large language models — and it’s also the likely framework behind OpenAI’s highly anticipated next act, commonly referred to as GPT-4. Industry insiders think this generative technology could spawn a new wave of billion-dollar businesses built on near-human-like content.
Ray is already earning top marks in the field. Before deploying it, OpenAI used a hodgepodge of custom tools to develop early models and products. But as those tools’ weaknesses became more apparent, the company switched to Ray, OpenAI’s president, Greg Brockman, said at the Ray Summit earlier this year.
Lukas Biewald, the CEO of Weights & Biases, a hot rising star in the AI world that helps companies track machine-learning experiments, said his company’s most forward-thinking customers, OpenAI included, loved Ray. That, he said, makes him think the framework is promising.
“The idea that you could run the same code on your laptop and on a huge distributed set of servers is a huge deal, and the importance of it increases as models get bigger,” Biewald told Insider. “I think the devil is in the details, and they appear to have done a good job with them.”
A billion-dollar bet on Ray
Anyscale has proved to be such a prized commodity that Ben Horowitz, the cofounder and namesake of Andreessen Horowitz (A16z), sits on its board. Its most recent round, an extension of its Series C that valued it at more than $1 billion, closed in a matter of days, people with knowledge of the deal said.
Some investors described Anyscale as Horowitz’s hoped-for “next Databricks,” an apt description given that one of the startup’s founders, Ion Stoica, also cofounded the $31 billion data giant.
“AI is incredibly fast-moving, and people are trying new approaches all the time,” Anyscale CEO Robert Nishihara told Insider. “ChatGPT combined a lot of the previous work on large language models with reinforcement learning as well. Underlying this, you need infrastructure that enables that flexibility, so you can innovate quickly and scale different algorithms and approaches. A lot of the flexibility Ray provides comes from the ability to use both tasks and actors in Python.”
Because buzzy new tools like ChatGPT require increasingly massive models, companies have had to rethink the way they develop them from the ground up. Ray fills that gap, making it easier to train these colossal models and possible to include the hundreds of billions of data points that give every response a quasi-lifelike feel.
How Ray became a go-to tool for machine learning
Ray provides an underlying infrastructure that manages the complex task of distributing the work of training a machine-learning model. Machine-learning experts can often run small models that use limited sets of data — say, a model to predict whether a customer will stop buying a product — on their own laptops. For something like ChatGPT, however, a laptop isn’t going to cut it. Instead, models of that scale require an army of servers to train.
But one of the biggest challenges is orchestrating that training across all those different pieces of hardware. Ray gives the programmer a mechanism for managing disparate pieces of hardware as a single unit: determining what data goes where, how to handle failures, and more. That hardware isn’t always uniform, either; a single job can span a mix of offerings from Google Cloud, AWS, and other providers, all working on the same problem. To make this possible, Ray brings a key programming concept from other languages, “actors,” to Python, the language of choice for machine learning.
Before deploying Ray, OpenAI used a hodgepodge of custom tools built on top of its “neural programmer-interpreter” (NPI) model. As the company scaled, it found itself making new custom tweaks to its developer tools and infrastructure, said Brockman, the OpenAI president.
“It was the bare-minimum investment we could make and still not be unhappy,” Brockman said at the talk of the NPI-era tooling. “If something is not your core competence, you think, ‘Why am I shuffling around the bits and dealing with a TCP stream with pickles in it?’ That’s not our burning passion.”
Tapping Ray removes that immense layer of complexity, freeing up time and energy for a company like OpenAI to focus on its core competency.
A new generation of AI demands new developer tools like Ray and JAX
Ray is just one in a series of rapidly emerging next-generation machine-learning tools that are upending the way development happens. Google’s JAX framework, for instance, is also gaining enormous traction. Many expect JAX to become the backbone of Google’s core machine-learning tools, as it has already achieved widespread adoption in the company’s DeepMind and Google Brain divisions.
Ray isn’t the only tool focused on a problem like this, either. Another startup, Coiled, backed by FirstMark Capital and Bessemer Venture Partners, builds its business around Dask, a framework for distributing the same kind of work.
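Dask's approach — splitting one big computation into chunks a scheduler can place anywhere — can be sketched briefly. The array sizes here are arbitrary illustrations, and the snippet assumes Dask is installed via `pip install dask`:

```python
import dask.array as da

# A 10,000 x 10,000 array of ones, split into 1,000 x 1,000 chunks,
# so the scheduler sees 100 independent blocks it can distribute.
x = da.ones((10_000, 10_000), chunks=(1_000, 1_000))

# Building the expression is lazy; .compute() executes the chunked
# task graph — locally by default, or across a cluster when attached.
total = x.sum().compute()
print(total)  # 100000000.0
```

As with Ray, nothing in the code mentions specific machines; the chunked graph is what lets the same script scale from one laptop to many servers.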
All these tools, Ray and JAX included, are in service to a new generation of combustion engines for the internet: large language models. These models, trained on billions of data points, try to predict the structure of sentences and spit out lifelike text responses to inbound queries. Multiple companies, both startups and giants, are building their own large language models, including Meta, Hugging Face, OpenAI, and Google.
“It is profoundly important to understand how difficult it is to break up work (large models) and spread it across lots of little chips,” Andrew Feldman, the CEO of the AI chip startup Cerebras Systems, told Insider. “It’s a punishingly difficult problem across the board.”