Understanding the physics-aware systems Nvidia is working on

20 Jan 25

As part of what came out of this year’s CES earlier this month, Nvidia announced its development of something called Nvidia Cosmos.

The name itself doesn’t tell you much, invoking something vast – the celestial sky, or the cosmologies we humans tell ourselves to explain the origin of everything.

So what is this system?

Nvidia describes Cosmos as “a state-of-the-art generative world model platform” and defines world models as “neural networks that simulate real-world environments and predict accurate results based on input of text, image or video”.

World models, the spokespersons explain, “understand” real-world physics. They support the development of robotic systems, autonomous vehicles and other physical machines that can follow the rules of the road, or the demands of a workspace. In a way, these are the engines for the advent of physical entities that will think, reason, move and eventually live like humans.
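To make the idea concrete, here’s a minimal sketch in Python of the contract a world model fulfills: given an observation and a candidate action, predict the next observation. All of the names here are illustrative, not Nvidia’s actual interface:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    """A single sensor snapshot, e.g. a camera frame (H x W x 3)."""
    frame: np.ndarray

class WorldModel:
    """Hypothetical interface; every name here is illustrative."""

    def predict(self, obs: Observation, action: str) -> Observation:
        """Given the current observation and a candidate action
        (e.g. 'steer left 5 degrees'), return the predicted next
        observation, respecting physics (gravity, collisions, etc.)."""
        raise NotImplementedError

# A planner could use such a model to test actions before executing them:
# next_obs = model.predict(current_obs, "brake gently")
```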

A Technical Dictionary

Nvidia’s folks also detail other aspects of Nvidia Cosmos, including “advanced tokenizers that help break down higher-level data into usable chunks.”

For reference, here’s how ChatGPT describes an advanced tokenizer: “Advanced tokenizers go beyond simple whitespace or rule-based segmentation to produce subword, byte-level, or hybrid segments that better handle rare words, multilingual text and domain-specific vocabulary…. These ‘smart’ tokenizers are an essential foundation for modern NLP systems, enabling models to scale to massive datasets and diverse linguistic inputs.”
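To see what that means in practice, here’s a toy greedy longest-match subword tokenizer in Python. The vocabulary and words are invented for illustration; production tokenizers learn vocabularies of tens of thousands of pieces from data (e.g. via byte-pair encoding):

```python
def tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match subword tokenization (WordPiece-style).
    Splits a word into the longest vocabulary pieces available,
    so rare words decompose into familiar fragments."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No known piece: fall back to a single character.
            pieces.append(word[i])
            i += 1
    return pieces

# Toy vocabulary; a real one holds tens of thousands of learned pieces.
vocab = {"token", "izer", "un", "break", "able"}
print(tokenize("tokenizer", vocab))    # ['token', 'izer']
print(tokenize("unbreakable", vocab))  # ['un', 'break', 'able']
```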

These models will be available under an open license to help developers with whatever they’re building. A January press release from Nvidia explains:

“Physical AI models are expensive to develop and require large amounts of data and real-world testing. Cosmos’ World Foundation Models, or WFM, provide developers with an easy way to generate massive amounts of photorealistic, physics-based synthetic data to train and evaluate their existing models.”
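Here’s a hypothetical sketch of what such a synthetic-data loop might look like from a developer’s seat. The generate_scene function and its parameters are invented for illustration; Cosmos’ actual API will differ:

```python
def generate_scene(prompt: str, seed: int) -> dict:
    """Stand-in for a world foundation model call. In a real
    pipeline this would return photorealistic, physics-consistent
    video frames plus ground-truth labels."""
    return {"prompt": prompt, "seed": seed, "frames": [], "labels": []}

# Sweep rare but safety-critical conditions that are expensive
# to capture on real roads.
conditions = ["heavy rain at dusk", "pedestrian jaywalking", "icy on-ramp"]

dataset = [
    generate_scene(f"urban intersection, {cond}", seed)
    for cond in conditions
    for seed in range(3)  # a few random variations per condition
]
print(f"generated {len(dataset)} labeled training scenes")  # 9
```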

Despite understandable concerns about jailbreaking and hacking, companies are likely to be excited to have this opportunity to build on what the leading US tech company has created.

Next is the data curation process, for which Nvidia NeMo will provide an “accelerated” pipeline.
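NeMo is Nvidia’s real tooling here, but the basic shape of a curation pass is easy to sketch in plain Python. The quality scores and threshold below are invented for illustration, not NeMo’s API:

```python
import hashlib

def curate(clips: list[dict], min_quality: float = 0.7) -> list[dict]:
    """Toy curation pass: drop exact duplicates, then filter by a
    precomputed quality score. Real pipelines add perceptual dedup,
    caption filtering and GPU-accelerated scoring."""
    seen, kept = set(), []
    for clip in clips:
        digest = hashlib.sha256(clip["data"]).hexdigest()
        if digest in seen:
            continue  # exact duplicate
        seen.add(digest)
        if clip["quality"] >= min_quality:
            kept.append(clip)
    return kept

clips = [
    {"data": b"frame-bytes-1", "quality": 0.9},
    {"data": b"frame-bytes-1", "quality": 0.9},  # duplicate, dropped
    {"data": b"frame-bytes-2", "quality": 0.4},  # low quality, dropped
]
print(len(curate(clips)))  # 1
```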

Anyway, TLDR: These are “physics aware” systems. They sound like essential parts of the applications that will bring AI to “walk among us,” to act in our lives, rather than just being plugged into a computer somewhere. What will our robot friends look like? And how will we treat them, and how will they treat us? These are the kinds of questions we will have to consider as a society.

Nvidia Cosmos: A Case Study

When I read the list of companies that have already adopted Nvidia Cosmos technology, most of them were unfamiliar to me. But one stood out:

Ride-sharing company Uber is an early adopter of this type of physics-based AI.

“Generative AI will power the future of mobility, requiring rich data and very powerful computing,” Uber CEO Dara Khosrowshahi said in a press release. “By working with NVIDIA, we are confident that we can help meet the timeline for safe and scalable autonomous driving solutions for the industry.”

That phrase, “safe and scalable autonomous driving,” probably sums up the project well, though, as with self-driving vehicle designs over the past two decades or so, the devil is in the details.

There’s not much else available about exactly what Uber is doing with Nvidia Cosmos. But we can better understand the framework and context of what Nvidia is doing as a major innovator in these types of systems.

Omniverse

I was also reading about the Nvidia Omniverse platform, which the company describes this way:

“A platform of APIs, SDKs and services that enables developers to integrate OpenUSD, NVIDIA RTX™ rendering technologies and generative physical AI into existing software tools and simulation workflows for industrial and robotics use cases.”

So it sounds like the Omniverse platform is more about integration, simulation and tooling, in aid of exploring what is possible with the world foundation models themselves.
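For context, OpenUSD is the open scene-description format at the heart of Omniverse. Here’s a minimal example of authoring a USD stage with Pixar’s pxr Python bindings (installable via the usd-core package); the file and object names are just placeholders:

```python
from pxr import Usd, UsdGeom

# Create a new USD stage: the shared scene file that tools,
# renderers and simulators can all read and layer edits onto.
stage = Usd.Stage.CreateNew("factory_cell.usda")
world = UsdGeom.Xform.Define(stage, "/World")

# Add a simple placeholder object, e.g. where a robot arm would go.
robot = UsdGeom.Cube.Define(stage, "/World/RobotPlaceholder")
robot.GetSizeAttr().Set(0.5)  # half-metre cube

stage.GetRootLayer().Save()
```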

An inflection point

I’ll end with this quote from CEO Jensen Huang, who reportedly said, “The ChatGPT moment for robotics is coming.”

That’s probably the headline here, because we’ve all been wondering when we’ll start seeing these intelligent, physics-aware robots walking among us, or powering truly autonomous vehicles.

The answer seems to be that it will be sooner rather than later.
