At nvp capital, we back founders building category-defining companies in markets that are large, underserved, and on the verge of a structural shift. Today we’re sharing our investment in Human Archive, a YC W26 company modeling human embodied intelligence. We co-led the seed round alongside Wing VC and met CEO Raj Patel through the Berkeley engineering network.
The Problem: Physical AI Is Starving for Data
The next generation of AI is not being bottlenecked by model architecture or compute. It is being bottlenecked by real-world data.
Language models were unlocked when the internet provided trillions of tokens of structured human knowledge. Physical AI (robotics, world models, embodied agents) faces a gap of an entirely different order. The global estimate for robot manipulation data is roughly 300,000 hours; internet video sits at one billion hours. What these models lack is the data that would let them reason about the physical world.
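To put the size of that gap in perspective, here is a quick back-of-the-envelope calculation. It takes the two figures quoted above at face value, both measured in hours; they are rough estimates, and the variable names are illustrative only.

```python
# Figures quoted in the text, in hours of data; both are rough estimates.
robot_manipulation_hours = 300_000
internet_video_hours = 1_000_000_000

# How many times larger the internet video pool is than the robot data pool.
gap_ratio = internet_video_hours / robot_manipulation_hours

print(f"Internet video is roughly {gap_ratio:,.0f}x larger than robot manipulation data")
```

On these numbers the gap is a little over three orders of magnitude.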
The prior generation of approaches — teleoperation, open-source research aggregation — proved structurally unscalable or commercially unviable. What the field needs is purpose-built data infrastructure: real-world, multimodal, at scale. That is what Human Archive is building.
The Company
Human Archive captures the full sensorimotor stack of human interaction — egocentric RGB, stereo depth, tactile force, motion capture, and wrist/chest cameras — all synchronized to sub-10 millisecond accuracy. The key distinction is perspective: not what a robot sees and senses, but what the human sees, feels, and does.
As Raj puts it:
“The physical world is the last frontier for AI. Language had the internet. Physical AI needs its own data layer — one that captures not just what a robot sees, but what a human sees, feels, and does. That’s what we’re building.”
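For readers wondering what "synchronized to sub-10 millisecond accuracy" means in practice, the sketch below shows one simple way to check it. It is an illustration only, not Human Archive's actual pipeline: it assumes every sensor stamps its samples against a shared clock, and the `Sample` type, stream names, and timestamps are all hypothetical.

```python
from dataclasses import dataclass

SYNC_TOLERANCE_MS = 10.0  # the "sub-10 millisecond" target quoted in the text


@dataclass
class Sample:
    stream: str          # hypothetical stream name, e.g. "egocentric_rgb"
    timestamp_ms: float  # capture time on a shared clock (an assumption)


def max_skew_ms(samples):
    """Largest timestamp gap between any two samples in one frame."""
    times = [s.timestamp_ms for s in samples]
    return max(times) - min(times)


def is_synchronized(samples, tolerance_ms=SYNC_TOLERANCE_MS):
    """True when every stream in the frame falls inside the tolerance window."""
    return max_skew_ms(samples) < tolerance_ms


# One illustrative frame: one sample per modality, with made-up timestamps.
frame = [
    Sample("egocentric_rgb", 1000.0),
    Sample("stereo_depth", 1003.5),
    Sample("tactile_force", 1001.2),
    Sample("motion_capture", 1006.0),
    Sample("wrist_camera", 1002.0),
]

print(max_skew_ms(frame))      # 6.0
print(is_synchronized(frame))  # True
```

The point of a check like this is that force and vision readings are only useful together if they describe the same instant; a tactile spike that lands several frames away from the video of the contact teaches a model the wrong association.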
Rather than collecting data in labs, Human Archive deploys hardware rigs into active work environments through national labor service partnerships across the world: homes, hotels, restaurants, farms, industrial and construction sites, retail stores, and other real-world settings.

The Team
The Human Archive team is built to solve this problem. Raj Patel, CEO, pairs technical credibility with the commercial instincts to close contracts: his data work at Amazon, Microsoft, and Greenlight gave him direct experience with enterprise data pipelines and with delivering data products at scale. Shloke Patel, his cousin and co-founder, published hardware papers out of Stanford’s Shape Lab directly relevant to the multimodal synchronization challenge. COO Rushil Agarwal is based full-time in Bangalore running the on-ground operation and 120+ national partnerships that make the India network work. CTO Samay Maini brings hands-on ML and robotics experience from Amazon and Contoro Robotics. Four co-founders, two decades of shared history, and a skillset that maps almost perfectly onto the problem.
Why Now
Physical AI has attracted over $10B in venture capital across companies such as Physical Intelligence, Skild AI, Figure, 1X, Apptronik, and NEURA Robotics, and deployment is just beginning. As these companies scale, their data requirements will compound. Every new task domain, environment type, and robot embodiment requires fresh training data.
Vision-only egocentric data is already getting commoditized. India-based operators are scaling it at high volume and low cost. The companies that own the tactile and force layer will command pricing power as dexterous manipulation becomes the frontier. The labs choosing their data partners now are running early pilots, and the providers that deliver quality and reliability on those first contracts will lock in the relationships that define this market.
Scale AI, Mercor, and David AI demonstrated that data businesses with a product that is hard to replicate and sticky once embedded can generate hundreds of millions in revenue. Human Archive is building that kind of product in a category where no comparable commercial dataset exists.
We believe the window to establish that position is now, and we’re excited to be on the journey alongside Raj, Shloke, Samay, and Rushil. Welcome to the nvp portfolio, Human Archive!
FAQ: Physical AI Training Data and Human Archive
What is physical AI training data?
The real-world sensory information — video, depth, force, motion — used to train AI models that power robots, world models, and embodied agents. Unlike language models trained on internet text, physical AI requires data that captures how humans interact with and move through the physical world.
Why is there a shortage of physical AI training data?
No internet-scale equivalent exists for physical AI. The global estimate for robot manipulation data is roughly 300,000 hours — compared to one billion hours of internet video. Collecting synchronized multimodal data from real-world environments is operationally complex and expensive to replicate at scale.
What is a sensorimotor dataset?
A dataset capturing the full loop of human perception and action: what a person sees (egocentric video and depth), what they feel (tactile force), and how their body moves (IMUs and wrist cameras). It is more valuable than vision-only data because it teaches models how to interact with the world, not just what it looks like.
What is Human Archive building?
The foundational multimodal sensorimotor dataset for embodied AI and robotics. The company deploys proprietary hardware rigs into real work environments through labor service partnerships in India, capturing synchronized egocentric RGB, stereo depth, tactile force, and body IMU data at commercial scale — sold to frontier AI labs and robotics companies as training infrastructure.
How is Human Archive different from other physical AI data companies?
Most competitors collect only one modality of data. Human Archive collects the full sensorimotor stack — including tactile, motion, depth, and wrist/chest cameras — synchronized to sub-10 millisecond accuracy across 600+ real-world environments. No commercial provider currently offers this combination at scale.
Why is nvp capital investing in physical AI data infrastructure?
The frontier labs training the next generation of models need high-quality real-world sensorimotor data. The companies that lock in those supply relationships early will have durable, compounding advantages. Human Archive has the team, the hardware, and the operational model to be that company. Read more on why nvp capital believes physical AI is approaching its ChatGPT moment here.