2026-05-25 21:00:01

“Social engineering” sounds like something out of a conspiracy thriller, charged with totalitarian control and fringe paranoia. More mundanely, it’s come to be associated with phishing and other scams, in which fraudsters manipulate people into disclosing personal information.
Yet the concept is older and more benign: it is the deliberate shaping of human behavior, often at scale. It predates silicon—and became pervasive, and ungoverned, especially once its practitioners learned to hide it. Authoritarian regimes and more recently scammers and big companies have profited from it. To defend ourselves from bad actors, and to benefit from social engineering’s good side, we need to reclaim the name, and govern it prudently.
In 1894, Dutch entrepreneur Jacques van Marken urged companies to hire “social engineers” to manage human systems such as insurance, education, and profit sharing for workers as carefully as they did mechanical ones. Fifteen years later, reformer William H. Tolman published Social Engineering, describing how U.S. industrialists optimized workers’ conditions alongside manufacturing methods. If industrialists could shape steel and electricity on demand, why not society itself?
By the 1920s, that confidence had spread. The architect Le Corbusier declared that dwellings were “machines for living in,” imagining cities as orderly lattices where people moved like parts on a conveyor belt. Civilization would run like a Swiss watch.
The idea soon darkened. Authoritarian regimes pushed it to extremes, promising to fashion “the New Man.” In Nazi Germany, engineer Fritz Todt founded Organization Todt, a vast state engineering enterprise that emerged from the autobahn highway system and later operated concentration camps using slave labor.
In the Soviet Union, leaders adopted U.S. scientific management techniques to plan factory-worker movements and classify populations through centralized records, feeding both rapid industrialization drives and the gulag system of forced labor. The same tools and managerial methods used to build highways and enact five-year plans worked for repression and mass control.
By the 1950s, “social engineering” had become a contaminated phrase. The revelations of Nazi and Soviet abuses, along with Cold War critiques of grand social planning turned the term from a progressive slogan into a warning label. Banishing the words pushed the practice underground, making it harder to recognize when it resurfaced in new forms—such as organizational psychology and systems management that still relied on classification and behavioral influence techniques but under softer, less loaded labels.
In the postwar years, the new social-engineering lexicon included “human factors” and “urban planning,” all promising integration rather than command. As computing advanced, the language shifted again: “customer journey mapping” to track interactions, “user experience” to script them. Engineering, which began as a means of reshaping physical space, set its sights on shaping behavior. Digital design features embedded in our smartphones now target our attention and desire.
Language helps conceal these modern forms of social engineering. “Data analytics” sounds neutral beside “surveillance.” “Personalization” flatters individuality while still sorting users into predictable categories. “Behavioral nudges” guide decisions without the sense of intrusion. We attach “social” as a favorable modifier to sciences, capital, and media, yet recoil when it meets “engineering.”
That discomfort is a clue. Engineering implies control, and control prompts us to ask who directs whom, toward what ends, and with whose permission.
Not all social engineering these days is hidden. Hackers don’t need to break a firewall if someone hands over their password. Romance scammers cultivate intimacy the way farmers cultivate crops. They succeed not through force but by exploiting trust. If even these obvious attacks work, the invisible kind, with roots in social engineering, are a shoo-in.
Most of the social engineering we encounter is proprietary and beyond our control. Firms build recommendation algorithms tuned to boost engagement and profit with no hearings or right of appeal. Browser and cookie defaults decide what data we surrender. A single autoplay toggle can cost users hours and build unhealthy habits. These are acts of engineering as deliberate as laying a road or redrawing an electoral district. They create a kind of curated itch by which boredom never settles, and satisfaction never arrives. The results are predictable—users click on targeted ads, make purchases, form habits, and lock in opinions.
Consent has transformed along with it. Once straightforward and revocable, it is now subtle and persistent, buried in defaults or opaque terms of service too quickly accepted. You remain free to opt out, much as you are free to refuse roads or electricity. Consent has become the preselected setting of modern life.
When social engineering operated more in the open, citizens could contest it, at least in societies with responsive government. Today’s invisible version diffuses accountability so thoroughly that scrutiny becomes hard to direct. Despite recent congressional hearings on social media’s impact on youth mental health and juries agreeing that firms are knowingly designing algorithms that cause harm, pinpointing responsibility remains elusive. When the mechanism is buried inside a system used by billions, we cannot easily point to a single decision-maker or trace the precise moment of manipulation.
Today’s social engineering is less overt and theatrical than its predecessors. Earlier versions arrived on public posters and loudspeakers for mass audiences. Today’s version is more intimate, delivered through personal devices and constant feeds tailored to the individual. The model succeeds because participation feels like freedom, not control.
Not all social engineering is dystopian. Well-kept parks foster community, accessible buildings extend dignity, vaccines and seatbelts save lives. Even in the digital realm, positive examples exist: browser extensions that automatically block hidden trackers, search engines that refuse to build personalized surveillance profiles, and decentralized social platforms that give users greater control over their own data and feeds.
The term “social engineering” still unsettles, though. But “asocial” engineering, which ignores human consequences entirely, is worse. Recognition of the human dimension to engineering is the beginning of repair. Only by seeing the machinery clearly and naming it honestly can we decide who engineers what and why. The machinery will not dismantle itself. Once named, it becomes subject to choice. That negotiation of purpose, power, and process are the defining political questions of any real democracy. We cannot ensure that social engineering serves and sustains society so long as we dodge the words.
2026-05-25 18:00:01

This webinar presents a workflow offering end-to-end solutions for designing, training, validating and verifying, compressing, and deploying AI-based virtual sensor models to embedded processors within a single environment.
Highlights
2026-05-22 02:00:01

Patients who use mobile applications to manage medical conditions including depression and chronic pain might assume the apps have been evaluated by regulatory agencies to be safe and effective. But that isn’t necessarily the case.
Most of the more than 55,000 medical apps that claim to diagnose or treat a condition—or ones that provide clinical decision support, known as “therapeutic” apps—have never been assessed by any trusted neutral bodies or regulatory agencies to evaluate them for technical soundness, ethical design, or clinical benefit. The apps often don’t comply with regional data security and privacy laws to protect people’s sensitive health information.
Medical apps differ from traditional wellness apps, which provide users with insights into becoming healthier by, for example, tracking fitness activities, monitoring blood pressure, and analyzing sleep patterns.
There is no reliable way to verify that therapeutic apps deliver the results they indicate. To help ensure such apps are credible, the IEEE Standards Association (IEEE SA) recently launched the IEEE Global Medical Mobile App Assessment and Registry. The publicly searchable directory is designed to list apps that have been vetted by experts across several criteria including technical soundness, ethical design, compliance with data security and privacy regulations, and clinical efficacy, which is evidence of a clinical benefit for the patient.
“Patients, clinicians, payers, and health care systems often struggle to distinguish clinically meaningful therapeutic apps from those that are simply well-marketed,” says IEEE Senior Member Yuri Quintana, chair of the assessment and registry program. He is chief of the clinical informatics division at Beth Israel Deaconess Medical Center, in Boston. “Our goal is to establish a standardized review method using criteria developed by experts.”
Because the apps are intended for medical use without being part of a medical implement, they fall under the designation of software as a medical device (SaMD), according to the International Medical Device Regulators Forum. SaMD is supposed to be regulated by public health agencies such as the U.S. Food and Drug Administration, but the apps have developed and grown in popularity so quickly that regulators haven’t been able to keep up, Quintana says. Some companies have received approval, but most have not, he says.
Many users are unaware of the regulatory gap, he says.
“Seeing an app from a well-known company often creates the impression that it has been meaningfully vetted for safety and efficacy, even when that is not the case,” he says.
Some companies are using deceptive advertising to sell their product, he adds. Marketing materials might claim that all of a company’s health apps are certified, even though only one app has been approved by a regulatory body to treat a particular condition. Or the verbiage might imply the company has clinical evidence proving its application works, even though the app has never been tested independently.
Another concern is that updated apps aren’t being vetted, says Maria Palombini, IEEE SA’s director of health care and life sciences global practice lead.
“The original app might have received approval from a regulatory agency, but not the updated version,” Palombini says. “There could have been significant changes from the original.”
“Not every medical-related app triggers the same regulatory classification or review across jurisdictions,” Quintana adds. “That leaves a large gray zone of clinically relevant but lower-risk apps that haven’t undergone an independent assessment. The IEEE registry was created to help fill these gaps.
“IEEE is the best organization to address this problem because this is fundamentally a standards, trust, interoperability, and conformity assessment challenge,” he says. IEEE “is the world’s largest technical professional organization, with deep expertise in developing globally recognized standards including in health care, cybersecurity, AI ethics, and interoperability.”
“Through the IEEE Conformity Assessment Program, we already run rigorous assessment and registry programs,” Palombini says. “Our neutral, consensus-driven, multidisciplinary approach—bringing together clinicians, regulators, developers, and ethicists without commercial bias—makes IEEE uniquely positioned to create trustworthy global guardrails that can scale across jurisdictions and support regulatory harmonization.”
The assessment framework was developed by a multidisciplinary group of 35 volunteer experts from 10 countries, Quintana says. The panel includes academics, AI experts, app developers, clinicians, ethicists, mental health experts, patient advocates, regulators, researchers, technologists, and those who assess safety in health care.
The registry is for any app used for clinical care or therapeutics that claims to demonstrate a medical benefit. That includes apps designed for cardiology, diabetes, mental health, neurology, oncology, rehabilitation, and respiratory diseases, Quintana says.
Initially, he says, the focus will be on apps that aim to treat mental health conditions, given the large number of offerings in that area and the registry committee’s expertise.
The submission of apps is voluntary. There is no government mandate that requires a company to use the IEEE registry.
The products will be evaluated against about 150 consensus-based criteria across three major areas:
IEEE charges a nonrefundable submission fee that covers the cost of the assessment plus the registry’s annual subscription for the first year.
Developers first must demonstrate they are a legally established entity before they can complete the app publisher registration form and then submit documentation and attestations about the product.
The IEEE review of an app is estimated to take six to eight weeks, Palombini says. The assessment results will be privately shared with the app publisher, she says, and to be listed in the registry, an app must achieve more than 85 percent compliance in each category.
Upgraded apps must be submitted and reassessed, Palombini says. Similar to how users are notified when an app on their smart devices has , the registry will be notified when listed apps have a new update available, she says.
Applicants who do not pass the assessment are to receive feedback explaining why. They will be given an opportunity to make changes or provide additional documentation, Palombini says.
“It’s a pretty methodological process, with checks and balances,” Quintana says. “We’re being very transparent about the process.”
Approved apps added to the registry receive an IEEE certification badge and submission identifier, which the company can display on its website, app store listings, and marketing materials.
“The badge serves as visible proof that the app has met the independent, consensus-based assessment for clinical value, technical robustness, and ethical design,” Quintana says.
The registry will be publicly available at no cost, he says.
Patients and families seeking safe, trustworthy apps—and payers and insurers evaluating reimbursement potential—will find the registry helpful, he says.
The application website is open. The public registry page does not yet list a specific count of approved apps because assessments are ongoing. Approved apps and their unique identifiers are to be published when the initial reviews are completed.
To learn more, you can watch a webinar recorded in March.
The assessment framework that underpins the registry is supporting the formal recognition of IEEE P3962 Standard for Criteria Assessment Framework f2026-05-21 18:00:02

Discover how the ZEISS Crossbeam 750 FIBSEM sets a new benchmark for precise TEM lamella prep, tomography, and advanced nanofabrication. This delivers better resolution, better SNR, larger usable FOV, and shorter acquisition times. Learn how uninterrupted FIB milling will reduce damage and rework, accelerate time to TEM, and increase first pass success—so your FA, yield, and materials teams make faster, confident data driven decisions.
Join us to discover how the new ZEISS Crossbeam 750 with its see while you mill capability delivers precision and clarity—every time—for demanding FIB-SEM workflows. Designed for extremely challenging TEM lamella preparation, tomography, advanced nanofabrication, and APT‑ready lift‑out, Crossbeam 750 combines a new Gemini 4 SEM objective lens, a double deflector, and a next‑generation scan generator to elevate both image quality and process confidence. You’ll learn how better resolution and better SNR translate into more image detail and shorter acquisition times, while the low‑kV FIB performance enables more precise lamella prep.
We’ll demonstrate High Dynamic Range (HDR) Mill + SEM—an interwoven SEM/FIB scanning mode that suppresses FIB‑generated background. This enables immediate, clean visual feedback, even during nudging the FIB pattern live while milling . The result: confident endpointing with uninterrupted FIB milling and pristine, metrology‑grade surfaces with the lowest possible sample damage.
This session is ideal for semiconductor failure analysists, yield teams and materials scientists seeking faster time‑to‑TEM, higher first‑pass success, and consistent outcomes at low kV. See how Crossbeam 750 empowers you to make earlier stop‑milling decisions, cut rework, and reliably plan turnaround time—so you can move from sample to insight with confidence.
2026-05-21 18:00:02

This sponsored article is brought to you by Wetour Robotics.
A field technician on a wind turbine, harness clipped, both hands on a wrench, needs to send a command to the diagnostic device hanging at her belt. A logistics worker on a loading dock, gloves on, eyes on the pallet, needs to redirect a connected lift. A person using an assistive mobility device on a crowded street wants to nudge it forward without taking out a phone or speaking aloud. None of these moments call for a smarter robot. They call for a smarter way to be heard by the machines that already exist.
The past three years of Physical AI have been a story of remarkable progress on the robot side of the loop. Companies like Boston Dynamics, Figure, and Unitree have advanced actuators, locomotion, and dexterity to a level that would have seemed implausible a decade ago. Google DeepMind’s Gemini Robotics has redefined what vision-language-action models can do in unstructured settings. The trajectory of the hardware and the foundation models is real, and it is accelerating.
But there is another side to this loop, and it has been treated as a solved problem for too long. The interface between humans and machines has defaulted, for 40 years, to three input modalities: screens, buttons, and voice. Each of those assumes the user can stop, look down, and translate intent into structured commands. That assumption breaks the moment the work moves into a real environment. On a turbine. On a dock. On a sidewalk. In any setting where hands are occupied, eyes are committed, or speaking is impractical, the conventional interface stack quietly fails.
Spatial Intent Fusion is the simultaneous processing of three streams of human-centered information, namely spatial position, visual context, and gestural intent: Your body is the interface.
The bottleneck on the human side of the loop is becoming as important as the one on the machine side. And solving it requires a different question. Not how do we make the robot more capable, but how do we let the human participate in the computing system as naturally as the robot already does.
Wetour Robotics is betting that the next architectural leap in Physical AI is not about making the robot more capable. It is about making the human a first-class node in the computing network, with the same kind of low-latency, high-fidelity participation that connected devices already enjoy.
Wetour Robotics’ engineers frame the problem this way: a wristband that recognizes a gesture is not enough. A camera that recognizes a scene is not enough. The information a human carries about what they are about to do is distributed across multiple channels, including where their body is in space, what their eyes are attending to, and what their muscles are preparing to do, and any single channel observed in isolation is ambiguous. Reconstructing intent reliably means fusing those channels at the operating system level, with latency low enough that the loop feels closed rather than mediated.
This approach has a name. Wetour Robotics calls it Spatial Intent Fusion: the simultaneous processing of three streams of human-centered information, namely spatial position, visual context, and gestural intent, fused into a single real-time command for any connected physical device. It is the technical implementation behind a simpler positioning statement the company uses externally: your body is the interface.
Orchestra is a portable intelligent hub running the operating system that handles sensor fusion, intent inference, command translation, and safety arbitration. The reference compute platform is NVIDIA Jetson Orin Nano Super, which provides enough on-device inference capacity to keep the entire control loop at the edge, with no cloud dependency on the critical path. Wetour Robotics
Orchestra is not a single device but a layered platform, designed from the start to be sensor-flexible and actuator-agnostic. The architecture decomposes into three perception layers and four coordination engines.
Orchestra itself is the local compute and orchestration core: a portable intelligent hub running the operating system that handles sensor fusion, intent inference, command translation, and safety arbitration. The reference compute platform is NVIDIA Jetson Orin Nano Super, which provides enough on-device inference capacity to keep the entire control loop at the edge, with no cloud dependency on the critical path. Edge inference is non-negotiable for this application. Full-chain latency from biosignal acquisition to actuator command is held under 100 milliseconds, the envelope inside which closed-loop control feels natural rather than laggy.
VisionLink handles visual and spatial perception. Cameras feed into vision models that identify objects, estimate distances, and track environmental context. VisionLink is designed not as a passive recognition layer but as a real-time command generator: its outputs feed directly into Orchestra OS to be fused with biosignal data.
Conductor is the biosignal pipeline. It ingests raw surface electromyographic (sEMG) data from a wrist-worn device, classifies temporal patterns into discrete gestures or continuous control signals, and outputs actuator commands. The technically interesting property of sEMG for this use case is that the signal precedes visible motion. Motor unit action potentials appear at the skin surface roughly 50 to 80 milliseconds before a finger completes the corresponding gesture. Wetour Robotics calls this property pre-motion intent sensing, and it is what allows Orchestra to anticipate user intent rather than react to it.
On top of the three perception layers, Orchestra OS runs four coordination engines. The Perception Engine ingests and normalizes raw sensor streams. The Intent Engine performs Spatial Intent Fusion across modalities, resolving what the user is trying to do given where they are, what they are looking at, and what their hand is signaling. The Orchestration Engine translates intent into device-specific command sequences for any connected actuator. The Safety Engine arbitrates conflicting commands, enforces operational envelopes, and gates execution against runtime safety conditions.
Wetour Robotics
No system that bridges the human body and the digital world is finished. Three engineering challenges remain open, and the company addresses each with a deliberate trade-off rather than a claim of having fully solved it.
Baseline stability of sEMG under motion. In a stationary user, continuous gesture recognition from sEMG is reliable. Once the user is walking, climbing, or otherwise moving, motion artifacts and electrode drift degrade the signal in ways that are difficult to fully compensate for. Rather than overpromise on continuous control in dynamic settings, Orchestra defaults to a smaller set of robust discrete gestures in complex operating environments, and reserves continuous control modes for contexts where the signal-to-noise ratio supports them.
Miniaturization of edge AI compute. Running the Orchestra control loop entirely at the edge requires real on-device inference, which has historically meant trading off between compute capacity, battery life, and form factor. Wetour Robotics’ approach has been a compact carrier board paired with a thermal design and a battery module sized for all-day wearability. The result is a hub that travels with the user rather than tethering them to a desk, and that performs the full perception-to-actuation loop without offloading to the cloud.
Heterogeneity of third-party device protocols. The actuator side of the loop is a fragmented landscape. Different manufacturers expose different command interfaces, different communication stacks, and different safety conventions, and a Physical AI operating system has to integrate with all of them. Wetour Robotics uses an AI-agent layer to negotiate connection and protocol translation adaptively, so that Orchestra OS can ingest data from a wide range of devices, run them through neural network models that infer human intent, and emit the right command on the right protocol for the device on the other end.
The history of computing is a history of interface revolutions. Command lines gave way to graphical user interfaces, which gave way to touch, which gave way to voice. Each transition expanded who could participate in the system and what they could do with it. The next transition is not about a new screen or a new microphone. It is about treating the human body itself as a participant in the computing network, capable of contributing intent at the same speed and fidelity that any other connected node can.
The history of computing is a history of interface revolutions. The next transition is not about a new screen or a new microphone — it is about treating the human body itself as a participant in the computing network.
This path is not a competitor to the work being done on humanoid robots, foundation models for embodied AI, and dexterous manipulation. It is the missing complement to that work. The hardest open problem for humanoid systems is the data: every natural interaction between a human and the physical world is a potential training signal, and most of those interactions are currently invisible to any computing system. As more humans become first-class nodes in the loop, those interactions become observable, structured, and ultimately useful for training the next generation of embodied AI, including the humanoid robots being developed today.
In other words: putting the human back into the computing loop is not just about better interfaces for individual users. It is about generating the kind of grounded, in-the-wild human-machine interaction data that the broader Physical AI ecosystem will need to keep advancing. The robot side and the human side of the loop are not two competing futures. They are two halves of the same one.
That is what Wetour Robotics means when it says: Your body is the interface.
Learn more at wetourrobotics.com.
2026-05-20 19:00:01

Over the next few decades, billions of autonomous, AI-powered robots will work alongside people in factories, perform tedious tasks in warehouses, care for the elderly, assist in unsafe disaster areas, deliver packages and food to our doorsteps, and eventually help out in our homes. Some will look like us, and many won’t. What is certain is that regardless of form factor, robots will all rely heavily on AI in order to deliver real-world value.
In 2025, total investments in robotics companies reached a record US $40.7 billion, accounting for 9 percent of all venture funding. The multibillion dollar question therefore is this: What will it take for AI-powered robots to begin to have a serious economic impact? Many of today’s robotics and AI companies are making bold claims, such as that humanoid robots will soon be coming into our homes, but there’s still a big gap between promise and reality.
The promise of robots that live and work alongside us has been the stuff of science fiction for a very long time. And while many programmers have tried to make that promise a reality, the physical world is just too complicated for traditional computer programs to handle the endless complexity it presents. Thanks to AI, robots are no longer being programmed—instead, they learn to operate in the real world. With enough practice, they can learn to perceive and understand the world around them, reason about that world, and use that reason and understanding to perform tasks that are useful, reliable, and safe.
The two of us have worked at the forefront of AI and robotics for the last decade, as a Professor in Robotics at Oregon State University and Co-Founder of Agility Robotics, and as former CEO of the Everyday Robots moonshot at Google X. Our experience deploying AI-powered robots in real-world settings has given us a perspective on where AI can be used to great benefit in complex robotic systems in the near term and where we are still on the frontier of science fiction. We believe AI will enable an inflection point in robotics advances, but that it will be through the well-engineered application of coordinated systems of different AI tools rather than a single ChatGPT-style breakthrough.
As the excitement around AI is matched only by the uncertainty of what will be possible, here are five hard truths that will define AI in robotics.
For years, we have been seeing videos on YouTube with humanoid robots performing amazing moves on everything from a dance floor to an obstacle course. The inside knowledge in robotics is to “never trust a YouTube robot video.” The gap between real robots that can perform real work in unstructured human environments and carefully scripted and edited robot performances remains significant. The latest performance to get a lot of attention was a martial arts show featuring Unitree humanoid robots performing with children at the Chinese 2026 Spring Festival Gala. While impressive, this falls into a long lineage of tightly scripted robotic performances, where everything has been carefully choreographed and planned in advance. The low-level controls, synchronization, and choreography were stunning, yet the Spring Gala robot performance showed a level of autonomy and intelligence much closer to industrial robots building cars in a factory than something that will show up in your living room any time soon.
Seeing these kinds of demos nevertheless raises questions about where robotics really is. If robots can perform kung fu moves and do backflips and dance, why aren’t they also showing up on factory floors yet? And why can’t they do the dishes in my home after dinner? The simple answer is this: Making AI-powered robots capable of performing general tasks in varied human environments is still really hard. While impressive technological feats like those at the Spring Festival may make it look like we could be very close, the use of AI in these demos is only for low-level motor control (to keep the robots from falling over) and therefore is only a small part of the solution for robots to be general purpose in the real, unstructured spaces where we humans live and work.
Large Language Models (LLMs) like OpenAI’s ChatGPT and Anthropic’s Claude were initially trained on an internet-scale database of text. The world woke up one day in late 2022 to ChatGPT demonstrating that AI computers could suddenly “speak” to us in prose or verse and about seemingly any topic. LLMs have turned out to generalize well and are now able to take multimodal input (text, images, video) and produce multimodal output. Importantly, the corpus of training data was both enormous and human-generated, which are characteristics that form the gold standard for AI training.
The fastest path to robots as part of everyday life may emerge through a range of robot forms performing increasingly sophisticated applications and employing a range of AI tools.Agility Robotics
Giving AI a body (in the form of a robot), so that it can engage with people in the physical world, continues to be a very difficult and broadly unsolved problem. AI models for general-purpose robotics must simultaneously satisfy multiple, often conflicting, physical, geometric, and temporal limitations while operating in unstructured, dynamic environments. In order to generalize, robot models need to be trained on data gathered in a high-dimensional configuration space, where “dimensions” represent text, lighting conditions, degrees of freedom, joint limits, velocities, force, and safety boundaries, just to mention a few. Importantly, this must be good data—it must contain many examples from what amounts to an infinite number of possible configurations in the physical world.
Since there are very few existing sources of data like this, approaches like teleoperation, video analysis, motion capture of humans, and self-exploration in simulation and in the real world are all seen as important ways to collect data. It’s a herculean task. For example, at Everyday Robots at Google X, we ran 240 million robot instances in our simulator over the course of 2022 to collect training data, mostly to train a trash-sorting model. Similar amounts of data will be needed for every skill to get to a similar level of capability, which is not yet human level.
We are far away from a moment where a single AI model might allow general-purpose robots to live and work alongside us.
General-purpose robots can have wheels or legs. They can have one, two, three, or more arms. Some have propellers and can fly, while others may be designed to operate under water. Some will drive on busy roads. The physical world is infinitely varied and complex. And then there are all the people and other animals that will be surrounding the robots. How do you train a model to operate a robot safely and reliably in all of these settings? The simple answer is: You don’t. At least not for quite some time.
We believe the winning AI architecture leading to the next big breakthroughs in general-purpose robotics will be “agentic AI” for robots, which are high-level coordinating models that can reason, plan, use tools, and learn from outcomes to execute complex tasks with limited supervision. Agentic, high-level models running on robots will invoke a system of specialized ones for different types of tasks. We will likely soon see multiple robots collaborating and coordinating with each other through their onboard agentic AI models.
AI tools are unlocking new and powerful capabilities in robotics, which in turn will enable new solutions and new markets. It’s encouraging to see these new models being made broadly available, some even as open-source solutions. This availability is akin to what happened with the internet: Real progress occurred when it became ubiquitous. We anticipate an inevitable democratization of complex behaviors in robotics with wide access to these AI tools and technologies.
Robots are complex systems with many parts that all need to work together with great precision. For a robot to be useful and safe, every part of it must be coordinated, from its perception systems to the computer controlling it, all the way down to its individual actuators.
Actuators—that is, the motors and gears—are a good example of an important part of the robot where what got us here won’t get us there. The actuators used at scale by most industrial robots will not work for robots that will operate in human environments. If these robots accidentally collide with an obstacle, the resulting impacts are harsh, forces are high, and things break. Humans don’t move in this way. We are far more compliant in how we interact with the world, and we’re constantly making contact with our environment and using that contact to help us accomplish things.
Consider the challenge of inserting a key in a lock: Humans typically don’t do this by aligning the key perfectly with the keyhole. Instead, we just feel for the edge of the keyhole and jiggle the key in. Robots need to be able to operate in novel ways to achieve comparable capabilities by using a new class of actuators that are sensitive to force and able to have a compliant interaction with the environment. While these kinds of actuators do exist, they are not yet generally available at scale for robot systems designed to operate around people.
There’s a big difference between tasks that look impressive and real-world tasks that provide value. Robotics is a perfect example of Moravec’s paradox, which states that tasks that are hard for humans are easy for computers (like multiplying two big numbers), and tasks easy for humans (like a toddler’s movements) are extremely difficult for computers and robots.
Serving customers is an unforgiving reality check, because customers only care about solving the real problems they have. If we are to deploy AI-based robot solutions, they must outperform the way things are currently done while demonstrating reliable performance metrics and safety. Agility Robotics’ early work to deploy our humanoid robot Digit in customer locations led to the realization that our first obstacle was safety: Robots that balance and manipulate objects in human spaces bring new types of risk to the workplace. In the first humanoid deployments, physical barriers were necessary, and Agility kicked off a multi-year engineering effort to solve the safety challenge, touching nearly every aspect of robot design and relying heavily on new AI-based approaches to human detection and behavior control.
Everyday Robots at Google deployed robots in 2019 that worked autonomously in office buildings doing chores like cleaning cafe tables and sorting trash. We quickly learned how “messy” and difficult the real world is for a robot. This experience informed the architecture and deployment of our AI systems while also gathering real-world data that could be combined with simulation data for training and improving models.
This focus on creating a product to meet specific customer needs and deploying robots in real-world settings is the only way to inform the structure of the AI tools and infrastructure for near-term utility on a path towards long-term broader capability and generality. There will be no “aha” moment, no silver bullet algorithm, and no volume of data sufficient to produce a general-purpose robot without extensive real-world experience.