2026-02-27 23:00:00
We need to give models knowledge that anchors their behavior to the realities of our world.
Modern AI chatbots can do amazing things, from writing research papers to composing Shakespearian sonnets about your cat. But amid the sparks of genius, there are flashes of idiocy. Time and again, the large language models, or LLMs, behind today’s generative AI tools make basic errors—from failing to solve basic high school math problems to stumbling over the rules of Connect Four.
This instability has been called “jagged intelligence” in tech circles, and it isn’t just a quirk—it’s a critical failing and part of the reason many experts believe we’re in an AI bubble. You wouldn’t hire a doctor or lawyer who, despite giving sound medical or legal advice, sometimes acts like they are clueless about how the world works. Enterprises seem to feel the same way about putting “jagged” AI in charge of supply chains, HR processes, or financial operations.
To solve the jagged intelligence problem, we must give our AI models access to a more powerful, more structured, and ultimately far more human stock of knowledge. Having engineered a range of AI systems over 30 years, I have found such knowledge to be an indispensable component of any reliable system.
This is because the technological innovations that launched the AI era aren’t capable of smoothing out these jagged edges. Current AI models don’t possess clear rules about how the world works; instead, they infer things from vast pools of data. In other words, they don’t know things, so they’re forced to guess—and when they guess wrong, the results range from the comical to the catastrophic.
Think about how humans learn. Born into “blooming, buzzing confusion,” babies spot patterns in the world around them: Faces are fun to look at, mom smells great, the cat scratches if you yank its tail. But pattern recognition is soon supplemented by clearly articulated knowledge: rules we’re taught, rather than things we absorb. From ABCs to arithmetic to how to load a dishwasher or drive a car, we use codified knowledge to learn efficiently—and avoid idiotic or dangerous mistakes along the way.
Frontier AI labs are already dabbling in this approach. Early LLMs struggled with grade-school math, so researchers bolted on actual mathematical knowledge—not hazy inferences, but explicit rules about how math works. The result: Google’s latest models can now reliably solve math Olympiad problems.
Adding more data of different types (video data, for example, as advocated by AI luminaries such as Yann LeCun) won’t overcome the fundamental challenge of jagged intelligence. Even with extra data, it’s mathematically certain that the models will keep making mistakes, because that’s how probabilistic, data-driven AI works. Instead, we need to give models knowledge (rigidly described concepts and constraints, rules and relationships) that anchors their behavior to the realities of our world.
To give AI models a human stock of knowledge, we need to rapidly build a public database of formal knowledge spanning a range of disciplines. Of course, the rules of math are clear; the workings of other fields—health care, law, economics, or education, say—are, in some ways, vastly more complex. This challenge is now within our reach, as the growth of companies such as Scale AI, which provides high-quality data for training AI models, points to the emergence of a new profession—one that translates human expertise into machine-readable form and, in doing so, shapes not just what AI can do, but what it comes to treat as true.
This knowledge base could be accessed on demand by developers (or even AI agents) to provide verifiable insights covering everything from loading a dishwasher to the intricacies of the tax code. AI models would make fewer absurd mistakes, because they wouldn’t need to deduce everything from first principles. (Some research also suggests that such models would require far less data and energy, though these claims have yet to be proven.)
Unlike today’s opaque AI models, whose knowledge emerges from pattern recognition and is spread across billions of parameters, a formally distilled body of human knowledge could be directly examined, understood, and controlled. Regulators could verify a model’s knowledge, and users could ensure that tools were mathematically guaranteed not to make idiotic mistakes.
The ambition to create such a knowledge resource is nothing new in AI. Even though previous efforts produced inconclusive results, it’s time to make a fresh start. Much as biologists use algorithms to speedrun the once-laborious process of modeling proteins, AI researchers could leverage generative AI to aid knowledge modeling.
It’s clear that current AI models are getting smarter and will get better by using different data. And yet, to overcome the challenge of jagged intelligence—and turn AI models into trusted partners and true drivers of value—we need to redefine the way models relate to and learn about the world. Data-driven algorithms allowed us to start talking to machines. But knowledge, not data, is the key to sustaining the future of AI past the potential bubble.
This article was originally published on Undark.

The post Sparks of Genius to Flashes of Idiocy: How to Solve AI’s ‘Jagged Intelligence’ Problem appeared first on SingularityHub.
2026-02-27 07:28:40
Fossil fuels still dominate the energy mix. But growth in renewables offset nearly 75% of new power demand.
Booming energy demand is driving a scramble to set up new generating capacity, and one technology is proving to be the clear winner. Newly released federal data shows that solar power grew by more than 35 percent year-over-year in 2025, outpacing all other forms of generation.
After decades of relatively flat electricity use, US power demand is rising again. A new report from the US Energy Information Administration shows that consumption jumped 2.8 percent in 2025, thanks to rising industrial activity and the rapid expansion of energy-hungry AI data centers.
While increased fossil-fuel generation met much of that additional demand—a revival in coal, in particular—solar power posted the fastest growth of any major source. According to an analysis by Ars Technica, the Energy Information Administration data shows US solar generation increased by more than 35 percent year-over-year, driven by 27 gigawatts of newly installed capacity.
The surge pushed total solar output above hydroelectric power for the first time in terms of total annual generation. Hydropower output itself was relatively stable compared with the prior year, while solar continued its recent breakneck expansion, with capacity increasing rapidly across multiple regions.
Solar’s surge, which added about 85 terawatt-hours of generation, met about two-thirds of the increased energy demand. This number rose to 73 percent when combined with wind power, which grew by a more modest 2.8 percent.
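As a rough back-of-the-envelope check, the sketch below works out what the rounded figures in this paragraph imply for total demand growth and for wind's contribution. The variable names are illustrative, and because the inputs are rounded ("about 85", "about two-thirds"), the results are approximate rather than official figures.

```python
# Rounded figures taken from the text above; results are approximate.
solar_added_twh = 85.0      # added solar generation
solar_share = 2.0 / 3.0     # "about two-thirds" of new demand
solar_wind_share = 0.73     # solar and wind combined

# Implied total growth in demand.
demand_growth_twh = solar_added_twh / solar_share

# Implied wind contribution: the combined share minus the solar part.
wind_added_twh = solar_wind_share * demand_growth_twh - solar_added_twh

# Net remainder met by all other sources.
other_twh = demand_growth_twh - solar_added_twh - wind_added_twh

print(f"implied demand growth: {demand_growth_twh:.1f} TWh")
print(f"implied wind: {wind_added_twh:.1f} TWh, other sources: {other_twh:.1f} TWh")
```

On these rounded inputs, wind's implied contribution is small next to solar's, which matches the paragraph's description of wind growth as "more modest".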
But the data’s not-so-silver lining is that the remaining demand was met primarily by 13 percent growth in coal power. That bucks the recent trend of coal’s diminishing importance in the US power supply and was driven by a complex confluence of changes in the US energy system.
In previous years, natural gas has been the go-to fossil fuel due to abundant domestic supply and the ability to rapidly ramp power up and down. However, the Trump administration’s tariff policies have made it more difficult and expensive to source the equipment for gas power plants, and rapid expansion in gas exports means domestic utilities are competing with foreign buyers for fuel.
Altogether, this made gas generation a less reliable bet, and generation actually shrank 3.3 percent last year. That made coal a more attractive option, and its position in the US energy mix grew.
Still, the outlook remains bright for renewables, and solar in particular. A recent Energy Information Administration analysis showed roughly 43 gigawatts of utility-scale solar capacity is planned or under construction for 2026, potentially making it an even bigger year for solar than 2025. More than half of this new capacity is in just four states: Texas, Arizona, California, and Michigan.
And wind power’s growth could more than double this year with planned additions of 11.8 gigawatts, 60 percent of which are in New Mexico, Texas, Illinois, and Wyoming.
Growth in renewables will also be supported by another record-breaking year for added battery storage. Last year, the industry added a record 15 gigawatts of capacity to the grid. Planned projects would grow capacity by another 24 gigawatts in 2026, 80 percent of which will be in Texas, California, and Arizona.
It’ll be a while yet before green energy overtakes fossil fuels, which still accounted for 58 percent of total generation in 2025. But the data suggests the energy transition is well underway. If solar and wind continue to meet growing energy demand, the US energy system could soon look very different.
The post US Solar Surged 35% in 2025, Overtaking Hydro for the First Time appeared first on SingularityHub.
2026-02-25 05:50:18
Scientists are co-opting seismic sensors to detect space debris streaking through the atmosphere at hypersonic speeds.
In the early morning of April 2, 2024, the sky over southern California lit up with flashes of blazing light. Residents were bewildered. Were they missiles? A crashing plane? The unusual activity confused even experts—until they realized it was a disposable part of China’s Shenzhou-15 spacecraft burning up in the atmosphere as it returned to Earth.
Scientists knew the event was on the horizon and had mapped out a potential entry point over the northern Atlantic Ocean, thousands of miles from metropolitan Los Angeles. Luckily, no one was hurt as the module broke apart over the city.
But the incident underlined an uncomfortable truth. We’re nowhere near being able to accurately predict the path of space debris as it rains down. As more spacecraft are launched and reenter the atmosphere, damage to infrastructure and Earthlings is only a matter of time.
Researchers are looking into a solution from an unexpected source: sensors that measure earthquakes. As space debris plummets to the ground at hypersonic speeds, it generates a sonic boom. This causes a slight tremor in the ground that the sensors readily register.
Using data from a network of these sensors, Benjamin Fernando at Johns Hopkins University and Constantinos Charalambous at Imperial College London developed a system that can reconstruct the path of space debris with unprecedented accuracy. They used the system to map Shenzhou-15’s speed, altitude, gradual disintegration, and final destination.
To be clear, this isn’t an early warning system. Because sonic booms lag behind the objects causing them, the method is like a forensic reconstruction of space debris’ final journey. Still, it can quickly identify potential fall-out zones for faster retrieval and cleanup, which is especially important if the junk is toxic or radioactive.
The work is “a crucial step toward near-real-time monitoring of natural and anthropogenic objects entering from space,” wrote Chris Carr at the Los Alamos National Laboratory, who was not involved in the work.
Launching satellites was once a colossal undertaking. But thanks to innovations by SpaceX and national space agencies across the world, it’s becoming far more routine.
These spacecraft have already changed life on Earth. Thousands of Starlink satellites beam the internet to previously dead zones and disaster areas. Miniature satellites are now an affordable research platform scientists use to profile weather, measure solar winds, and track the effects of microgravity and radiation on living cells. And a new space race will only grow the fleets of spacecraft already blanketing the Earth.
“The big change that we’ve seen since 2020 is the rise of satellite mega-constellations…companies not putting up a dozen spacecraft, but maybe a thousand or ten thousand over the course of a few years,” Fernando told Science.
Mega-constellations have already caused problems for scientists by polluting astronomical images with bright streaks. They may also increase the rate at which space debris rains down. In a paper describing their system, Fernando and Charalambous write that in 2025 there were roughly four to five re-entries a day, and the numbers are likely to rapidly grow.
We already monitor spacecraft in orbit. Telescopes bring real-time visuals. Radar tracks location and speed. But these tools struggle as a spacecraft drifts into the Earth’s upper atmosphere.
The interaction between fragments and air becomes “really chaotic,” said Fernando. “We can no longer predict with particularly good accuracy exactly where [and when] a piece of re-entering space debris is going to enter the atmosphere.”
Radar can track spacecraft parts as they return to Earth, but the technology is limited to small regions of the world and barely covers the oceans. Even when we know the final fate of a piece of debris, it’s often difficult to reconstruct its full trajectory.
The new work was inspired by the way scientists track meteoroids using a dense network of earthquake sensors to detect tiny vibrations in the ground.
The Shenzhou-15 capsule entered the atmosphere going roughly 25 to 30 times the speed of sound. Like a fighter jet, it triggered a powerful sonic boom roughly 80 kilometers (50 miles) above the ground. The boom traveled to Earth’s surface where seismic sensors detected it.
It’s like picking up an earthquake, only “in this case the waves are coming from up versus with earthquakes they tend to come from down,” said Fernando.
Southern California is heavily dotted with seismic sensors, each measuring activity in a small area. To model the spacecraft’s path and speed, the team took the largest sonic boom each sensor registered, along with its arrival time, and compiled the data into a map.
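To give a feel for how arrival times across a sensor network can reveal direction and speed, here is a minimal sketch. It is not the researchers' method: it assumes flat ground, station coordinates in kilometers, and a boom footprint that sweeps across the array as a straight front, and it ignores altitude, the Mach cone geometry, and the atmosphere. The function name `fit_boom_sweep` and the sample picks are hypothetical.

```python
import math

def fit_boom_sweep(stations):
    """Least-squares plane fit t = t0 + sx*x + sy*y over (x_km, y_km, t_s) picks.

    Returns (heading_deg, speed_km_s): the direction in which the boom
    footprint sweeps across the array, measured counterclockwise from
    east (+x), and its apparent ground speed.
    """
    n = len(stations)
    if n < 3:
        raise ValueError("need at least three stations")
    mx = sum(s[0] for s in stations) / n
    my = sum(s[1] for s in stations) / n
    mt = sum(s[2] for s in stations) / n

    # Centered sums for the 2x2 normal equations.
    sxx = sxy = syy = sxt = syt = 0.0
    for x, y, t in stations:
        dx, dy, dt = x - mx, y - my, t - mt
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
        sxt += dx * dt
        syt += dy * dt

    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("stations are collinear; fit is under-determined")

    # Slowness components (seconds per km), solved with Cramer's rule.
    slow_x = (sxt * syy - syt * sxy) / det
    slow_y = (syt * sxx - sxt * sxy) / det

    slowness = math.hypot(slow_x, slow_y)
    if slowness == 0.0:
        raise ValueError("identical arrival times; no sweep direction")
    heading_deg = math.degrees(math.atan2(slow_y, slow_x)) % 360.0
    return heading_deg, 1.0 / slowness


# Synthetic picks: a footprint sweeping due east at 7 km/s.
picks = [(0.0, 0.0, 0.0), (70.0, 0.0, 10.0), (0.0, 50.0, 0.0), (70.0, 50.0, 10.0)]
heading, speed = fit_boom_sweep(picks)
print(f"heading {heading:.1f} deg from east, speed {speed:.2f} km/s")
```

A straight-front fit like this only captures the overall sweep. Reconstructing altitude and fragmentation, as the team did, would require a fuller physical model.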
The map captured where, when, and how the capsule broke down as it hurtled through the atmosphere. Earlier on, the sensors recorded large, discrete signals. These later became more scattered and complex, suggesting the capsule gradually disintegrated rather than blowing up all at once.
The results are “consistent with on-ground observations, including videos and witness reports of multiple fireballs flying across the sky,” wrote Carr. After more deeply combing through the data, the team showed it could also be used to measure the size of each piece of decaying debris.
The spacecraft’s sonic signature differed from those generated by meteorites, making it possible to tease apart human-made objects and those of natural origins.
Differentiating the two categories is key. Meteorites pose “kinetic risk” as chunks slam into the ground, damaging cars, houses, and other infrastructure. Human space debris, however, could also contain metals, toxic or flammable material, or in rare cases, radioactive components. The model also reconstructed how different parts of the spacecraft disintegrated, which could make it easier to predict whether chunks have burned up completely in the atmosphere or reached the ground, useful information for recovery or clean-up missions.
Crash-and-burn isn’t a spacecraft’s only destiny. Engineers are also working to move defunct satellites into higher orbits that would be stable for “thousands of years” according to Fernando, though this doesn’t solve the space junk problem. Other researchers are exploring ways to design spacecraft such that they completely burn up both safely and predictably.
For now, the technology works best in places with lots of seismic sensors, which are rare. But there’s a push to add sensors in places that are vulnerable due to sensitive ecology or geology at prices far lower than building radar systems to track re-entry, said Fernando.
The post More Space Junk Is Plummeting to Earth. Earthquake Sensors Can Track It by the Sonic Booms. appeared first on SingularityHub.
2026-02-24 06:35:51
A new tool takes relatively few resources to chart algorithms’ inner workings and steer their behavior.
The inner workings of large AI systems remain largely opaque, raising significant safety and trust issues. Researchers have now developed a technique to extract and manipulate the internal concepts governing model behavior, providing a new way to understand and steer their activity.
Modern AI models are marvels of engineering, but even their creators remain in the dark about how they represent knowledge internally. This is why subtle shifts in prompting can produce surprisingly different outputs. Simply asking a model to show its work before answering often improves accuracy, while certain deliberately malicious prompts can override built-in safety features.
This has motivated significant research aimed at teasing out the patterns of activity in these models’ neural networks that correspond to specific concepts. Investigators hope to use these methods to better understand why models behave certain ways and potentially modify their behavior on the fly.
Now researchers have unveiled an efficient new way of extracting concepts from models that works across language, reasoning, and vision algorithms. In a paper in Science, the researchers used these concepts to both monitor and effectively steer model behavior.
“Our results illustrate the power of internal representations for advancing AI safety and model capabilities,” the authors write. “We showed how these representations enabled model steering, through which we exposed vulnerabilities and improved model capabilities.”
Key to the team’s approach is a new algorithm called the Recursive Feature Machine (RFM). They trained the algorithm on pairs of prompts—some containing a concept of interest, others not—and then identified patterns of activity in the model’s neural network tracking each concept.
This allows the algorithm to learn “concept vectors”—essentially patterns of activity that nudge the model in the direction of a specific concept. The vectors can be used to modify the model’s internal processes when it’s generating an output to steer it toward or away from specific concepts or behaviors.
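The idea of a concept vector can be illustrated with a much simpler stand-in than the Recursive Feature Machine. The sketch below uses a plain difference of mean activations, a common baseline, and is not the algorithm from the paper. The activations are made-up three-dimensional lists, and the function names are hypothetical.

```python
def mean_vector(rows):
    """Element-wise mean of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def concept_vector(with_concept, without_concept):
    """Difference of mean activations (a simple stand-in, not RFM)."""
    pos = mean_vector(with_concept)
    neg = mean_vector(without_concept)
    return [p - q for p, q in zip(pos, neg)]

def steer(hidden, vector, strength):
    """Shift a hidden state along the concept direction.

    Positive strength steers toward the concept, negative away from it.
    """
    return [h + strength * v for h, v in zip(hidden, vector)]

# Toy activations (hypothetical numbers) for prompts with and without a concept.
with_concept = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
without_concept = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]

v = concept_vector(with_concept, without_concept)
steered = steer([0.5, 1.0, 1.0], v, strength=2.0)
print(v, steered)
```

In a real model the shift would be applied to a layer's hidden state during generation. Here only the first coordinate moves, because it is the only one that differs between the two prompt groups.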
To test the approach, the researchers asked GPT-4o to produce 512 concepts across five concept classes and generate training data on each. They extracted concept vectors from the data and used the vectors to steer the behavior of several large AI models.
The approach worked well across a broad range of model types, including large language models, vision-language models, and reasoning models. Surprisingly, they found newer, larger, and better-performing models were actually more steerable than some smaller ones.
Crucially, the team showed they could use the technique to expose and address serious vulnerabilities in the models. In one test, they created a vector for the concept of “anti-refusal,” which allowed them to bypass built-in safety features in vision-language models that are meant to prevent them from giving advice on how to take drugs. But they also learned a vector for “anti-deception,” which they successfully used to steer a model away from giving misleading answers.
One of the study’s more interesting findings was that the extracted features were transferable across languages. A concept vector learned with English training data could be used to alter outputs in other languages. The researchers also found they could combine multiple concept vectors to manipulate model behavior in more sophisticated ways.
But the new technique’s real power is in its efficiency. It took fewer than 500 training samples and less than a minute of processing time on a single Nvidia A100 GPU to identify activity patterns associated with a concept and steer towards it.
The researchers say this could not only make it possible to systematically map concepts within large AI models, but it could also lead to more efficient ways of tweaking model behavior after training compared to existing methods.
The approach is still a long way from delivering complete model transparency. But it’s a useful addition in the growing arsenal of model analysis tools that will become increasingly important as AI pushes deeper into all of our lives.
The post Researchers Break Open AI’s Black Box—and Use What They Find Inside to Control It appeared first on SingularityHub.
2026-02-21 23:00:00
Microsoft’s Glass Chip Holds Terabytes of Data for 10,000 Years
Gayoung Lee | Gizmodo
“Our knowledge of the past comes from stone tablets and old parchment. But thousands of years from now, our descendants may learn of our lives from a thin slice of glass carrying an impressive load of data—all thanks to physics that sounds borderline magical.”
AI Isn’t Coming for Every White-Collar Job. At Least Not Yet.
Cade Metz | The New York Times ($)
“Most experts believe that code generators will replace today’s junior programmers. Using these tools, they say, feels like delegating tasks to someone who is still learning the trade. But these experts are divided on whether these tools will significantly harm the overall market for coders. Some, including Mr. Metzger, argue that code generators will expand the job market as programmers and software companies use them to build increasingly complex and powerful applications.”
Watch Unitree’s G1 Unleash a Kung Fu Robot Frenzy
Trevor Mogg | Digital Trends
“Performing alongside kids from the Tagou Martial Arts School for the Spring Festival Gala on China Central TV, the robots displayed incredible agility and coordination, moving at around 3 meters per second while performing flips, table vaults, somersaults, and rapid formation changes, blending martial arts with robotics innovation.”
Bacteria Frozen Inside 5,000-Year-Old Ice Cave Is Crazy Resistant to Antibiotics
Ellyn Lapointe | Gizmodo
“For decades, antibiotics have been humanity’s frontline defense against bacterial infections, yet these essential medications have also led to the rise of drug-resistant ‘superbugs.’ Now, researchers have discovered an ancient strain of bacteria that managed to develop this superpower thousands of years before humans ever invented antibiotics.”
Could AI Data Centers Be Moved to Outer Space?
Rhett Allain | Wired ($)
“Just think: You could get 24/7 energy from solar panels—it’s always sunny in space—and the thermal stuff wouldn’t be an issue because it’s so cold out there. …That’s the claim, anyway. Could this really work? Or is it about as practical as colonizing Mars? I asked Google’s AI Overview, and it said, ‘Yes, data centers can be built in space.’ But of course it would say that. I think we’ll have to go full renegade and dial up some old-fashioned human intelligence on this.”
Race for AI Is Making Hindenburg-Style Disaster ‘a Real Risk’, Says Leading Expert
Ian Sample | The Guardian
“The Hindenburg, a 245-meter airship that made round trips across the Atlantic, was preparing to land in New Jersey in 1937 when it burst into flames, killing 36 crew, passengers, and ground staff. …‘The Hindenburg disaster destroyed global interest in airships; it was a dead technology from that point on, and a similar moment is a real risk for AI,’ Wooldridge said. Because AI is embedded in so many systems, a major incident could strike almost any sector.”
Sub-$200 Lidar Could Reshuffle Auto Sensor Economics
Willie D. Jones | IEEE Spectrum
“That’s less than half of typical prices now, and it’s not even the full extent of the company’s ambition. The company says its longer-term goal is $100 per unit. MicroVision’s claim, which, if realized, would place lidar within reach of advanced driver-assistance systems (ADAS) rather than limiting it to high-end autonomous vehicle programs.”
Meta’s VR Metaverse Is Ditching VR
Jay Peters | The Verge
“Instead of attempting to make the 3D social platform work for both VR and mobile, Meta is ‘explicitly separating’ its ‘Quest VR platform from our Worlds platform’ and ‘shifting the focus of Worlds to be almost exclusively mobile,’ Samantha Ryan, Reality Labs’ VP of content, says in a blog post. The new approach sets Meta up to better compete with platforms like Roblox and Fortnite, which also offer user-generated experiences that can be played on your phone.”
Uber Is Sinking Over $100 Million Into Charging Stations for Self-Driving Cars
Bruce Gil | Gizmodo
“While the emergence of driverless taxi services might sound like a threat to Uber’s business, the company is making new investments aimed at ensuring its success. …The company says it will begin rolling out these [robotaxi charging] hubs in the US in the San Francisco Bay Area, Los Angeles, and Dallas, with more cities to come in the future.”
Amazon’s Cloud ‘Hit by Two Outages Caused by AI Tools Last Year’
Aisha Down | The Guardian
“Amazon’s huge cloud computing arm reportedly experienced at least two outages caused by its own artificial intelligence tools, raising questions about the company’s embrace of AI as it lays off human employees. A 13-hour interruption to Amazon Web Services’ (AWS) operations in December was caused by an AI agent autonomously choosing to ‘delete and then recreate’ a part of its environment, the Financial Times reported.”
Microbe With the Smallest Genome Yet Pushes the Boundaries of Life
Jake Buehler | New Scientist ($)
“The findings further muddy the distinction between cellular organelles like mitochondria and the most barebones microbes in nature. ‘Exactly where this highly integrated symbiont ends and an organelle starts, I think it’s very difficult to say,’ says Piotr Łukasik at Jagiellonian University in Kraków, Poland. ‘This is a very blurred boundary.’”
The post This Week’s Awesome Tech Stories From Around the Web (Through February 21) appeared first on SingularityHub.
2026-02-20 23:00:00
Tech companies have touted scientific findings from AI systems. But can they truly produce bona fide advancements?
Ahead of an artificial intelligence conference held last April, peer reviewers considered papers written by “Carl” alongside other submissions. What the reviewers did not know was that, unlike other authors, Carl wasn’t a scientific researcher, but rather an AI system built by the tech company Autoscience Institute, which says that the model can accelerate artificial intelligence research. And at least according to the humans involved in the review process, the papers were good enough for the conference: In the double-blind peer review process, three of the four papers, which were authored by Carl (with varying levels of human input), were accepted.
Carl joins a growing group of so-called “AI scientists,” which include Robin and Kosmos, research agents developed by the San Francisco-based nonprofit research lab FutureHouse, and The AI Scientist, introduced by the Japanese company Sakana AI, among others. AI scientists are made up of multiple large language models. For example, Carl differs from chatbots in that it’s designed to generate and test ideas and produce findings, said Eliot Cowan, co-founder of Autoscience Institute. Companies say these AI-driven systems can review literature, devise hypotheses, conduct experiments, analyze data, and produce novel scientific findings with varying degrees of autonomy.
The goal, said Cowan, is to develop AI systems that can increase efficiency and scale up the production of science. And other companies like Sakana AI have indicated a belief that AI scientists are unlikely to replace human ones.
Still, the automation of science has stirred a mix of concern and optimism among the AI and scientific communities. “You start feeling a little bit uneasy, because, hey, this is what I do,” said Julian Togelius, a professor of computer science at New York University who works on artificial intelligence. “I generate hypotheses, read the literature.”
Critics of these systems, including scientists who themselves study artificial intelligence, worry that AI scientists could displace researchers of the next generation, flood the system with low quality or untrustworthy data, and erode trust in scientific findings. The advancements also pose a question about where AI fits into the inherently social and human scientific enterprise, said David Leslie, director of ethics and responsible innovation research at The Alan Turing Institute in London. “There’s a difference between the full-blown shared practice of science and what’s happening with a computational system.”
In the last five years, automated systems have already led to important scientific advances. For example, AlphaFold, an AI system developed by Google DeepMind, was able to predict the three-dimensional structures of proteins with high resolution more quickly than scientists in the lab. The developers of AlphaFold, Demis Hassabis and John Jumper, won a 2024 Nobel Prize in Chemistry for their protein prediction work.
Now companies have expanded to integrate AI into other aspects of scientific discovery, creating what Leslie calls computational Frankensteins. The term, he says, refers to the convergence of various generative AI infrastructure, algorithms, and other components used “to produce applications that attempt to simulate or approximate complex and embodied social practices (like practices of scientific discovery).” In 2025 alone, at least three companies and research labs (Sakana AI, Autoscience Institute, and FutureHouse, which launched a commercial spinoff called Edison Scientific in November) have touted their first “AI-generated” scientific results. Some US government scientists have also embraced artificial intelligence: Researchers at three federal labs, Argonne National Laboratory, the Oak Ridge National Laboratory, and Lawrence Berkeley National Laboratory, have developed AI-driven, fully automated materials laboratories.
Indeed, these AI systems, like large language models, could potentially be used to synthesize literature and mine vast amounts of data to identify patterns. In particular, they may be useful in materials science, in which AI systems can design or discover new materials, and in understanding the physics of subatomic particles.
Systems can “basically make connections between millions, billions, trillions of variables” in ways that humans can’t, said Leslie. “We don’t function that way, and so just in virtue of that capacity, there are many, many opportunities.” For example, FutureHouse’s Robin mined literature and identified a potential therapeutic candidate for a condition that causes vision loss, proposed experiments to test the drug, and then analyzed the data.
But researchers have also raised red flags. While Nihar Shah, a computer scientist at Carnegie Mellon University, is “more on the optimistic side” about how AI systems can enable new discoveries, he also worries about AI slop, or the overflow of the scientific literature with AI-generated studies of poor quality and little innovation. Researchers have also pointed out other important caveats regarding the peer review process.
In a recent study that is yet to be peer reviewed, Shah and colleagues tested two AI models that aid in the scientific process: Sakana’s AI Scientist-v2 (an updated version of the original) and Agent Laboratory, a system developed by AMD, a semiconductor company, in collaboration with Johns Hopkins University, to perform research assistant tasks. Shah’s goal with the study was to examine where these systems might be failing.
One AI system, the AI Scientist-v2, reported 95 percent and sometimes even 100 percent accuracy on a specified task, which was impossible given that the researchers had intentionally introduced noise into the dataset. Both systems also appeared to sometimes make up synthetic datasets to run the analysis on, while stating in the final report that the analysis was done on the original dataset. To address this, Shah and his team developed an algorithm to flag the methodological pitfalls they identified, such as cherry-picking favorable datasets for analysis and selectively reporting positive results.
Some research suggests generative AI systems have also failed to produce innovative ideas. One study concluded that one generative AI chatbot, ChatGPT4, can only produce incremental discoveries, while a study published last year in Science Immunology found that, despite being able to synthesize the literature accurately, AI chatbots failed to generate insightful hypotheses or experimental proposals in the field of vaccinology. (Sakana AI and FutureHouse did not respond to requests for comment.)
Even if these systems continue to be used, the human place in the lab is unlikely to disappear, Shah said. “Even if AI scientists become super-duper duper capable, still there’ll be a role for people, but that itself is not entirely clear,” said Shah, “as to how capable will AI scientists be and how much would still be there for humans?”
Historically, science has been a deeply human enterprise, which Leslie described as an ongoing process of interpretation, world-making, negotiation, and discovery. Importantly, he added, that process is dependent on the researchers themselves and the values and biases they hold.
A computational system trained to predict the best answer, in contrast, is categorically distinct, Leslie said. “The predictive model itself is just getting a small slice of a very complex and deep, ongoing practice, which has got layers of institutional complexity, layers of methodological complexity, historical complexity, layers of discrimination that have arisen from other injustices that define who gets to do science, who doesn’t get to do science, and what science has done for whom, and what science has not done because people aren’t sending to have their questions answered.”
Rather than as a substitute for scientists, some experts see AI scientists as an additional, augmentative tool for researchers to help draw out insights, much like a microscope or a telescope. Companies also say they do not intend to replace scientists. “We do not believe that the role of a human scientist will be diminished. If anything, the role of a scientist will change and adapt to new technology, and move up the food chain,” Sakana AI wrote when the company announced its AI Scientist.
Now researchers are beginning to ponder what the future of science might look like alongside AI systems, including how to vet and validate their output. “We need to be very reflective about how we classify what’s actually happening in these tools, and if they’re harming the rigor of science as opposed to enriching our interpretive capacity by functioning as a tool for us to use in rigorous scientific practice,” said Leslie.
Going forward, Shah proposed, journals and conferences should vet AI research output by auditing log traces of the research process and generated code to both validate the findings and identify any methodological flaws. And companies, such as Autoscience Institute, say they are building systems to make sure that experiments hold to the same ethical standards as “an experiment run by a human at an academic institution would have to meet,” said Cowan. Some of the standards baked into Carl, Cowan noted, include preventing false attribution and plagiarism, facilitating reproducibility, and not using human subjects or sensitive data, among others.
While some researchers and companies are focused on improving the AI models, others are stepping back to ask how the automation of science will affect the people currently doing the research. Now is a good time to begin to grapple with such questions, said Togelius. “We got the message that AI tools that make us better at doing science, that’s great. Automating ourselves out of the process is terrible,” he added. “How do we do one and not the other?”
This article was originally published on Undark. Read the original article.

The post What the Rise of AI Scientists May Mean for Human Research appeared first on SingularityHub.