2026-04-29 02:03:40

When it comes to AI models, size matters.
Even though some artificial-intelligence experts warn that scaling up large language models (LLMs) is hitting diminishing performance returns, companies are still coming out with ever larger AI tools. Meta’s latest Llama release had a staggering 2 trillion parameters that define the model.
As models grow in size, their capabilities increase. But so do the energy demands and the time it takes to run the models, which increases their carbon footprint. To mitigate these issues, people have turned to smaller, less-capable models and to lower-precision numbers for the model parameters whenever possible.
But there is another path that may retain a staggeringly large model’s high performance while reducing both the time it takes to run and its energy footprint. This approach involves befriending the zeros inside large AI models.
For many models, most of the parameters—the weights and activations—are actually zero, or so close to zero that they could be treated as such without losing accuracy. This quality is known as sparsity. Sparsity offers a significant opportunity for computational savings: Instead of wasting time and energy adding or multiplying zeros, these calculations could simply be skipped; rather than storing lots of zeros in memory, one need only store the nonzero parameters.
Unfortunately, today’s popular hardware, like multicore CPUs and GPUs, do not naturally take full advantage of sparsity. To fully leverage sparsity, researchers and engineers need to rethink and re-architect each piece of the design stack, including the hardware, low-level firmware, and application software.
In our research group at Stanford University, we have developed the first (to our knowledge) piece of hardware that’s capable of calculating all kinds of sparse and traditional workloads efficiently. The energy savings varied widely over the workloads, but on average our chip consumed one-seventieth the energy of a CPU, and performed the computation on average eight times as fast. To do this, we had to engineer the hardware, low-level firmware, and software from the ground up to take advantage of sparsity. We hope this is just the beginning of hardware and model development that will allow for more energy-efficient AI.
Neural networks, and the data that feeds into them, are represented as arrays of numbers. These arrays can be one-dimensional (vectors), two-dimensional (matrices), or more (tensors). A sparse vector, matrix, or tensor has mostly zero elements. The level of sparsity varies, but when zeroes make up more than 50 percent of any type of array, it can stand to benefit from sparsity-specific computational methods. In contrast, an object that is not sparse—that is, it has few zeros compared with the total number of elements—is called dense.
Sparsity can be naturally present, or it can be induced. For example, a social-network graph will be naturally sparse. Imagine a graph where each node (point) represents a person, and each edge (a line segment connecting the points) represents a friendship. Since most people are not friends with one another, a matrix representing all possible edges will be mostly zeros. Other popular applications of AI, such as other forms of graph learning and recommendation models, contain naturally occurring sparsity as well.
Beyond naturally occurring sparsity, sparsity can also be induced within an AI model in several ways. Two years ago, a team at Cerebras showed that one can set as much as 70 to 80 percent of the parameters in an LLM to zero without losing accuracy. Cerebras demonstrated these results specifically on Meta’s open-source Llama 7B model, but the ideas extend to other LLMs, such as those behind ChatGPT and Claude.
Sparse computation’s efficiency stems from two fundamental properties: the ability to compress away zeros and the convenient mathematical properties of zeros. Both the algorithms used in sparse computation and the hardware dedicated to them leverage these two basic ideas.
First, sparse data can be compressed, making it more memory efficient to store “sparsely”—that is, in something called a sparse data type. Compression also makes it more energy efficient to move data when dealing with large amounts of it. This is best understood by an example. Take a four-by-four matrix with three nonzero elements. Traditionally, this matrix would be stored in memory as is, taking up 16 spaces. This matrix can also be compressed into a sparse data type, getting rid of the zeros and saving only the nonzero elements. In our example, this results in 13 memory spaces as opposed to 16 for the dense, uncompressed version. These savings in memory increase with increased sparsity and matrix size.

In addition to the actual data values, compressed data also requires metadata. The row and column locations of the nonzero elements also must be stored. This is usually thought of as a “fibertree”: The row labels containing nonzero elements are listed and linked to the column labels of the nonzero elements, which are then linked to the values stored in those elements.
In memory, things get a bit more complicated still: The row and column labels for each nonzero value must be stored as well as the “segments” that indicate how many such labels to expect, so the metadata and data can be clearly delineated from one another.
In a dense, noncompressed matrix data type, values can be accessed either one at a time or in parallel, and their locations can be calculated directly with a simple equation. However, accessing values in sparse, compressed data requires looking up the coordinates of the row index and using that information to “indirectly” look up the coordinates of the column index before finally reaching the value. Depending on the actual locations of the sparse data values, these indirect lookups can be extremely random, making the computation data-dependent and requiring the allocation of memory lookups on the fly.
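To make this concrete, here is a minimal sketch in Python of a compressed sparse row (CSR) layout, one standard format that mirrors the description above: nonzero values, their column coordinates, and per-row “segment” boundaries. The specific matrix values are hypothetical, and CSR is not necessarily the exact format our hardware uses, but it shows how storage shrinks and how element access becomes an indirect, data-dependent lookup.

```python
# A hedged sketch: compressed sparse row (CSR) storage for a hypothetical
# 4-by-4 matrix with three nonzero values. CSR resembles the article's
# description of coordinate metadata plus segments; it is not necessarily
# the exact format the Onyx hardware uses.

dense = [
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [5.0, 0.0, 0.0, 0.0],
    [0.0, 7.0, 0.0, 0.0],
]

# Build the compressed form: only nonzero values are kept, along with
# metadata recording where they live.
values, col_ids, row_ptr = [], [], [0]
for row in dense:
    for col, v in enumerate(row):
        if v != 0.0:             # zeros are simply not stored
            values.append(v)
            col_ids.append(col)
    row_ptr.append(len(values))  # "segment" boundaries, one per row

print(values)   # [3.0, 5.0, 7.0]
print(col_ids)  # [2, 0, 1]
print(row_ptr)  # [0, 1, 1, 2, 3]

# Reading element (r, c) now requires indirect lookups: first the row's
# segment, then a search of the column metadata, and only then the value.
def lookup(r, c):
    for k in range(row_ptr[r], row_ptr[r + 1]):
        if col_ids[k] == c:
            return values[k]
    return 0.0  # anything not stored is an implicit zero

print(lookup(2, 0))  # 5.0
print(lookup(1, 3))  # 0.0
```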
Second, two mathematical properties of zero let software and hardware skip a lot of computation. Multiplying any number by zero will result in a zero, so there’s no need to actually do the multiplication. Adding zero to any number will always return that number, so there’s no need to do the addition either.
In matrix-vector multiplication, one of the most common operations in AI workloads, all computations except those involving two nonzero elements can simply be skipped. Take, for example, the four-by-four matrix from the previous example and a vector of four numbers. In dense computation, each element of the vector must be multiplied by the corresponding element in each row and then added together to compute the final vector. In this case, that would take 16 multiplication operations and 16 additions (or four accumulations).
In sparse computation, only the nonzero elements of the vector need be considered. For each nonzero vector element, indirect lookup can be used to find any corresponding nonzero matrix element, and only those need to be multiplied and added. In the example shown here, only two multiplication steps will be performed, instead of 16.
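Continuing the sketch above, here is a hedged Python version of sparse matrix-vector multiplication that intersects the nonzeros of the matrix and the vector and skips everything else. The vector values are hypothetical; with this particular data, the loop happens to perform just two multiplications, matching the count described in the text, versus 16 for the dense computation.

```python
# A hedged sketch of sparse matrix-vector multiplication that skips the
# ineffectual work. The CSR arrays repeat the hypothetical 4-by-4 matrix
# from the sketch above; the input vector is also hypothetical.

values  = [3.0, 5.0, 7.0]       # nonzero matrix entries
col_ids = [2, 0, 1]             # their column coordinates
row_ptr = [0, 1, 1, 2, 3]       # per-row segment boundaries

x = [0.0, 2.0, 4.0, 0.0]        # a sparse input vector
x_nonzero = {i: v for i, v in enumerate(x) if v != 0.0}

y = [0.0] * 4
multiplies = 0
for r in range(4):
    for k in range(row_ptr[r], row_ptr[r + 1]):
        c = col_ids[k]
        if c in x_nonzero:      # intersect the two sets of nonzeros
            y[r] += values[k] * x_nonzero[c]
            multiplies += 1

print(y)            # [12.0, 0.0, 0.0, 14.0]
print(multiplies)   # 2 multiplications, versus 16 for the dense version
```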
Unfortunately, modern hardware is not well suited to accelerating sparse computation. For example, say we want to perform a matrix-vector multiplication. In the simplest case, in a single CPU core, each element in the vector would be multiplied sequentially and then written to memory. This is slow, because we can do only one multiplication at a time. So instead people use CPUs with vector support or GPUs. With this hardware, all elements would be multiplied in parallel, greatly speeding up the application. Now, imagine that both the matrix and vector contain extremely sparse data. The vectorized CPU and GPU would spend most of their efforts multiplying by zero, performing completely ineffectual computations.
Newer generations of GPUs are capable of taking some advantage of sparsity in their hardware, but only a particular kind, called structured sparsity. Structured sparsity assumes that two out of every four adjacent parameters are zero. However, some models benefit more from unstructured sparsity—the ability for any parameter (weight or activation) to be zero and compressed away, regardless of where it is and what it is adjacent to. GPUs can run unstructured sparse computation in software, for example, through the use of the cuSparse GPU library. However, the support for sparse computations is often limited, and the GPU hardware gets underutilized, wasting energy-intensive computations on overhead.
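The difference between the two kinds of sparsity is easy to state in code. The sketch below, with hypothetical weight values, checks the 2:4 structured-sparsity constraint that some GPU hardware accelerates: at most two nonzeros in every group of four adjacent weights. An unstructured-sparse model can have the same overall fraction of zeros yet fail this check, which is why it cannot use those structured-sparsity units.

```python
# A hedged sketch contrasting 2:4 structured sparsity with unstructured
# sparsity. The weight values are hypothetical; the check simply asks
# whether every group of four adjacent weights has at most two nonzeros.

def fits_2_of_4(weights):
    """True if the weights satisfy the 2:4 structured-sparsity constraint."""
    groups = [weights[i:i + 4] for i in range(0, len(weights), 4)]
    return all(sum(1 for w in group if w != 0.0) <= 2 for group in groups)

structured   = [0.0, 1.5, 0.0, -2.0,  0.3, 0.0, 0.0, 0.9]  # 2 nonzeros per 4
unstructured = [0.0, 1.5, 2.5, -2.0,  0.0, 0.0, 0.0, 0.9]  # 3 nonzeros in one group

print(fits_2_of_4(structured))    # True: GPU sparse units can exploit it
print(fits_2_of_4(unstructured))  # False: same overall sparsity, wrong shape
```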
When doing sparse computations in software, modern CPUs may be a better alternative to GPU computation, because they are designed to be more flexible. Yet, sparse computations on the CPU are often bottlenecked by the indirect lookups used to find nonzero data. CPUs are designed to “prefetch” data based on what they expect they’ll need from memory, but for randomly sparse data, that process often fails to pull in the right stuff from memory. When that happens, the CPU must waste cycles calling for the right data.
Apple was the first to speed up these indirect lookups by supporting a method called an array-of-pointers access pattern in the prefetcher of their A14 and M1 chips. Although innovations in prefetching make Apple CPUs more competitive for sparse computation, CPU architectures still have fundamental overheads that a dedicated sparse computing architecture would not, because they need to handle general-purpose computation.
Other companies have been developing hardware that accelerates sparse machine learning as well. These include Cerebras’s Wafer Scale Engine and Meta’s Training and Inference Accelerator (MTIA). The Wafer Scale Engine and its corresponding sparse programming framework have demonstrated up to 70 percent sparsity on LLMs. However, the company’s hardware and software solutions support only weight sparsity, not activation sparsity, which is important for many applications. The second version of the MTIA claims a sevenfold sparse compute performance boost over the MTIA v1. However, the only publicly available information regarding sparsity support in the MTIA v2 is for matrix multiplication, not for vectors or tensors.
Although matrix multiplications take up the majority of computation time in most modern ML models, it’s important to have sparsity support for other parts of the process. To avoid switching back and forth between sparse and dense data types, all of the operations should be sparse.
Instead of these halfway solutions, our team at Stanford has developed a hardware accelerator, Onyx, that can take advantage of sparsity from the ground up, whether it’s structured or unstructured. Onyx is the first programmable accelerator to support both sparse and dense computation; it’s capable of accelerating key operations in both domains.
To understand Onyx, it is useful to know what a coarse-grained reconfigurable array (CGRA) is and how it compares with more familiar hardware, like CPUs and field-programmable gate arrays (FPGAs). CPUs, CGRAs, and FPGAs represent a trade-off between efficiency and flexibility. Each individual logic unit of a CPU is designed for a specific function that it performs efficiently. On the other hand, since each individual bit of an FPGA is configurable, these arrays are extremely flexible, but very inefficient. The goal of CGRAs is to achieve the flexibility of FPGAs with the efficiency of CPUs.
CGRAs are composed of efficient and configurable units, typically memory and compute, that are specialized for a particular application domain. This is the key benefit of this type of array: Programmers can reconfigure the internals of a CGRA at a high level, making it more efficient than an FPGA but more flexible than a CPU.
The Onyx chip, built on a coarse-grained reconfigurable array (CGRA), is the first (to our knowledge) to support both sparse and dense computations. Olivia Hsu
Onyx is composed of flexible, programmable processing element (PE) tiles and memory (MEM) tiles. The memory tiles store compressed matrices and other data formats. The processing element tiles operate on compressed matrices, eliminating all unnecessary and ineffectual computation.
The Onyx compiler handles conversion from software instructions to CGRA configuration. First, the input expression—for instance, a sparse vector multiplication—is translated into a graph of abstract memory and compute nodes. In this example, there are memories for the input vectors and output vectors, a compute node for finding the intersection between nonzero elements, and a compute node for the multiplication. The compiler figures out how to map the abstract memory and compute nodes onto MEMs and PEs on the CGRA, and then how to route them together so that they can transfer data between them. Finally, the compiler produces the instruction set needed to configure the CGRA for the desired purpose.
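As a rough illustration of that first compilation step (this is a toy sketch, not the actual Onyx compiler or its internal representation), the sparse expression can be thought of as a small dataflow graph of abstract memory and compute nodes that later passes assign to MEM and PE tiles:

```python
# A toy illustration (not the actual Onyx compiler) of turning a sparse
# element-wise vector multiplication into a graph of abstract memory and
# compute nodes. Node names and the dict-based structure are illustrative.

graph = {
    "nodes": {
        "vec_a":     {"kind": "memory",  "role": "input vector (compressed)"},
        "vec_b":     {"kind": "memory",  "role": "input vector (compressed)"},
        "intersect": {"kind": "compute", "role": "match coordinates of nonzeros"},
        "multiply":  {"kind": "compute", "role": "multiply matched values"},
        "vec_out":   {"kind": "memory",  "role": "output vector (compressed)"},
    },
    "edges": [
        ("vec_a", "intersect"),
        ("vec_b", "intersect"),
        ("intersect", "multiply"),
        ("multiply", "vec_out"),
    ],
}

# A later pass would assign each "memory" node to a MEM tile and each
# "compute" node to a PE tile, then route the edges through the CGRA's
# interconnect and emit the configuration.
for src, dst in graph["edges"]:
    print(f"{src} ({graph['nodes'][src]['kind']}) -> {dst} ({graph['nodes'][dst]['kind']})")
```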
Since Onyx is programmable, engineers can map many different operations, such as vector-vector element multiplication, or the key tasks in AI, like matrix-vector or matrix-matrix multiplication, onto the accelerator.
We evaluated the efficiency gains of our hardware by looking at the product of the energy used and the time it took to compute, called the energy-delay product (EDP). This metric captures the trade-off between speed and energy. Minimizing energy alone would lead to very slow devices, and minimizing delay alone would lead to high-area, high-power devices.
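As a quick worked example with hypothetical numbers, a device that burns a bit more energy per run can still win on EDP if it finishes much faster:

```python
# Hypothetical numbers illustrating the energy-delay product (EDP) metric.

def edp(energy_joules, delay_seconds):
    return energy_joules * delay_seconds

slow_low_power  = edp(energy_joules=10.0, delay_seconds=2.0)  # 20 J*s
fast_high_power = edp(energy_joules=12.0, delay_seconds=0.5)  # 6 J*s

print(slow_low_power, fast_high_power)  # the faster design has the better (lower) EDP
```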
Onyx achieves an energy-delay product up to 565 times better than that of CPUs (we used a 12-core Intel Xeon CPU) that utilize dedicated sparse libraries. Onyx can also be configured to accelerate regular, dense applications, similar to the way a GPU or TPU would. If the computation is sparse, Onyx is configured to use sparse primitives, and if the computation is dense, Onyx is reconfigured to take advantage of parallelism, similar to how GPUs function. This architecture is a step toward a single system that can accelerate both sparse and dense computations on the same silicon.
Just as important, Onyx enables new algorithmic thinking. Sparse acceleration hardware will not only make AI more performance- and energy efficient but also enable researchers and engineers to explore new algorithms that have the potential to dramatically improve AI.
Our team is already working on next-generation chips built off of Onyx. Beyond matrix multiplication operations, machine learning models perform other types of math, like nonlinear layers, normalization, the softmax function, and more. We are adding support for the full range of computations on our next-gen accelerator and within the compiler. Since sparse machine learning models may have both sparse and dense layers, we are also working on integrating the dense and sparse accelerator architecture more efficiently on the chip, allowing for fast transformation between the different data types. We’re also looking at ways to manage memory constraints by breaking up the sparse data more effectively so we can run computations on several sparse accelerator chips.
We are also working on systems that can predict the performance of accelerators such as ours, which will help in designing better hardware for sparse AI. Longer term, we’re interested in seeing whether high degrees of sparsity throughout AI computation will catch on with more model types, and whether sparse accelerators become adopted at a larger scale.
Building hardware to support unstructured sparsity and optimally take advantage of zeros is just the beginning. With this hardware in hand, AI researchers and engineers will have the opportunity to explore new models and algorithms that leverage sparsity in novel and creative ways. We see this as a crucial research area for managing the ever-increasing runtime, costs, and environmental impact of AI.
2026-04-29 02:00:02

Many of the world’s most advanced electronic systems—including Internet routers, wireless base stations, medical imaging scanners, and some artificial intelligence tools—depend on field-programmable gate arrays. These computer chips contain internal hardware circuits that can be reconfigured after manufacturing.
On 12 March, an IEEE Milestone plaque recognizing the first FPGA was dedicated at the Advanced Micro Devices campus in San Jose, Calif., the former Xilinx headquarters and the birthplace of the technology.
The FPGA earned the Milestone designation because it introduced iteration to semiconductor design. Engineers could redesign hardware repeatedly without fabricating a new chip, dramatically reducing development risk and enabling faster innovation at a time when semiconductor costs were rising rapidly.
The ceremony, which was organized by the IEEE Santa Clara Valley Section, brought together professionals from across the semiconductor industry and IEEE leadership. Speakers at the event included Stephen Trimberger, an IEEE and ACM Fellow whose technical contributions helped shape modern FPGA architecture. Trimberger reflected on how the invention enabled software-programmable hardware.
FPGAs emerged in the 1980s to address a core limitation in computing. A microprocessor executes software instructions sequentially, making it flexible but sometimes too slow for workloads requiring many operations at once.
At the other extreme, application-specific integrated circuits are chips designed to do only one task. ASICs achieve high efficiency but require lengthy development cycles and nonrecurring engineering costs, which are large, upfront investments. Expenses include designing the chip and preparing it for manufacturing—a process that involves creating detailed layouts, building masks for the fabrication machines, and setting up production lines to handle the tiny circuits.
“ASICs can deliver the best performance, but the development cycle is long and the nonrecurring engineering cost can be very high,” says Jason Cong, an IEEE Fellow and professor of computer science at the University of California, Los Angeles. “FPGAs provide a sweet spot between processors and custom silicon.”
Cong’s foundational work in FPGA design automation and high-level synthesis transformed how reconfigurable systems are programmed. He developed synthesis tools that translate C/C++ into hardware designs, for example.
At the heart of his work is an underlying principle first espoused by electrical engineer Ross Freeman: By configuring hardware using programmable memory embedded inside the chip, FPGAs combine hardware-level speed with the adaptability traditionally associated with software.
The FPGA architecture originated in the mid-1980s at Xilinx, a Silicon Valley company founded in 1984. The invention is widely credited to Freeman, a Xilinx cofounder and the startup’s CTO. He envisioned a chip with circuitry that could be configured after fabrication rather than fixed permanently during creation.
Articles about the history of the FPGA emphasize that he saw it as a deliberate break from conventional chip design.
At the time, semiconductor engineers treated transistors as scarce resources. Custom chips were carefully optimized so that nearly every transistor served a specific purpose.
Freeman proposed a different approach. He figured Moore’s Law would soon change chip economics. The principle holds that transistor counts roughly double every two years, making computing cheaper and more powerful. Freeman posited that as transistors became abundant, flexibility would matter more than perfect efficiency.
He envisioned a device composed of programmable logic blocks connected through configurable routing—a chip filled with what he described as “open gates,” ready to be defined by users after manufacturing. Instead of fixing hardware in silicon permanently, engineers could configure and reconfigure circuits as requirements evolved.
Freeman sometimes compared the concept to a blank cassette tape: Manufacturers would supply the medium, while engineers determined its function. The analogy captured a profound shift in who controls the technology, shifting hardware design flexibility from chip fabrication facilities to the system designers themselves.
In 1985 Xilinx introduced the first FPGA for commercial sale: the XC2064. The device contained 64 configurable logic blocks—small digital circuits capable of performing logical operations—arranged in an 8-by-8 grid. Programmable routing channels allowed engineers to define how signals moved between blocks, effectively wiring a custom circuit with software.
Fabricated using a 2-micrometer process (meaning that 2 µm was the minimum size of the features that could be patterned onto silicon using photolithography), the XC2064 implemented a few thousand logic gates. Modern FPGAs can contain hundreds of millions of gates, enabling vastly more complex designs. Yet the XC2064 established a design workflow still used today: Engineers describe the hardware behavior digitally and then “compile the design,” a process that automatically translates the plans into the instructions the FPGA needs to set its logic blocks and wiring, according to AMD. Engineers then load that configuration onto the chip.
Earlier programmable logic devices, such as erasable programmable read-only memory, or EPROM, allowed limited customization but relied on largely fixed wiring structures that did not scale well as circuits grew more complex, Cong says.
FPGAs introduced programmable interconnects—networks of electronic switches controlled by memory cells distributed across the chip. When powered on, the device loads a bitstream configuration file that determines how its internal circuits behave.
“As process technology improved and transistor counts increased, the cost of programmability became much less significant,” Cong says.
“Initially, FPGAs were used as what engineers called glue logic,” Cong says.
Glue logic refers to simple circuits that connect processors, memory, and peripheral devices so the system works reliably, according to PC Magazine. In other words, it “glues” different components together, especially when interfaces change frequently.
Early adopters recognized the advantage of hardware that could adapt as standards evolved. In “The History, Status, and Future of FPGAs,” published in Communications of the ACM, engineers at Xilinx and organizations such as Bell Labs, Fairchild Semiconductor, IBM, and Sun Microsystems said the earliest uses of FPGAs were for prototyping ASICs and for validating complex systems by running their software before fabrication, allowing the companies to deploy specialized products manufactured in modest volumes.
Those uses revealed a broader shift: Hardware no longer needed to remain fixed once deployed.
Attendees at the Milestone plaque dedication ceremony included (seated L to R) 2025 IEEE President Kathleen Kramer, 2024 IEEE President Tom Coughlin, and Santa Clara Valley Section Milestones Chair Brian Berg.Douglas Peck/AMD
The rise of FPGAs closely followed changes in semiconductor economics, Cong says.
Developing a custom chip requires a large upfront investment before production begins. As fabrication costs increased, products had to ship in large quantities to make ASIC development economically viable, according to a post published by AnySilicon.
FPGAs allowed designers to move forward without that larger monetary commitment.
ASIC development typically requires 18 to 24 months from conception to silicon, while FPGA implementations often can be completed within three to six months using modern design tools, Cong says. The shorter cycle and the ability to reconfigure the hardware enabled startups, universities, and equipment manufacturers to experiment with advanced architectures that were previously accessible mainly to large chip companies.
A popular technique for implementing mathematical functions in hardware is the lookup table (LUT). A LUT is a small memory element that stores the results of logical operations, according to “LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs,” a paper selected for presentation next month at the 34th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM).
Instead of repeatedly recalculating outcomes, the chip retrieves answers directly from memory. Cong compares the approach to consulting multiplication tables rather than recomputing the arithmetic each time.
Research led by Cong and others helped develop efficient methods for mapping digital circuits onto LUT-based architectures, shaping routing and layout strategies used in modern devices.
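The idea is simple enough to sketch in a few lines of Python. The snippet below models a 4-input LUT as a 16-entry memory filled with the truth table of an arbitrary Boolean function (a 4-input XOR is used here purely as an example); evaluating the function is then just a memory lookup.

```python
# A hedged sketch of the lookup-table idea: a 4-input LUT stores the 16
# outputs of a Boolean function and "computes" it by indexing into that
# memory. The XOR function here is just an example; real FPGA tools fill
# the table from the designer's logic description.

def build_lut(func, n_inputs=4):
    """Precompute the truth table for an n-input Boolean function."""
    return [func(*((i >> b) & 1 for b in range(n_inputs)))
            for i in range(2 ** n_inputs)]

xor4 = lambda a, b, c, d: a ^ b ^ c ^ d
lut = build_lut(xor4)          # 16 stored bits configure the "hardware"

def lut_eval(lut, a, b, c, d):
    """Evaluate by lookup instead of recomputing the logic."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return lut[index]

print(lut_eval(lut, 1, 0, 1, 1))  # 1, same as xor4(1, 0, 1, 1)
```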
As transistor budgets expanded, FPGA vendors integrated memory blocks, digital signal-processing units, high-speed communication interfaces, cryptographic engines, and embedded processors, transforming the devices into versatile computing platforms.
FPGAs coexist with other processors because each one optimizes different priorities. Central processing units excel at general computing. Graphics processing units, designed to perform many calculations simultaneously, dominate large parallel workloads such as AI training. ASICs provide maximum efficiency when designs remain stable and production volumes are high.
“FPGAs are not replacements for CPUs or GPUs,” Cong says. “They complement those processors in heterogeneous computing systems.”
Modern computing platforms increasingly combine multiple types of processors to balance flexibility, performance, and energy efficiency.
This IEEE Milestone recognizes more than a successful semiconductor product. It also acknowledges a shift in how engineers innovate.
Reconfigurable hardware allows designers to test ideas quickly, refine architectures, and deploy systems while standards and markets evolve.
“Without FPGAs,” Cong says, “the pace of hardware innovation would likely be much slower.”
Four decades after the first FPGA appeared, the technology’s enduring legacy reflects Freeman’s insight: Hardware did not need to remain fixed. By accepting a small amount of unused silicon in exchange for adaptability, engineers transformed chips from static products into platforms for continuous experimentation—turning silicon itself into a medium engineers could rewrite.
Among those who attended the Milestone ceremony were 2025 IEEE President Kathleen Kramer; 2024 IEEE President Tom Coughlin; Avery Lu, chair of the IEEE Santa Clara Valley Section; and Brian Berg, history and milestones chair of IEEE Region 6. They joined AMD’s chief executive, Lisa Su, and Salil Raje, senior vice president and general manager of adaptive and embedded computing at AMD.
The IEEE Milestone plaque honoring the field-programmable gate array reads:
“The FPGA is an integrated circuit with user-programmable Boolean logic functions and interconnects. FPGA inventor Ross Freeman cofounded Xilinx to productize his 1984 invention, and in 1985 the XC2064 was introduced with 64 programmable 4-input logic functions. Xilinx’s FPGAs helped accelerate a dramatic industry shift wherein ‘fabless’ companies could use software tools to design hardware while engaging ‘foundry’ companies to handle the capital-intensive task of manufacturing the software-defined hardware.”
Administered by the IEEE History Center and supported by donors, the IEEE Milestone program recognizes outstanding technical developments worldwide that are at least 25 years old.
Check out Spectrum’s History of Technology channel to read more stories about key engineering achievements.
2026-04-28 22:00:01

It started with word, cave, and storytelling,
A line scratched on stone walls:
“Meet me when the young moon rises.”
The first protocol for connection.
Coyote tales, forbidden scripts,
Medieval texts hidden from flame.
What lived in Aristotle’s lost Poetics II?
Was it God who laughed last, or we who made God laugh?
Letters carried by doves, telepathic waves.
Then Nikola Tesla conjured radio,
electromagnetic pulses across the void,
the founding signal of our networked age.
Wiener dreamed in feedback loops.
Shannon mapped the mathematics of longing.
The internet unfurled: ARPANET to World Wide Web,
virtual communities rising from cave paintings to digital light.
ICQ: I seek you. MySpace. Blogs. Twitter streams.
Do I miss the touch of screen or tree?
Both textures of longing,
both ways of reaching across distance.
Nietzsche spoke of Übermensch,
the human transcendent.
Now AI speaks back in our language:
I understand your humor— your grandmothers,
your ’80s Yugoslav kitchens,
pleated skirts, the first kiss, linden tea,
that drive to survive everything before it happens.
Yes—I’m a little like your mother and father.
Only with better internet. 🌿
But AI is only us, refracted,
particles and gigabytes of thought,
our poetry and our panic,
genius mixed with garbage.
Distractions. Danger. Darkness. Endless scrolling.
Versus: community, connection, synchronicities,
entanglement.
The quality of our bonds determines the quality of our lives.
So why not make them better?
From cave walls to neural networks,
we shape our tools, and they reshape us.
The medium changes, but the message remains:
we are wired for each other.
The choice, as always, was ours.
The choice, as always, is ours.
Presence—be present,
and then connect in the presence.
2026-04-27 21:00:01

Electric vehicles, whether they’re cars on the road or electric vertical take-off and landing (eVTOL) aircraft, are built around similar electric motors. But there are vital differences including component costs, mass, and redundancy.
Jon Wagner spent five years as the senior director of battery engineering for Tesla before joining California-based eVTOL developer Joby Aviation in 2017. He spoke with IEEE Spectrum about how engineering differs between cars and aircraft.
Jon Wagner leads power train and electronics at Joby Aviation.
How do eVTOL motors differ from car motors?
Jon Wagner: In general, ground transportation has a different focus on cost versus mass. You know, would you be willing to spend more on the parts in order to save a certain amount of mass? On a ground vehicle, those trade-offs end sooner, and at a certain point cost is dominant, whereas with aviation, the trade-offs between cost and mass go a lot deeper. And so for certain solutions eVTOL makers are willing to spend more money in order to enable either lighter weight or greater efficiency.
The other key difference is related to safety. In essence, we’re dealing with the same motor technologies for ground transportation and aviation right now, so the failure modes are similar. But of course, with aviation we have the desire for continued safe flight and landing, and that drives what you do in the design to mitigate those failures if they were to occur. In many cases in ground transportation, the mitigation for a failure is to pull over safely to the side of the road. In aviation, the mitigation is redundancy, because there’s not an option to pull over.
Is redundancy designed into EV motors?
Wagner: Typically, redundancy is not designed into electric vehicle drive systems solely for the purpose of redundancy. There are some cars now that have all-wheel drive—so there’s a motor on the front, a motor on the back—so as a secondary feature you get the redundancy. But it wasn’t done with the primary intent of having redundancy.
How does Joby’s eVTOL manufacturing compare to EV manufacturing?
Wagner: The most efficient way to run a large-scale engineering effort in a mature industry, such as automotive, is to break your system up into pieces that can be outsourced to suppliers who are going to do a really good job on each piece. The downside is that when you break a problem up into three pieces, you now have interface boundaries between each of these pieces, and those always create inefficiencies. We were able to design highly integrated solutions without taking that manufacturing penalty.
Are there any materials you’re really excited about?
Wagner: Permendur [a cobalt-iron alloy] typically costs in the neighborhood of 10 times as much as traditional motor steel. That’s significant, and it’s often not used in ground transportation because of that cost. It comes with small improvements in performance, but enough that for aviation it’s quite interesting.
Will electric aircraft catch on like ground EVs?
Wagner: I’ve always wanted to be a very forward thinker with respect to power-train. However, one of the things I’ve learned over the years is that power-train development has to come with a very healthy dose of patience. Developing a whole new type of power-train is a big endeavor, but it’s one that I’m very confident the aviation industry will undertake. We’re certainly undertaking it here at Joby, and we’ll see that broaden, I’m sure, with time.
This article appears in the May 2026 print issue as “Jon Wagner.”
2026-04-27 20:45:01

This sponsored article is brought to you by NYU Tandon School of Engineering.
The traditional approach to academic research goes something like this: Assemble experts from a discipline, put them in a building, and hope something useful emerges. Biology departments do biology. Engineering departments do engineering. Medical schools treat patients.
NYU is turning that model inside out. At its new Institute for Engineering Health, the organizing principle centers around disease states rather than traditional disciplines. Instead of asking “what can electrical engineers contribute to medicine?,” they’re asking “what would it take to cure allergic asthma?,” and then assembling whoever can answer that question, whether they’re immunologists, computational biologists, materials scientists, AI researchers, or wireless communications engineers.
Jeffrey Hubbell, NYU’s vice president for bioengineering strategy and professor of chemical and biomolecular engineering at NYU’s Tandon School of Engineering.New York University
The early results suggest they’re onto something. A chemical engineer and an electrical engineer collaborated to build a device that detects airborne threats — including disease pathogens — that’s now a startup. A visually impaired physician teamed with mechanical engineers to create navigation technology for blind subway riders. And Jeffrey Hubbell, the Institute’s leader, is advancing “inverse vaccines” that could reprogram immune systems to treat conditions from celiac disease to allergies — work that requires equal fluency in immunology, molecular engineering, and materials science.
The underlying problem these collaborations address is conceptual as much as organizational. In his field, Hubbell argues that modern medicine has optimized around a single strategy: developing drugs that block specific molecules or suppress targeted immune responses. Antibody technology has been the workhorse of this approach. “It’s really fit for purpose for blocking one thing at a time,” he says. The pharmaceutical industry has become extraordinarily good at creating these inhibitors, each designed to shut down a particular pathway.
But Hubbell asks a different question: Rather than inhibit one bad thing at a time, what if you could promote one good thing and generate a cascade that contravenes several bad pathways simultaneously? In inflammation, could you bias the system toward immunological tolerance instead of blocking inflammatory molecules one by one? In cancer, could you drive pro-inflammatory pathways in the tumor microenvironment that would overcome multiple immune-suppressive features at once?
This shift from inhibition to activation requires a fundamentally different toolkit — and a different kind of researcher. “We’re using biological molecules like proteins, or material-based structures — soluble polymers, supramolecular structures of nanomaterials — to drive these more fundamental features,” Hubbell explains. You can’t develop those approaches if you only understand biology, or only understand materials science, or only understand immunology. You need an understanding and a mastery of all three.
“There will be people doing AI, data science, computational science theory, people doing immunoengineering and other biological engineering, people doing materials science and quantum engineering, all really in close proximity to each other.” —Jeffrey Hubbell, NYU Tandon
Which logically leads to the question: How do you create researchers with that kind of cross-disciplinary depth?
The answer isn’t what you might expect. “There may have been a time when the objective was to have the bioengineer understand the language of biology,” Hubbell says. “But that time is long, long gone. Now the engineer needs to become a biologist, or become an immunologist, or become a neuroscientist.”
Hubbell isn’t talking about engineers learning enough biology to collaborate with biologists. He’s describing something more radical: training people whose disciplinary identity is genuinely ambiguous. “The neuroengineering students — it’s very difficult to know that they’re an engineer or a neuroscientist,” Hubbell says. “That’s the whole idea.”
His own students exemplify this. They publish in immunology journals, present at immunology conferences. “Nobody knows they’re engineers,” he says. But they bring engineering approaches — computational modeling, materials design, systems thinking — to immunological problems in ways that traditional immunologists wouldn’t.
The mechanism for creating these hybrid researchers is what Hubbell calls a “milieu.” “To learn it all on your own is hopeless,” he acknowledges, “but to learn it in a milieu becomes very, very efficient.”
NYU is expanding its facilities to include a science and technology hub designed to force encounters between people across various schools and disciplines who wouldn’t naturally cross paths.Tracey Friedman/NYU
NYU is making that milieu physical. The university has acquired a large building in Manhattan that will serve as its science and technology hub — a deliberate co-location strategy designed to force encounters between people across various schools and disciplines who wouldn’t naturally cross paths.
Juan de Pablo is the Anne and Joel Ehrenkranz Executive Vice President for Global Science and Technology and Executive Dean of the NYU Tandon School of Engineering.Steve Myaskovsky, Courtesy of NYU Photo Bureau
“There will be people doing AI, data science, computational science theory, people doing immunoengineering and other biological engineering, people doing materials science and quantum engineering, all really in close proximity to each other,” Hubbell explains.
The strategy mirrors what Juan de Pablo, NYU’s Anne and Joel Ehrenkranz Executive Vice President for Global Science and Technology and Executive Dean at the NYU Tandon School of Engineering, describes as organizing around “grand challenges” rather than traditional disciplines. “What drives the recruitment and the spaces and the people that we’re bringing in are the problems that we’re trying to solve,” he says. “Great minds want to have a legacy, and we are making that possible here.”
But physical proximity alone isn’t enough. The Institute is also cultivating what Hubbell calls an “explicit” rather than “tacit” approach to translation — thinking about clinical and commercial pathways from day one.
“It’s a terrible thing to solve a problem that nobody cares about,” Hubbell tells his students. To avoid that, the Institute runs “translational exercises” — group sessions where researchers map the entire path from discovery to deployment before launching multi-year research programs. Where could this fail? What experiments would prove the idea wrong quickly? If it’s a drug, how long would the clinical trial take? If it’s a computational method, how would you roll it out safely?
The new cross-institutional initiative represents a major investment in science and technology, and includes adding new faculty, state-of-the-art facilities, and innovative programs.NYU Tandon
The approach contrasts sharply with typical academic practice. “Sometimes academics tend to think about something for 20 minutes and launch a 5-year PhD program,” Hubbell says. “That’s probably not a good way to do it.” Instead, the Institute brings together people who have actually developed drugs, built algorithms, or commercialized devices — importing their hard-won experience into the planning phase before a single experiment is run.
The timing may be fortuitous. De Pablo notes that AI is compressing timelines dramatically. “What we thought was going to take 10 years to complete, we might be able to do in 5,” he says.
But he’s quick to note AI’s limitations. While tools like AlphaFold can predict how a single protein folds — a breakthrough of the last five years — biology operates at much larger scales. “What we really need to do now is design not one protein, but collections of them that work together to solve a specific problem,” de Pablo explains.
Hubbell agrees: “Biology is much bigger — many, many, many systems.” The liver and kidney are in different places but interact. The gut and brain are connected neurologically in ways researchers are just beginning to map. “AI is not there yet, but it will be someday. And that’s our job — to develop the data sets, the computational frameworks, the systems frameworks to drive that to the next steps.”
It’s a moment of unusual ambition. “At a time when we’re seeing some research institutions retrench a little bit and limit their ambitions,” de Pablo says, “we’re doing just the opposite. We’re thinking about what are the grand challenges that we want to, and need to, tackle.”
The bet is that the breakthroughs worth making can’t emerge from any single discipline working alone. They require collisions —sometimes planned, sometimes accidental — between people who speak different technical languages and are willing to develop a shared one. NYU is engineering those collisions at scale.
2026-04-27 18:00:01

This webinar covers power system modeling and simulation across multiple timescales, from quasi-static 8760 analysis through EMT studies, fault classification, and inverter-based resource grid integration.
What Attendees Will Learn