The rise and fall (and rise again) of AI: What finally made it all work

AI is no longer science fiction—it’s creating art, powering self-driving cars and holding conversations that feel unsettlingly real. But rewind a few decades, and AI was little more than an overhyped math experiment, struggling to solve even basic logic problems.

The road to today’s AI revolution wasn’t smooth. It was filled with false starts, overblown promises, and brutal crashes. But each AI collapse has been followed by an even bigger resurgence.

So what changed? And why does AI always seem to follow this cycle?

If we don’t understand AI’s past, we can’t predict where it’s headed next. The breakthroughs that seemed small at the time are what made today’s AI explosion possible.

In this episode of The Effortless Podcast, Dheeraj Pandey, Co-Founder and CEO of DevRev, and Amit Prakash, Co-Founder & CTO of ThoughtSpot, break down AI’s biggest inflection points. They’ve witnessed AI’s evolution firsthand—from its early struggles to its most transformative moments. Now, they’re unpacking the hidden patterns behind AI’s history—and what’s coming next.

This is Part 1 of a two-part deep dive into AI’s past, leading up to its modern breakthroughs.

A brief history of AI: Tracing breakthroughs and setbacks

AI has its own unique trajectory of a roller coaster ride, where every 10 or 15 years people would be like, “We’re doing it, we’ve solved it,” and then it kind of falls down and the hype crashes. Then it takes that long to recover from it and then build another level of excitement and new innovations.

Amit Prakash, Co-founder & CTO, ThoughtSpot
  • The 1950s and 60s: In the first wave of AI, scientists developed perceptrons that were designed to mimic neurons of the human brain. Optimism was high, until researchers discovered perceptrons couldn’t solve even basic problems like the XOR function, which requires a multi-layer network (a small sketch after this list shows why). The hype faded, and progress slowed.
  • The 1970s and 80s: AI entered its first winter as funding dried up and enthusiasm waned. But behind the scenes, researchers developed backpropagation, a breakthrough technique that allowed neural networks to learn from their mistakes. This reignited interest, but AI was still held back by computational limits.
  • The 1990s: Backpropagation began proving its worth, and machine learning emerged as a promising approach. AI could now improve by analyzing data rather than relying on rigid, hand-coded rules. But without the raw power needed to scale, deep learning remained an idea ahead of its time.
  • The 2000s: Enter GPUs—a game-changer. Originally built for rendering graphics, GPUs turned out to be perfect for training neural networks. With their parallel processing capabilities, AI could now handle massive datasets, making deep learning not just possible, but powerful.
  • The 2010s: AI got its fuel—ImageNet, a massive dataset of labeled images. Suddenly, deep learning models could achieve superhuman accuracy in computer vision, enabling facial recognition, autonomous vehicles, and object detection. Meanwhile, Recurrent Neural Networks (RNNs) allowed AI to process language and speech, making translation and text generation possible. But RNNs had a flaw—they struggled with long-term memory.
  • 2017: Google researchers introduced the transformer architecture, a model that eliminated the weaknesses of RNNs. Instead of processing words one at a time, transformers could analyze entire sentences simultaneously, solving AI’s context problem. Within a few years, AI could write essays, translate languages, generate images, and even code, ushering in a new era of intelligence.
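The XOR limitation from the first bullet is easy to see in code. Here is a minimal sketch in NumPy (our illustration, not from the episode; the two_layer_xor helper is hypothetical): a single perceptron is just one weighted sum followed by a threshold, which can only draw one dividing line, but wiring two layers of perceptrons together computes XOR exactly.

```python
import numpy as np

def perceptron(x, w, b):
    """A single-layer perceptron: one weighted sum followed by a threshold."""
    return int(x @ w + b > 0)

# XOR(a, b) = (a OR b) AND NOT (a AND b); each piece is one perceptron,
# so two layers suffice even though one layer cannot separate XOR's outputs.
def two_layer_xor(x):
    h_or  = perceptron(x, np.array([1.0, 1.0]), -0.5)   # fires if a OR b
    h_and = perceptron(x, np.array([1.0, 1.0]), -1.5)   # fires if a AND b
    return perceptron(np.array([h_or, h_and]), np.array([1.0, -1.0]), -0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", two_layer_xor(np.array([a, b])))
# 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0
```

The fix was known in principle, but as the next bullets note, training multi-layer networks at scale had to wait for backpropagation and, later, for hardware that could keep up.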

Why AI was stuck for decades—until GPUs set it free

Until the early 2000s, AI research was trapped in a bottleneck. The earliest neural networks—perceptrons and later, backpropagation-based models—had theoretical potential, but they couldn’t scale. The problem wasn’t the math. The problem was hardware.

AI required an enormous number of calculations—millions of tiny weight adjustments happening across layers of neurons. But traditional CPUs (Central Processing Units) weren’t built for this kind of workload. CPUs were designed for general-purpose computing, handling a wide variety of tasks sequentially. They could execute complex instructions, but they struggled when asked to perform thousands of simple calculations at once.

“As our software started getting more and more complex, people wanted to do more with hardware,” Amit notes.

The AI game changed—not because someone designed better AI hardware, but because the gaming industry needed better graphics. Video games required real-time 3D rendering, which meant constantly transforming geometry and calculating how light interacted with objects. That rendering work boiled down to a huge number of matrix multiplications—the same type of calculation that neural networks use.

Amit explains: “When you’re computing the result of projecting a lot of light rays going through 3D geometry, you need to do a lot of matrix multiplications, because matrix multiplication essentially gives you a way to rotate things and scale things.”
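To make that connection concrete, here is a minimal NumPy sketch (our illustration, not from the episode; the shapes and values are arbitrary) showing that transforming 3D points for graphics and computing a neural-network layer are the same kind of operation: a matrix multiplication applied to many data points at once.

```python
import numpy as np

# A batch of 3D points (think of them as vertices in a game scene).
points = np.random.rand(1000, 3)

# Graphics: rotate every point 90 degrees around the z-axis, then scale by 2.
rotate_z = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 1.0]])
scale = np.diag([2.0, 2.0, 2.0])
transformed = points @ rotate_z.T @ scale.T     # one matrix multiply per transform

# Neural networks: a dense layer is the same operation with learned weights.
weights = np.random.randn(3, 64)                # 3 inputs -> 64 hidden units
activations = np.maximum(points @ weights, 0)   # matrix multiply + ReLU

print(transformed.shape, activations.shape)     # (1000, 3) (1000, 64)
```

The same operation repeated over thousands of rows is exactly the workload GPUs were built to parallelize.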

To handle this, NVIDIA and other graphics companies pioneered a different computing philosophy—one that focused on parallelism instead of complexity. Their GPUs (Graphics Processing Units) worked completely differently from CPUs. Instead of a few powerful cores, GPUs had thousands of smaller, simpler cores, each capable of executing the same operation simultaneously across multiple data points.

Traditional CPUs are built for scattered, one-value-at-a-time memory access and branching logic, which makes them great for unpredictable tasks like running operating systems or managing databases. But AI doesn’t work that way. Neural networks operate on structured, matrix-based data, performing the same operation across thousands of numbers at once.

That’s why GPUs, optimized for matrix processing in graphics, became the perfect hardware for AI. GPUs weren’t designed for AI—but they accidentally solved AI’s biggest problem.

And that’s when the AI revolution truly began.

Why AI needed ImageNet to see and RNNs to remember

By the time the 2010s arrived, AI had powerful neural network architectures, a growing interest in deep learning, and—thanks to GPUs—the compute power to process vast amounts of data.

But AI still needed one crucial ingredient: training data. That’s when ImageNet arrived.

ImageNet was a massive dataset of over 14 million labeled images, created by researchers at Stanford. For the first time, AI had access to millions of images, each annotated with what it contained—a dog, a cat, a car, an airplane, or a tree.

Researchers worldwide began training their AI models to classify ImageNet images more accurately than ever before. This head-to-head race rapidly accelerated AI’s ability to recognize objects at human—or even superhuman—levels.

As Amit puts it, ImageNet became “one of the first well-known benchmarks where people went after and tried to show that my algorithm is better than your algorithm in actually predicting these labels.”

CNNs: Teaching AI to see

AI researchers had been struggling with computer vision for years. Early attempts required engineers to manually program features—edges, textures, patterns—into algorithms, making it incredibly slow and inflexible.

Convolutional Neural Networks (CNNs) changed everything. Instead of requiring human engineers to define what makes an image of a “dog” different from an image of a “cat,” CNNs could learn these distinctions on their own.

CNNs worked by sliding a small filter (a “kernel”) over an image to detect patterns—edges, curves, textures. This method acted like a sliding window, scanning parts of an image piece by piece.

This allowed CNNs to automatically detect key features in an image, making them far more effective than previous approaches. With CNNs powered by ImageNet, AI could now recognize objects with incredible accuracy.
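As an illustration of that sliding-window idea, here is a deliberately naive convolution sketch in NumPy (the convolve2d helper and the toy kernel are ours, not from the episode). The kernel is slid across the image, and each position produces one number measuring how strongly the local patch matches the pattern the kernel encodes.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over a 2D image, producing one response per position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # the "sliding window"
            out[i, j] = np.sum(patch * kernel)  # one dot product per position
    return out

image = np.random.rand(28, 28)                   # a toy grayscale image
vertical_edge = np.array([[1.0, 0.0, -1.0]] * 3) # responds to vertical edges
feature_map = convolve2d(image, vertical_edge)
print(feature_map.shape)                         # (26, 26)
```

In a real CNN the kernel values are not hand-picked like this; they are learned during training, which is precisely what freed engineers from programming features by hand.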

This was a massive leap forward for AI, but it still had a major limitation: it didn’t understand sequences.

For AI to understand speech, language, and time-dependent patterns, it needed something new.

That’s where Recurrent Neural Networks (RNNs) came in.

RNNs: Giving AI memory

Up until this point, AI models had no memory. They processed one input at a time—whether it was an image, a word, or a sound—without remembering what came before it.

Amit breaks it down: “All the other networks we’ve talked about so far have no notion of memory. Inputs come in, and then output gets produced. And then another input comes in and another output gets produced. It has no memory of what it did before.”

This was fine for static tasks like recognizing an image, but it failed completely for anything sequential—like translating a sentence or predicting stock prices.

“It’s relevant for natural language because you have a long sequence of words and you cannot take the whole long sequence of words and feed it into a neural net,” Amit explains.

Recurrent Neural Networks (RNNs) solved this problem by introducing recursion. Instead of treating each input separately, RNNs fed their output back into themselves, allowing them to remember past inputs.

This was huge for language processing. AI could now process text like a human does—word by word, sentence by sentence—while remembering context.
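Here is a minimal sketch of that recursion in NumPy (the rnn_step helper, weights, and dimensions are illustrative assumptions, not from the episode). The hidden state h acts as the network’s memory: at each step it is combined with the new input and fed back in for the next step.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One RNN step: mix the current input with the previous hidden state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

input_dim, hidden_dim = 32, 64
W_xh = np.random.randn(input_dim, hidden_dim) * 0.1
W_hh = np.random.randn(hidden_dim, hidden_dim) * 0.1
b_h = np.zeros(hidden_dim)

sentence = np.random.rand(10, input_dim)    # 10 toy word vectors
h = np.zeros(hidden_dim)                    # the "memory" starts empty
for x_t in sentence:
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)   # the output is fed back in as state
print(h.shape)                              # (64,): a summary of the whole sequence
```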

The problem that made RNNs a dead-end for deep learning

RNNs were a breakthrough, but they weren’t perfect. As sequences got longer, RNNs struggled to retain memory. This was known as the problem of vanishing and exploding gradients, Amit went on to explain.

In technical terms, the gradient is what helps a neural network learn. When a network adjusts itself based on an error, it updates its weights using a process called backpropagation.

But RNNs had a problem, sketched numerically after this list:

  • If gradients got too small, the model forgot earlier information—leading to the vanishing gradient problem.
  • If gradients got too big, the model became unstable—leading to the exploding gradient problem.
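Here is a toy numerical sketch of why this happens (a deliberately simplified, linear view that ignores the nonlinearity; the backprop_through_time helper and the numbers are our illustration, not from the episode). Backpropagating through an RNN multiplies the gradient by the recurrent weight matrix once per time step, so over long sequences the gradient either collapses toward zero or blows up, depending on how large those weights are.

```python
import numpy as np

def backprop_through_time(w_scale, steps=50, hidden_dim=64):
    """Toy view: the gradient is multiplied by the recurrent weights at every step."""
    rng = np.random.default_rng(0)
    W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * w_scale
    grad = np.ones(hidden_dim)
    for _ in range(steps):
        grad = W_hh.T @ grad     # one multiplication per time step, going backward
    return np.linalg.norm(grad)

print(backprop_through_time(w_scale=0.05))  # ~0: vanishing gradient, early words forgotten
print(backprop_through_time(w_scale=0.50))  # astronomically large: exploding gradient
```

With 50 steps there is almost no setting of the weights that keeps the gradient in a useful range, which is why long-range context was so hard for RNNs to learn.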

Amit gives a real-world example: “Imagine a passage of text that says, ‘George Bush was the 43rd President of the United States. He was also governor of Texas.’ How do you understand the meaning of ‘he’ without connecting it to George Bush over here, which is several words apart?”

Traditional RNNs struggled with this kind of long-distance relationship between words. AI could remember short-term context, but anything beyond a few words? It was lost.

This was a massive roadblock. AI needed memory—not just over a few words, but entire conversations, books, or even longer sequences.

RNNs were a critical step, but they weren’t the final solution. The vanishing gradient problem meant that RNNs could never scale to truly understand complex language structures.

Something new was needed. And in 2017, that something arrived.

It was called the Transformer. And it changed AI forever.

Stay tuned for Part 2 of this episode of The Effortless Podcast, where Dheeraj and Amit discuss AI’s journey from 2017 to the present, including Attention Is All You Need, the landmark research paper from Google that introduced transformers to the world.

If you want to stay ahead of the curve on AI and the future of technology, explore The Effortless Podcast Substack. From groundbreaking innovations to the next big paradigm shifts, we break down the forces shaping tomorrow’s intelligent systems.



Akileish Ramanathan
Marketing at DevRev

A content marketer with a journalist's heart, Akileish enjoys crafting valuable content that helps the audience separate signal from noise.