A Historical Timeline of AI/ML Research
An interactive timeline of seminal research papers that tracks the evolution of ideas and breakthroughs, from the field's conceptual beginnings to today's state-of-the-art technology.
Computing Machinery and Intelligence
The philosophical starting point for the field. It introduced the "Turing Test" as a benchmark for machine intelligence and framed the core question: "Can machines think?"
The Perceptron: A Probabilistic Model...
Introduced the Perceptron, one of the first artificial neural networks. It was a simple, single-layer model that could learn its weights from examples, laying the conceptual groundwork for modern deep learning.
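A minimal sketch of the perceptron learning rule in NumPy; the data and learning rate are illustrative, and this is a simplified take rather than Rosenblatt's exact probabilistic formulation.

import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    # y holds -1/+1 labels; weights are updated only when a prediction is wrong
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else -1
            if pred != target:
                w += lr * target * xi
                b += lr * target
    return w, b

# Learns a linearly separable function such as AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)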
Learning Representations by Back-Propagating Errors
Popularized the backpropagation algorithm, an efficient method for training multi-layered neural networks. This breakthrough solved a major hurdle and enabled the development of much deeper networks.
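A toy illustration of the idea (not the paper's notation): gradients flow backward through a two-layer network via the chain rule, and each weight matrix is nudged downhill. The sizes, data, and learning rate below are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(16, 3)), rng.normal(size=(16, 1))
W1, W2 = rng.normal(size=(3, 4)) * 0.1, rng.normal(size=(4, 1)) * 0.1

for step in range(200):
    h = np.tanh(X @ W1)                          # forward pass through the hidden layer
    y_hat = h @ W2
    err = y_hat - y                              # gradient of 0.5 * squared error w.r.t. y_hat
    dW2 = h.T @ err                              # chain rule, output layer
    dW1 = X.T @ ((err @ W2.T) * (1 - h ** 2))    # chain rule through tanh, hidden layer
    W1 -= 0.01 * dW1
    W2 -= 0.01 * dW2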
Long Short-Term Memory
Introduced the Long Short-Term Memory (LSTM) architecture, an RNN variant that uses special gates to manage memory, allowing it to learn long-range dependencies in sequential data.
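A single LSTM step sketched in NumPy, using the commonly taught gate equations (the forget gate was a later refinement, and the parameter layout here is just for illustration).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W, U, b are stacked parameters for the input (i), forget (f),
    # and output (o) gates plus the candidate cell update (g)
    z = x @ W + h_prev @ U + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g            # gated memory: keep some old state, write some new
    h = o * np.tanh(c)                # gated output becomes the next hidden state
    return h, c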
Gradient-Based Learning Applied to Document Recognition
Introduced LeNet-5, a pioneering Convolutional Neural Network (CNN) that achieved excellent results on handwritten digit and document recognition and demonstrated the practical effectiveness of the CNN architecture.
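A LeNet-5-style network sketched with PyTorch (assuming torch is available); the layer sizes follow the commonly cited configuration for 32x32 grayscale inputs rather than every detail of the paper.

import torch.nn as nn

lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),   # 32x32 -> 28x28 -> 14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),  # 14x14 -> 10x10 -> 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),                                            # ten digit classes
)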
A Neural Probabilistic Language Model
A foundational NLP paper that proposed learning distributed representations for words (word embeddings) jointly with a neural language model, a concept that now underlies much of modern NLP.
ImageNet Classification with Deep CNNs
Introduced "AlexNet," a deep CNN that won the 2012 ImageNet competition by a landslide. Its success, powered by GPUs, marked the "big bang" moment that brought deep learning into the mainstream.
Distributed Representations of Words and Phrases...
Introduced Word2Vec, a highly efficient toolkit for learning word embeddings from raw text. It democratized the use of embeddings and became a standard for many NLP tasks.
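In practice, embeddings like these can be trained in a few lines; a hedged sketch using the gensim library (the corpus and hyperparameters are placeholders, not recommendations).

from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1 -> skip-gram
vec = model.wv["cat"]                      # 50-dimensional embedding for "cat"
similar = model.wv.most_similar("cat")     # nearest neighbours in embedding space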
Generative Adversarial Networks
Introduced a novel framework where two neural networks, a generator and a discriminator, compete against each other. GANs revolutionized generative modeling and image synthesis.
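The adversarial game in miniature, sketched in PyTorch; the toy data, tiny networks, and hyperparameters are illustrative only.

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # noise -> fake sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 2) + 3.0                       # stand-in for real data
    fake = G(torch.randn(64, 8))
    # discriminator learns to score real samples high and fakes low
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator learns to produce fakes the discriminator scores as real
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()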
Deep Residual Learning for Image Recognition
Introduced the Residual Network (ResNet), an architecture that uses "skip connections" to solve the problem of training very deep networks. It enabled networks of hundreds of layers, setting new accuracy records.
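The key idea in a few lines of PyTorch: a block's input is added back to its output, so the stacked layers only have to learn a residual correction. Channel counts here are arbitrary.

import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)     # skip connection: the identity path bypasses both convs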
Attention Is All You Need
A landmark paper that introduced the Transformer, an architecture based solely on a "self-attention" mechanism. It enabled massive parallelization, becoming the foundation for nearly all modern LLMs.
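The core computation, scaled dot-product self-attention, fits in a few lines of NumPy (a single head with random projections, purely illustrative).

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])                   # every token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)            # softmax over the sequence
    return weights @ V                                        # output: attention-weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                                  # 5 tokens, 16-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)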
BERT: Pre-training of Deep Bidirectional Transformers...
Introduced BERT, which revolutionized NLP by pre-training a bidirectional Transformer on vast amounts of unlabeled text. The pre-trained model could then be quickly fine-tuned for specific tasks, achieving state-of-the-art results across a wide range of benchmarks.
Language Models are Few-Shot Learners
Introduced GPT-3, a 175-billion-parameter model that demonstrated "few-shot" learning: given only a handful of examples in its prompt, it could perform new tasks. It showed that massively scaling up Transformers yields new capabilities without task-specific training.
Highly accurate protein structure prediction with AlphaFold
A monumental scientific achievement. The AlphaFold 2 system used deep learning to predict the 3D structure of proteins, solving a 50-year-old grand challenge in biology.
High-Resolution Image Synthesis with Latent Diffusion Models
Introduced Latent Diffusion, the core technology behind Stable Diffusion. It made high-resolution text-to-image generation efficient and accessible, sparking a creative explosion.
Chain-of-Thought Prompting Elicits Reasoning in LLMs
A pivotal discovery showing that LLMs reason more reliably when prompted to "think step by step", working through intermediate steps before answering. This chain-of-thought (CoT) technique helped launch a new subfield of prompt engineering.
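The technique itself is just a prompting pattern; a hypothetical example in Python (the model call is omitted, only the prompt structure matters).

# One worked example that "reasons aloud", followed by the question we actually want answered.
prompt = (
    "Q: A cafeteria had 23 apples. It used 20 for lunch and bought 6 more. "
    "How many apples does it have now?\n"
    "A: Let's think step by step. It started with 23 apples, used 20, leaving 3. "
    "Buying 6 more gives 3 + 6 = 9. The answer is 9.\n\n"
    "Q: Roger has 5 tennis balls and buys 2 cans with 3 balls each. "
    "How many tennis balls does he have?\n"
    "A: Let's think step by step."
)
# The model is expected to continue the pattern: reason through the intermediate steps,
# then state the final answer (here, 5 + 2 * 3 = 11).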
LLaMA: Open and Efficient Foundation Language Models
Meta released LLaMA, a family of efficient, high-performing models with openly available weights. This move democratized access to powerful LLMs, sparking a massive wave of open-source innovation.
Sparks of Artificial General Intelligence: Early experiments with GPT-4
This paper argued that GPT-4 showed "sparks" of artificial general intelligence, demonstrating surprising capabilities in reasoning, planning, and creativity that it was not explicitly trained for.
Gemini: A Family of Highly Capable Multimodal Models
Introduced Google's Gemini, a family of models built from the ground up to be natively multimodal, seamlessly understanding and reasoning across text, images, audio, and video.
Mixture-of-Experts Meets Instruction Tuning
This work (and models like Mixtral 8x7B) popularized the Mixture-of-Experts (MoE) architecture, in which each input is routed to only a small subset of expert sub-networks, matching the performance of much larger dense models at a fraction of the computational cost.
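A sketch of the routing idea in NumPy: each token is sent to only a few experts (top-2 here, echoing the Mixtral-style setup), so most parameters stay idle on any given step. The expert count and sizes are illustrative, and this is not any particular library's implementation.

import numpy as np

def moe_layer(x, experts, W_router, top_k=2):
    logits = x @ W_router                                  # one routing score per expert
    top = np.argsort(logits)[-top_k:]                      # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                                   # softmax over the chosen experts only
    # only the selected experts are evaluated, keeping compute far below a dense model
    return sum(g * experts[i](x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
experts = [lambda v, W=rng.normal(size=(16, 16)): v @ W for _ in range(8)]   # 8 toy "experts"
x = rng.normal(size=16)
W_router = rng.normal(size=(16, 8))
y = moe_layer(x, experts, W_router)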
Trend: Rise of Autonomous Agentic Architectures
Represents a cluster of research focused on agentic frameworks that give LLMs capabilities like long-term planning, memory, and tool use to accomplish complex goals autonomously.