02/03/2026

Beyond CTR: The Complete Map of Data Science in Advertising

Data science in advertising goes far beyond predicting clicks. Over the last 15 years, the field has evolved from simple click-through rate (CTR) models into complex systems that integrate pricing, optimization under constraints, reinforcement learning, experimentation, and, more recently, large language models (LLMs). This article organizes that technical journey into a clear map: what problems were solved first, how the different components connect, and which challenges remain. The goal is not to list papers but to provide a structured view that helps distinguish trend from foundation and shows where AdTech is heading.

Introduction

When people hear "data science in advertising," they often think of models that predict CTR or CVR. And yes, that was the starting point. But the reality is much broader. Digital advertising combines prediction, real-time auctions, budget control, creative optimization, experimentation, and now even multimodal models. Understanding this entire ecosystem is difficult because knowledge is fragmented across academic papers, practical implementations, and internal systems that evolve over time. This article aims to organize that map.

Background: the problem of fragmentation

Research in digital advertising was particularly active between 2010 and 2015; many foundational papers were born during that period.

The problem today is not a lack of information, but the opposite:

There are foundational papers that remain relevant.
There are modern techniques that extend those approaches.
There are recent studies that are not always production-ready.
And now LLMs have entered the picture.

For newcomers, distinguishing between mature technology, current trends, and hype is a real challenge.

Core idea

Data science in advertising can be understood as an evolution in layers:

Prediction (CTR, CVR)
Using predictions in auctions (pricing and bidding)
Dynamic optimization under constraints
Reinforcement learning and multi-agent control
Creativity, experimentation, and privacy

These are not isolated topics; they are parts of the same system.

The starting point: predictive models

The first major wave focused on prediction:

CTR models based on logistic regression.
Evolution toward field-aware factorization machines (FFM), DeepFM, and DeepFFM.
CVR modeling and handling delayed conversions.
Selection bias correction.

Although they may seem old, many of these models remain competitive because of:

Latency constraints in demand-side platforms (DSPs).
Inference costs.
Need for stability and robustness.

This is where everything begins, but it is not where everything ends.
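
To make the baseline concrete, here is a minimal sketch of a logistic-regression CTR model trained with stochastic gradient descent on one-hot categorical features. It is an illustration under stated assumptions, not a production recipe: the class name LogisticCTR, the "slot" feature, and the synthetic click rates are all hypothetical and chosen only for this example.

```python
import math
import random
from collections import defaultdict


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class LogisticCTR:
    """Logistic regression over one-hot categorical features, trained by SGD."""

    def __init__(self, lr=0.05):
        self.lr = lr
        self.bias = 0.0
        self.w = defaultdict(float)  # one weight per (field, value) pair

    def predict(self, features):
        # features is a dict such as {"slot": "top"}; each pair is one active feature.
        z = self.bias + sum(self.w[kv] for kv in features.items())
        return sigmoid(z)

    def update(self, features, clicked):
        # For log loss, the gradient with respect to the logit is (p - y).
        grad = self.predict(features) - (1.0 if clicked else 0.0)
        self.bias -= self.lr * grad
        for kv in features.items():
            self.w[kv] -= self.lr * grad


# Synthetic impressions: the per-slot click rates are made up for illustration.
TRUE_CTR = {"top": 0.20, "bottom": 0.02}
rng = random.Random(0)
model = LogisticCTR()
for _ in range(20000):
    slot = rng.choice(["top", "bottom"])
    model.update({"slot": slot}, rng.random() < TRUE_CTR[slot])

p_top = model.predict({"slot": "top"})
p_bottom = model.predict({"slot": "bottom"})
print(f"predicted CTR top={p_top:.3f} bottom={p_bottom:.3f}")
```

Each prediction is a sum over the active features followed by a sigmoid, which is part of why models of this family fit comfortably inside tight latency and cost budgets.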

From predicting to deciding: pricing and auctions

Once we can estimate probabilities, the key question arises: how do we use those predictions to bid better?

This brings in topics such as:

First-price auctions and bid shading.
Win-rate prediction and bid landscape modeling.
Relationship between expected value and bidding strategy.

Digital advertising is not only prediction; it’s real-time strategic competition.
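
The link between expected value and bidding strategy can be sketched in a few lines. In a second-price auction, bidding the true value is optimal; in a first-price auction the winner pays its own bid, so the bidder shades the bid to maximize expected surplus, (value - bid) * P(win | bid). The example below assumes a logistic win-rate curve as a stand-in for a learned bid landscape; the function names, the curve parameters, and the numbers are hypothetical.

```python
import math


def win_probability(bid, midpoint=2.0, scale=0.5):
    """Hypothetical bid landscape: logistic win-rate curve over the bid (CPM units)."""
    return 1.0 / (1.0 + math.exp(-(bid - midpoint) / scale))


def shaded_bid(value, win_prob=win_probability, steps=1000):
    """Grid-search the bid in [0, value] that maximizes expected surplus."""
    best_bid, best_surplus = 0.0, 0.0
    for i in range(steps + 1):
        bid = value * i / steps
        surplus = (value - bid) * win_prob(bid)
        if surplus > best_surplus:
            best_bid, best_surplus = bid, surplus
    return best_bid, best_surplus


# Expected value of one thousand impressions from predicted CTR and value per click.
p_ctr = 0.02            # assumed model output
value_per_click = 0.20  # assumed advertiser value, in currency units
value_cpm = p_ctr * value_per_click * 1000

bid, surplus = shaded_bid(value_cpm)
print(f"value={value_cpm:.2f} shaded bid={bid:.2f} expected surplus={surplus:.3f}")
```

Bidding the full value would win more often but leave zero surplus on every win, which is why the win-rate model matters as much as the CTR model once the auction is first-price.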

Optimization under constraints: the budget problem

Advertising systems operate under clear limits:

Daily budget.
Temporal distribution.
Performance objectives.

This led to:

Pacing models.
Feedback control approaches.
Dynamic optimization.
Reinforcement learning applied to bidding.

Here data science stops being purely static and becomes a sequential decision problem.
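
As an illustration of the feedback control idea, the sketch below paces a daily budget with a simple proportional controller: at each interval it compares actual spend with the spend needed to use the remaining budget evenly, and nudges a bid multiplier up or down. The market response (spend proportional to the multiplier, with random noise), the gain, and the clipping bounds are assumptions made for this example, not values from any real system.

```python
import random


def simulate_pacing(daily_budget=100.0, intervals=96, gain=0.5, seed=0):
    """Proportional pacing controller over one day split into equal intervals."""
    rng = random.Random(seed)
    multiplier = 1.0
    spent = 0.0
    history = []
    for t in range(intervals):
        remaining = daily_budget - spent
        # Target: spread whatever budget is left evenly over the remaining intervals.
        target = remaining / (intervals - t)
        # Hypothetical market: unthrottled spend is noisy and scales with the multiplier.
        base_spend = rng.uniform(1.0, 3.0)
        spend = min(base_spend * multiplier, remaining)  # never exceed the budget
        spent += spend
        # Proportional feedback on the relative error between target and actual spend.
        error = (target - spend) / target if target > 0 else 0.0
        multiplier = min(max(multiplier * (1.0 + gain * error), 0.05), 5.0)
        history.append((t, spend, spent, multiplier))
    return spent, history


spent, history = simulate_pacing()
print(f"spent {spent:.2f} of 100.00; final multiplier {history[-1][3]:.2f}")
```

The controller only reacts to the last interval. Treating the multiplier as an action chosen to maximize value over the whole day is what turns pacing into the sequential decision problem described above.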

RL in multi-agent environments and strategic competition

When multiple advertisers compete simultaneously, the problem becomes strategic rather than only predictive. The environment turns dynamic: each campaign learns, adjusts, and responds to others’ decisions. This is where multi-agent reinforcement learning (MARL) becomes relevant: you can no longer just estimate probabilities; you must act while accounting for the fact that the system is constantly changing.

Key aspects:

Multi-agent RL applied to auctions.
Strategic interactions between campaigns that are also optimizing.
Unification of prediction and control.
Decision-making in dynamic systems rather than static scenarios.

Creativity, evaluation, and the new technological front

Beyond bidding, performance depends on creativity, rigorous experimentation, and adaptation to a new privacy ecosystem. Optimization is not only technical but also methodological. Without solid evaluation, no model matters.

Important threads:

Creative optimization using methods such as Thompson Sampling.
Handling non-stationarity and continuous testing.
A/B testing with bias correction and budget constraints.
Cookie deprecation and regulatory changes.
Use of LLMs in content moderation, multimodal co-embeddings, and automation of creative review.
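
Creative optimization with Thompson Sampling can be sketched with a Beta-Bernoulli bandit: each creative keeps a Beta posterior over its click rate, one value is sampled from every posterior per impression, and the creative with the highest sample is shown. The true click rates below are synthetic, and the function name and parameters are hypothetical. The sketch also assumes stationary click rates, which real campaigns often violate.

```python
import random


def thompson_sampling(true_ctrs, rounds=5000, seed=0):
    """Beta-Bernoulli Thompson Sampling over a fixed set of creatives."""
    rng = random.Random(seed)
    n = len(true_ctrs)
    clicks = [0] * n
    impressions = [0] * n
    for _ in range(rounds):
        # Sample a plausible CTR per creative from its Beta(1 + clicks, 1 + misses) posterior.
        samples = [
            rng.betavariate(1 + clicks[i], 1 + impressions[i] - clicks[i])
            for i in range(n)
        ]
        chosen = max(range(n), key=lambda i: samples[i])
        impressions[chosen] += 1
        # Simulated user feedback; in production this would be the observed click.
        if rng.random() < true_ctrs[chosen]:
            clicks[chosen] += 1
    return impressions, clicks


impressions, clicks = thompson_sampling([0.02, 0.05, 0.15])
print("impressions per creative:", impressions)
```

Traffic shifts toward the better creative as evidence accumulates, while weaker creatives still receive occasional impressions. Handling non-stationarity usually requires discounting or windowing old observations, which this sketch omits.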

Recommendations

Don’t underestimate foundational papers: many remain relevant in production.
Understand how prediction and control connect before diving into advanced RL.
Always evaluate practical applicability before adopting new techniques.
Prioritize systems that are robust under real latency and cost constraints.
Watch the progressive integration of LLMs, but don’t lose sight of fundamentals.

Conclusions

Data science in advertising is not a collection of isolated techniques but an interconnected system that has evolved from simple predictive models to complex architectures of decision-making, control, and experimentation. Understanding this full map makes it possible to distinguish between fashion and foundation, between real innovation and unnecessary complexity. Beyond CTR, the real challenge lies in integrating prediction, strategy, and evaluation in a dynamic, competitive, and increasingly regulated environment.

Glossary

CTR (Click-Through Rate): clicks divided by impressions.
CVR (Conversion Rate): conversions divided by clicks.
Bid shading: strategy to adjust bids in first-price auctions.
Pacing: controlling budget spend over time.
Reinforcement Learning (RL): learning technique based on sequential decisions and rewards.