Forbidden word detection at scale using Aho–Corasick in Go

Introduction

In digital environments where users constantly generate content (names, comments, messages, or form entries), detecting forbidden words (NG words) is essential to ensure compliance, security, and a consistent experience. From a product and platform perspective, this is not only a functional problem but an operational one: it must be solved in real time without degrading system performance. This post presents a solution based on the Aho–Corasick algorithm to optimize forbidden-word detection in a real system, showing how to move from a simple but costly implementation to an efficient, scalable architecture.

The problem: a simple but costly detection approach

A basic prohibited-word detection implementation typically iterates the entire list of blocked terms and checks whether any appear in the input text.

Although easy to implement, the worst-case complexity of this approach is:

O(n × m × k)
n: length of the text
m: average length of the forbidden words
k: number of words in the list

With high-traffic systems or long lists, this approach becomes infeasible from a performance standpoint.
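
For reference, the baseline described above can be sketched in a few lines. The function name and the sample words are illustrative, not taken from the production system. Go's standard substring search is optimized, so each individual check is usually cheaper than n × m, but the loop still makes one pass over the text per term, so the total cost grows with k.

```go
package main

import (
	"fmt"
	"strings"
)

// containsForbidden is the naive baseline: it checks every deny-list
// term against the text, one term at a time.
func containsForbidden(text string, denyList []string) bool {
	for _, word := range denyList {
		if strings.Contains(text, word) {
			return true
		}
	}
	return false
}

func main() {
	deny := []string{"spam", "scam"} // hypothetical list
	fmt.Println(containsForbidden("this is a scam offer", deny))
	fmt.Println(containsForbidden("hello world", deny))
}
```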

Functional requirements

From a product point of view, the system must support:

Deny List: words that must be blocked if they appear in the text
Allow List: explicit exceptions that override blocks
Real-time evaluation without impacting latency
Deterministic behavior and ease of debugging from the admin panel

Aho–Corasick: efficient multi-pattern detection

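The Aho–Corasick algorithm searches for multiple patterns simultaneously in a single pass over the text. Once the automaton is built, search time is O(n + z), where n is the length of the text and z is the number of matches reported, independent of the number of forbidden words.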

The process is divided into three stages:

Construction of the Trie

All deny and allow words are stored in a Trie structure, sharing common prefixes to reduce redundancy.
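
A minimal sketch of this stage, assuming a node keyed by rune so that multi-byte characters count as a single step. The type and function names are illustrative. The insert function returns the number of nodes it had to create, which makes the prefix sharing visible.

```go
package main

import "fmt"

// node is one Trie state; children are keyed by rune.
type node struct {
	children map[rune]*node
	end      bool // true if a configured word ends at this node
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// insert adds word to the Trie rooted at root and returns how many
// new nodes were created. Existing prefixes are reused.
func insert(root *node, word string) int {
	created := 0
	cur := root
	for _, r := range word {
		next, ok := cur.children[r]
		if !ok {
			next = newNode()
			cur.children[r] = next
			created++
		}
		cur = next
	}
	cur.end = true
	return created
}

func main() {
	root := newNode()
	total := 0
	for _, w := range []string{"bad", "badge", "ban"} {
		total += insert(root, w)
	}
	fmt.Println("nodes created:", total)
}
```

The three sample words contain 11 characters in total but need only 6 nodes, because "badge" reuses the path of "bad" and "ban" reuses "ba".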

Construction of failure transitions

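Each Trie node gets a failure transition: a link to the node that represents the longest proper suffix of the current path that is also a path in the Trie. When the next character has no direct match, the algorithm follows this link instead of restarting from the beginning, avoiding costly backtracking and preserving linear flow.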
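
One common way to compute these links is a breadth-first traversal, so that every node's link is derived from links of shallower nodes that are already known. The sketch below assumes that approach; the names (build, walk, fail) are illustrative and the pattern set is the classic textbook example.

```go
package main

import "fmt"

type node struct {
	children map[rune]*node
	fail     *node  // failure transition
	word     string // pattern ending here, "" if none
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// build inserts all patterns, then computes failure links level by level.
func build(patterns []string) *node {
	root := newNode()
	for _, p := range patterns {
		cur := root
		for _, r := range p {
			if cur.children[r] == nil {
				cur.children[r] = newNode()
			}
			cur = cur.children[r]
		}
		cur.word = p
	}
	var queue []*node
	for _, child := range root.children {
		child.fail = root // depth-1 nodes always fall back to the root
		queue = append(queue, child)
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for r, child := range cur.children {
			// Follow the parent's failure chain until a node has a child for r.
			f := cur.fail
			for f != root && f.children[r] == nil {
				f = f.fail
			}
			child.fail = root
			if next := f.children[r]; next != nil {
				child.fail = next
			}
			queue = append(queue, child)
		}
	}
	return root
}

// walk returns the node reached by following s from root, or nil.
func walk(root *node, s string) *node {
	cur := root
	for _, r := range s {
		if cur = cur.children[r]; cur == nil {
			return nil
		}
	}
	return cur
}

func main() {
	root := build([]string{"he", "she", "hers"})
	// "he" is the longest suffix of "she" that is also a path in the Trie.
	fmt.Println(walk(root, "she").fail == walk(root, "he"))
}
```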

Search

The text is scanned character by character. The Trie and failure transitions allow full matches to be detected without restarting the analysis.
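
The scan can be sketched as follows. In addition to the failure link, this version assumes an extra out link per node (often called an output or dictionary link) that points to the nearest state on the failure chain where a pattern ends, so that shorter patterns ending at the same position are reported without walking the whole chain. All names are illustrative.

```go
package main

import "fmt"

type node struct {
	children map[rune]*node
	fail     *node  // failure transition
	out      *node  // nearest state on the failure chain where a pattern ends
	word     string // pattern ending here, "" if none
}

// match records a pattern and the rune index where it starts in the text.
type match struct {
	word  string
	start int
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// build creates the Trie, then fills in failure and output links breadth-first.
func build(patterns []string) *node {
	root := newNode()
	for _, p := range patterns {
		cur := root
		for _, r := range p {
			if cur.children[r] == nil {
				cur.children[r] = newNode()
			}
			cur = cur.children[r]
		}
		cur.word = p
	}
	var queue []*node
	for _, child := range root.children {
		child.fail = root
		queue = append(queue, child)
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for r, child := range cur.children {
			f := cur.fail
			for f != root && f.children[r] == nil {
				f = f.fail
			}
			child.fail = root
			if next := f.children[r]; next != nil {
				child.fail = next
			}
			child.out = child.fail.out
			if child.fail.word != "" {
				child.out = child.fail
			}
			queue = append(queue, child)
		}
	}
	return root
}

// find scans text once and returns every pattern occurrence.
func find(root *node, text string) []match {
	var found []match
	cur := root
	for i, r := range []rune(text) {
		// No direct transition: fall back along failure links.
		for cur != root && cur.children[r] == nil {
			cur = cur.fail
		}
		if next := cur.children[r]; next != nil {
			cur = next
		}
		// Report the pattern ending here, then any shorter ones via out links.
		for m := cur; m != nil; m = m.out {
			if m.word != "" {
				found = append(found, match{m.word, i - len([]rune(m.word)) + 1})
			}
		}
	}
	return found
}

func main() {
	root := build([]string{"he", "she", "hers"})
	for _, m := range find(root, "ushers") {
		fmt.Printf("%q starts at rune %d\n", m.word, m.start)
	}
}
```

For the text "ushers" this reports "she" at index 1, then "he" and "hers" at index 2, in a single left-to-right pass.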

Implementation in Go

The implementation uses a node structure that maintains:

Parent–child relationships
End-of-word indicators
Flags for deny and allow
References to failure nodes
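
A node carrying these four pieces of information might look like the sketch below. The field names and the mark helper are assumptions made for illustration; the post does not show the actual type.

```go
package main

import "fmt"

// node mirrors the four items listed above.
type node struct {
	children map[rune]*node // parent-child relationships
	end      bool           // end-of-word indicator
	deny     bool           // a deny-list word ends here
	allow    bool           // an allow-list word ends here
	fail     *node          // failure node, filled in after the Trie is built
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// mark inserts word and flags its last node as deny or allow.
func mark(root *node, word string, allow bool) *node {
	cur := root
	for _, r := range word {
		if cur.children[r] == nil {
			cur.children[r] = newNode()
		}
		cur = cur.children[r]
	}
	cur.end = true
	if allow {
		cur.allow = true
	} else {
		cur.deny = true
	}
	return cur
}

func main() {
	root := newNode()
	d := mark(root, "hell", false) // hypothetical deny entry
	a := mark(root, "hello", true) // hypothetical allow entry
	fmt.Println(d.deny, d.allow, a.deny, a.allow)
}
```

Because both lists live in the same Trie, the allow entry "hello" simply extends the path of the deny entry "hell" by one node, and a single scan can report matches from either list.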

During service initialization:

Build the Trie from the configured lists
Compute failure transitions
The system becomes ready to evaluate inputs with linear complexity

During execution:

Each text is analyzed character by character
Allow-list rules have priority over deny-list rules
The result indicates whether the text should be blocked or permitted
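
The post does not spell out how allow-list priority is resolved, so the sketch below assumes one plausible rule: a deny match is ignored when it lies entirely inside an allow match. The hit type and the blocked function are hypothetical, and the hits are written by hand here; in a real service they would come from the automaton scan.

```go
package main

import "fmt"

// hit is one match reported by the search; start and end are inclusive rune indexes.
type hit struct {
	start, end int
	allow      bool // true for an allow-list word, false for a deny-list word
}

// blocked reports whether any deny hit is not fully covered by an allow hit.
func blocked(hits []hit) bool {
	for _, d := range hits {
		if d.allow {
			continue
		}
		covered := false
		for _, a := range hits {
			if a.allow && a.start <= d.start && d.end <= a.end {
				covered = true
				break
			}
		}
		if !covered {
			return true
		}
	}
	return false
}

func main() {
	// "hello world" with deny "hell" and allow "hello": both start at rune 0.
	fmt.Println(blocked([]hit{{0, 3, false}, {0, 4, true}}))
	// "what the hell": the deny hit stands alone.
	fmt.Println(blocked([]hit{{9, 12, false}}))
}
```

The decision is made after the scan rather than during it, because a deny match can be reported before the allow match that covers it has ended ("hell" ends one character before "hello"). The nested loop is quadratic in the number of hits, which is acceptable for a sketch but worth replacing if texts can produce many matches.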

Impact on real systems

From a Meetlabs perspective, this approach brings:

Scalability: per-request evaluation cost does not grow with the number of words; only startup time and memory do
Low latency: ideal for synchronous validations
Consistency: clear rules even with exceptions
Stable user experience under high load

This optimization is critical for products where moderation happens on every user interaction.

Recommendations

Design clear allow/deny rules
Preprocess patterns at service startup
Prioritize O(n) search strategies
Separate configuration from logic
Treat moderation as part of the product, not just validation

Conclusion

Using the Aho–Corasick algorithm for forbidden-word detection turns a seemingly simple problem into a robust, efficient, and scalable solution. By shifting complexity to initialization and keeping searches linear in time, it is possible to provide real-time moderation without compromising system performance or user experience.

Glossary

Trie: a tree-like structure to store and search strings efficiently
Aho–Corasick: an algorithm for simultaneous multi-pattern search
Failure transition: an alternative jump that avoids restarting the search
Deny List: list of explicitly forbidden terms
Allow List: exceptions that override block rules
