AI & Security · Web Tool

Judol Scanner — AI Gambling Injection Detector

An AI-powered web scanner that detects online gambling content injections on compromised websites — supporting both general websites via HTML UI extraction and WordPress sites via the WP-JSON REST API.

Python · Flask · NLP / AI Model · BeautifulSoup · WP-JSON API · HTML Parsing
Role: Full-stack + AI Dev
Period: 2025 – Present
Target: WordPress & General
Status: In Progress

Background

Online gambling content injection — locally known as judol — is one of the most widespread website compromise attacks in Indonesia. Attackers silently inject gambling keywords, links, and UI elements into legitimate websites, often without the site owner ever noticing. Government sites, university portals, and small business pages are frequent targets.

Existing tools rely on simple keyword blacklists, which are trivially bypassed by obfuscated text or dynamically injected content. This project builds an AI-powered scanner that reads content in context rather than only matching strings, and it covers both WordPress and general websites through two dedicated detection strategies.

Pain Points
  • Injections often go undetected for months
  • Keyword blacklists are easily bypassed by obfuscation
  • No unified tool for both WP and non-WP sites
Goals
  • Context-aware AI detection, not just keywords
  • Two scan modes: General + WordPress-specific
  • Simple URL-input interface, no tech expertise needed

Scan Modes

The scanner operates in two distinct modes depending on the target website's platform. Each mode uses a different data extraction strategy to maximize detection coverage.

General Mode (any website)

Works universally on any website regardless of platform. The scanner fetches the page HTML, then parses and extracts the rendered UI text — headings, paragraphs, link anchors, button labels, and visible content blocks — stripping away structural tags to isolate the actual human-readable content that would appear to visitors.

  • Works on any CMS or static site
  • Detects visible injected UI elements
  • Extracts rendered text via HTML parsing
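
The extraction step described for General Mode can be sketched as follows. This is a minimal, dependency-free illustration that uses Python's standard `html.parser` as a stand-in for the BeautifulSoup step the project names; the class and function names (`VisibleTextExtractor`, `extract_visible_text`) and the set of skipped tags are assumptions, not the project's actual code.

```python
from html.parser import HTMLParser

# Assumption: elements whose text is not shown to visitors as page content.
SKIPPED_TAGS = {"script", "style", "noscript", "template"}


class VisibleTextExtractor(HTMLParser):
    """Collects human-readable text and ignores non-rendered elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []      # one entry per text run, whitespace-normalised

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_visible_text(html: str) -> list[str]:
    """Returns the text runs of headings, paragraphs, links and buttons."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    page = "<h1>Campus News</h1><script>track()</script><p>Daftar <a href='#'>di sini</a></p>"
    print(extract_visible_text(page))
```

Each text run is kept as a separate chunk so that a flagged excerpt can later be reported on its own rather than as one merged blob of page text.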
WordPress Mode (WP-JSON REST API)

Tailored specifically for WordPress sites. Instead of parsing the rendered page, this mode queries the native WP-JSON REST API to pull post and page content as WordPress stores it, catching injections that may be hidden from the rendered view but are still present in the stored content.

  • Scans posts, pages, and custom types
  • Catches hidden/non-rendered injections
  • Deeper coverage via stored content rather than the rendered page
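
A hedged sketch of how WordPress Mode might walk a content collection. The `/wp-json/wp/v2/posts` route, the `page` and `per_page` query parameters (capped at 100 by WordPress), and the `X-WP-TotalPages` response header are part of the standard WordPress REST API; the function names, the `max_pages` cap, and the injected `get_json` callable are assumptions made for illustration. The sketch reads the publicly exposed `rendered` fields, since unauthenticated requests do not receive the raw editor content.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


def collection_url(site: str, kind: str = "posts", page: int = 1, per_page: int = 100) -> str:
    """Builds the REST URL for one page of a collection (posts, pages, ...)."""
    base = site.rstrip("/") + "/"
    query = urlencode({"page": page, "per_page": per_page})
    return urljoin(base, f"wp-json/wp/v2/{kind}") + "?" + query


def http_get_json(url: str, timeout: float = 10.0):
    """Returns (parsed JSON body, total page count from X-WP-TotalPages)."""
    request = Request(url, headers={"User-Agent": "judol-scanner-sketch"})
    with urlopen(request, timeout=timeout) as response:
        total_pages = int(response.headers.get("X-WP-TotalPages", "1"))
        return json.load(response), total_pages


def to_record(item: dict) -> dict:
    """Keeps only the fields the classifier and the report need."""
    return {
        "id": item.get("id"),
        "link": item.get("link"),
        "title": item.get("title", {}).get("rendered", ""),
        "content": item.get("content", {}).get("rendered", ""),
    }


def fetch_collection(site: str, kind: str = "posts", get_json=http_get_json, max_pages: int = 50):
    """Follows pagination until the last page or the max_pages cap."""
    records, page, total_pages = [], 1, 1
    while page <= min(total_pages, max_pages):
        items, total_pages = get_json(collection_url(site, kind, page))
        records.extend(to_record(item) for item in items)
        page += 1
    return records
```

Calling `fetch_collection(site, "pages")` would cover pages in the same way, and a custom post type would be reached by passing its REST base as `kind`. Passing the fetcher in as `get_json` keeps the pagination logic testable without network access.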

How It Works

Input: Target URL → Detect: Platform Check → Extract: HTML / WP-JSON → Analyse: AI Model → Output: Scan Report
1. URL Input & Platform Detection

The user submits a target URL. The system detects whether the site runs on WordPress by probing the /wp-json/ endpoint and routes to the matching scan mode automatically, so no manual selection is required.
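
The probe might look like the sketch below. It assumes a site counts as WordPress when the `/wp-json/` index returns HTTP 200 with a JSON object whose `namespaces` list contains `wp/v2`, which is what a default WordPress install exposes; the function names and the returned mode labels are hypothetical. Sites using plain permalinks serve the API under `?rest_route=/` instead, which this sketch does not cover.

```python
import json
from urllib.request import Request, urlopen


def wp_json_index_url(target: str) -> str:
    """Builds the REST index URL for the submitted site."""
    return target.rstrip("/") + "/wp-json/"


def is_wordpress_index(status: int, body: bytes) -> bool:
    """Decides from a probe response whether the site exposes the WP REST API."""
    if status != 200:
        return False
    try:
        index = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    if not isinstance(index, dict):
        return False
    namespaces = index.get("namespaces")
    return isinstance(namespaces, list) and "wp/v2" in namespaces


def choose_scan_mode(target: str, timeout: float = 10.0) -> str:
    """Returns 'wordpress' or 'general'; any probe failure falls back to general."""
    request = Request(wp_json_index_url(target), headers={"User-Agent": "judol-scanner-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            detected = is_wordpress_index(response.status, response.read())
    except (OSError, ValueError):  # HTTP errors, timeouts, malformed URLs
        detected = False
    return "wordpress" if detected else "general"
```

Falling back to General Mode on any error keeps the scan usable for WordPress sites that have disabled or blocked the REST API.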

2. Content Extraction

General: Fetches raw HTML and uses BeautifulSoup to extract visible UI text — stripping scripts, styles, and structural tags to isolate what users actually see on the page.

WordPress: Calls the WP-JSON REST API to retrieve post and page content objects directly, providing access to raw stored content that may differ from the rendered view.

3. AI-Powered Classification

Extracted content is passed through the AI model, which classifies whether gambling injection is detected. Unlike keyword blacklists, the model understands context — it recognises obfuscated terms, gambling-adjacent language patterns, and injection signatures that simple string matching would miss.

4. Scan Report

The scanner returns a structured result — detection verdict (clean / infected), confidence score, flagged content excerpts, and the specific pages or posts where injections were found. WordPress scans include per-post granularity so site owners know exactly which content was compromised.
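
One possible shape for that result, sketched with dataclasses. The fields mirror the items listed above (verdict, confidence score, flagged excerpts, per-page or per-post locations); the class names and the aggregation rules (any finding means "infected", overall confidence is the highest finding score) are assumptions for illustration only.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Finding:
    location: str      # page URL or WordPress post link
    confidence: float  # model score for this location, assumed to be in [0, 1]
    excerpts: list[str] = field(default_factory=list)  # flagged text snippets


@dataclass
class ScanReport:
    target: str
    mode: str  # "general" or "wordpress"
    findings: list[Finding] = field(default_factory=list)

    @property
    def verdict(self) -> str:
        # Assumption: a single flagged location marks the whole site infected.
        return "infected" if self.findings else "clean"

    @property
    def confidence(self) -> float:
        # Assumption: report the strongest finding; 0.0 when nothing was flagged.
        return max((f.confidence for f in self.findings), default=0.0)

    def to_dict(self) -> dict:
        """Serialisable form, e.g. for a JSON response."""
        data = asdict(self)
        data["verdict"] = self.verdict
        data["confidence"] = self.confidence
        return data


if __name__ == "__main__":
    report = ScanReport("https://example.com", "wordpress")
    report.findings.append(Finding("https://example.com/?p=7", 0.93, ["slot gacor"]))
    print(report.to_dict())
```

Keeping one `Finding` per location is what gives WordPress scans their per-post granularity.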

What Gets Detected

The model is trained to recognise a range of gambling injection signatures commonly used in Indonesian online gambling spam attacks.

Gambling Keywords

Slot, togel, casino, jackpot, situs judi, and related terms — including obfuscated and leet-speak variants.
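
To make the obfuscation problem concrete, the sketch below shows a plain blacklist missing leet-speak spellings, and a simple normalisation pass recovering them. It is an illustration of the problem, not the project's model: the character map, the `normalise` helper, and the sample strings are all assumptions.

```python
import re
import unicodedata

# Terms taken from the list above.
BLACKLIST = {"slot", "togel", "casino", "jackpot", "situs judi"}

# Assumption: a small map of common digit and symbol substitutions.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                      "5": "s", "7": "t", "@": "a", "$": "s"})


def naive_match(text: str) -> set[str]:
    """Plain substring blacklist: the approach the scanner aims to improve on."""
    lowered = text.lower()
    return {term for term in BLACKLIST if term in lowered}


def normalise(text: str) -> str:
    """Undoes a few common obfuscation tricks before matching."""
    # NFKC folds look-alike forms such as full-width letters into ASCII.
    text = unicodedata.normalize("NFKC", text).lower()
    text = text.translate(LEET)
    # Drop dots or asterisks wedged between letters, e.g. "s.l.o.t".
    return re.sub(r"(?<=\w)[.*]+(?=\w)", "", text)


if __name__ == "__main__":
    spam = "Main sl0t g4cor, t0g3l 4D"
    print(naive_match(spam))             # the blacklist alone finds nothing
    print(naive_match(normalise(spam)))  # normalised text matches two terms
```

Normalisation of this kind also rewrites harmless digits and punctuation, so on its own it raises false positives; that trade-off is one reason to prefer a context-aware classifier over ever-longer substitution rules.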

Injected Links

Hidden anchor tags pointing to gambling domains, often buried in page footers or injected into existing content blocks.
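
A heuristic sketch of how such links could be surfaced before classification. It flags anchors that point to another host and sit inside an element hidden by the `hidden` attribute or an inline style; the style patterns, names, and example domains are assumptions, and hiding done through external stylesheets or scripts is outside what this sketch can see.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: inline-style patterns commonly used to hide injected blocks.
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0|left\s*:\s*-\d{3,}px",
    re.IGNORECASE,
)
# Void elements never get a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenLinkFinder(HTMLParser):
    def __init__(self, site_host: str):
        super().__init__()
        self.site_host = site_host.lower()
        self._stack = []      # (tag, is_hidden) for each open element
        self.suspicious = []  # external hrefs found inside hidden elements

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        parent_hidden = bool(self._stack) and self._stack[-1][1]
        hidden = (parent_hidden or "hidden" in attributes
                  or bool(HIDDEN_STYLE.search(attributes.get("style") or "")))
        if tag == "a" and hidden:
            href = attributes.get("href") or ""
            host = (urlparse(href).hostname or "").lower()
            same_site = host == self.site_host or host.endswith("." + self.site_host)
            if host and not same_site:
                self.suspicious.append(href)
        if tag not in VOID_TAGS:
            self._stack.append((tag, hidden))

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break


def find_hidden_external_links(html: str, site_host: str) -> list[str]:
    finder = HiddenLinkFinder(site_host)
    finder.feed(html)
    finder.close()
    return finder.suspicious
```

For example, `find_hidden_external_links(page_html, "example.ac.id")` would return only the off-site links placed inside hidden containers, leaving visible partner links and same-site links alone.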

UI Injections

Injected banners, buttons, and promotional text blocks that render visually on the page but are out of place with the site's original content.

Content Pollution

Gambling-related articles or posts injected into the WordPress database — designed to boost SEO for gambling sites while damaging the host site's reputation.

Results

  • Scan Modes: 2 (General + WordPress)
  • Detection Method: AI (context-aware, not keyword-only)
  • Input Required: URL (no technical setup needed)