AI & Security · Web Tool

Judol Scanner — AI Gambling Injection Detector

An AI-powered web scanner that detects online gambling content injections on compromised websites — supporting both general websites via HTML UI extraction and WordPress sites via the WP-JSON REST API.


Python · Flask · NLP / AI Model · BeautifulSoup · WP-JSON API · HTML Parsing

Role: Full-stack + AI Dev
Period: 2025 – Present
Target: WordPress & General
Status: In Progress

Background

Online gambling content injection, known locally as judol (short for judi online), is one of the most widespread forms of website compromise in Indonesia. Attackers silently inject gambling keywords, links, and UI elements into legitimate websites, often without the site owner noticing. Government sites, university portals, and small business pages are frequent targets.

Existing tools typically rely on simple keyword blacklists that are trivially bypassed by obfuscated text or dynamically injected content. This project builds an AI-powered scanner that understands context rather than just matching strings, and covers both WordPress and general websites through two dedicated detection strategies.

Pain Points

Injections often go undetected for months
Keyword blacklists are easily bypassed by obfuscation
No unified tool for both WP and non-WP sites

Goals

Context-aware AI detection, not just keywords
Two scan modes: General + WordPress-specific
Simple URL-input interface, no tech expertise needed

Scan Modes

The scanner operates in two distinct modes depending on the target website's platform. Each mode uses a different data extraction strategy to maximize detection coverage.

General Mode

Any website

Works universally on any website regardless of platform. The scanner fetches the page HTML, then parses and extracts the rendered UI text — headings, paragraphs, link anchors, button labels, and visible content blocks — stripping away structural tags to isolate the actual human-readable content that would appear to visitors.

Works on any CMS or static site
Detects visible injected UI elements
Extracts rendered text via HTML parsing
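
A minimal sketch of the General Mode extraction step. The project itself uses BeautifulSoup; this version swaps in Python's built-in html.parser so it runs without dependencies, and the names VisibleTextExtractor and extract_visible_text are hypothetical. It only sees static HTML: text added by JavaScript at runtime is not captured, and elements hidden with CSS are still returned.

```python
from html.parser import HTMLParser

# Tags whose contents are never shown to visitors as page text.
SKIP_TAGS = {"script", "style", "noscript", "template"}


class VisibleTextExtractor(HTMLParser):
    """Collects human-readable text chunks, ignoring script/style bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse runs of whitespace and drop empty fragments.
            text = " ".join(data.split())
            if text:
                self.chunks.append(text)


def extract_visible_text(html):
    """Return the page's text content as a list of cleaned chunks."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Each chunk (a heading, a paragraph fragment, a link label) can then be passed to the classifier on its own or joined into one string.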

WordPress Mode

WP-JSON REST API

Tailored specifically for WordPress sites. Instead of parsing rendered HTML, this mode queries the native WP-JSON REST API to pull post and page content as WordPress stores it, catching injections that may be hidden from the rendered view but still present in the stored content.

Scans posts, pages, and custom types
Catches hidden/non-rendered injections
Deeper coverage via database-level content
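
A rough sketch of how the WordPress Mode requests could be built and parsed. The /wp-json/wp/v2/posts and /wp-json/wp/v2/pages routes and the page / per_page parameters are part of the standard WordPress REST API; the function names are hypothetical and the network call itself is left out. Without authentication the API typically returns the rendered form of each field, which is what this sketch reads.

```python
import json
from urllib.parse import urlencode, urljoin


def wp_collection_url(site_url, collection="posts", page=1, per_page=100):
    """Build a listing URL such as .../wp-json/wp/v2/posts?page=1&per_page=100."""
    base = site_url.rstrip("/") + "/"
    query = urlencode({"page": page, "per_page": per_page})
    return urljoin(base, f"wp-json/wp/v2/{collection}") + "?" + query


def parse_wp_items(payload):
    """Flatten a JSON listing response into records ready for classification."""
    records = []
    for item in json.loads(payload):
        records.append({
            "id": item.get("id"),
            "link": item.get("link"),
            "title": item.get("title", {}).get("rendered", ""),
            "content": item.get("content", {}).get("rendered", ""),
        })
    return records
```

Keeping id and link on every record is what makes per-post reporting possible later. Custom post types use their own route names, so they would need to be discovered separately.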

How It Works

Input: Target URL
Detect: Platform Check
Extract: HTML / WP-JSON
Analyse: AI Model
Output: Scan Report

URL Input & Platform Detection

The user submits a target URL. The system detects whether the site runs on WordPress by probing the /wp-json/ endpoint, then routes the scan to the appropriate mode automatically, with no manual selection required.
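
The probe-and-route step might look like the following sketch. It assumes the submitted URL is the site root and treats a JSON object with a namespaces list at /wp-json/ as the WordPress signal; fetch_text, looks_like_wp_index, and detect_scan_mode are hypothetical names, and the fetcher is injectable so the logic can be exercised without network access.

```python
import json
from urllib.request import Request, urlopen


def fetch_text(url, timeout=10):
    """Default fetcher: return (status, body), or (None, '') on any failure."""
    try:
        req = Request(url, headers={"User-Agent": "judol-scanner-sketch"})
        with urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # Covers HTTP errors, connection errors, timeouts, and malformed URLs.
        return None, ""


def looks_like_wp_index(status, body):
    """True if the response resembles the WordPress REST API index."""
    if status != 200:
        return False
    try:
        data = json.loads(body)
    except ValueError:
        return False
    return isinstance(data, dict) and isinstance(data.get("namespaces"), list)


def detect_scan_mode(url, fetch=fetch_text):
    """Probe <url>/wp-json/ and return 'wordpress' or 'general'."""
    probe = url.rstrip("/") + "/wp-json/"
    status, body = fetch(probe)
    return "wordpress" if looks_like_wp_index(status, body) else "general"
```

Falling back to General Mode on any failure keeps the scan usable when the REST API is disabled or blocked. Sites without pretty permalinks expose the API under ?rest_route=/ instead, which this sketch does not try.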

Content Extraction

General: Fetches raw HTML and uses BeautifulSoup to extract visible UI text — stripping scripts, styles, and structural tags to isolate what users actually see on the page.

WordPress: Calls the WP-JSON REST API to retrieve post and page content objects directly, providing access to raw stored content that may differ from the rendered view.

AI-Powered Classification

Extracted content is passed through the AI model, which classifies whether it contains a gambling injection. Unlike keyword blacklists, the model takes context into account: it recognises obfuscated terms, gambling-adjacent language patterns, and injection signatures that simple string matching would miss.

Scan Report

The scanner returns a structured result — detection verdict (clean / infected), confidence score, flagged content excerpts, and the specific pages or posts where injections were found. WordPress scans include per-post granularity so site owners know exactly which content was compromised.
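
One possible shape for the structured result, sketched with dataclasses. The field and class names are hypothetical, and deriving the overall confidence as the highest per-finding score is an assumption rather than the project's documented behaviour.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Finding:
    location: str      # page URL or post link where the content was found
    excerpt: str       # flagged text snippet
    confidence: float  # classifier score for this snippet, 0.0 to 1.0


@dataclass
class ScanReport:
    target_url: str
    mode: str  # "general" or "wordpress"
    findings: list = field(default_factory=list)

    @property
    def verdict(self):
        return "infected" if self.findings else "clean"

    @property
    def confidence(self):
        # Assumption: report the strongest finding; 0.0 when nothing is flagged.
        return max((f.confidence for f in self.findings), default=0.0)

    def to_json(self):
        data = asdict(self)
        data["verdict"] = self.verdict
        data["confidence"] = self.confidence
        return json.dumps(data, ensure_ascii=False)
```

In WordPress Mode each Finding.location would be the post or page link, which gives the per-post granularity described above.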

What Gets Detected

The model is trained to recognise a range of gambling injection signatures commonly used in Indonesian online gambling spam attacks.

Gambling Keywords

Slot, togel, casino, jackpot, situs judi, and related terms — including obfuscated and leet-speak variants.
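
To illustrate why obfuscated variants defeat a plain blacklist, the sketch below compares a naive substring check with one that first normalises the text. This is not the project's classifier: the character map and helper names are hypothetical, and normalisation alone is still keyword matching, which is the limitation a context-aware model is meant to go beyond.

```python
import re
import unicodedata

# Terms taken from the list above.
BLACKLIST = ("slot", "togel", "casino", "jackpot", "situs judi")

# A small, illustrative leet-speak map; real attacks use many more variants.
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def naive_match(text):
    """Plain substring blacklist, as used by simple scanners."""
    lowered = text.lower()
    return [term for term in BLACKLIST if term in lowered]


def normalise(text):
    """Fold width variants, case, leet digits, and in-word separators."""
    text = unicodedata.normalize("NFKC", text).lower().translate(LEET_MAP)
    # Remove separators placed between letters, e.g. "s.l.o.t" or "t-o-g-e-l".
    return re.sub(r"(?<=[a-z])[.\-_*](?=[a-z])", "", text)


def normalised_match(text):
    """Blacklist check applied after normalisation."""
    cleaned = normalise(text)
    return [term for term in BLACKLIST if term in cleaned]
```

For the input "Main 5L0T dan t0g3l di sini", naive_match finds nothing while normalised_match returns both slot and togel. The digit mapping also rewrites legitimate numbers, so a scheme like this trades missed detections for false positives.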

Injected Links

Hidden anchor tags pointing to gambling domains, often buried in page footers or injected into existing content blocks.
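
A simple heuristic for this signature could be sketched as follows, again using the standard library rather than BeautifulSoup. It flags anchors that point to a different host and are hidden by their own inline style; hiding via external stylesheets, parent elements, or off-screen positioning is not covered, and all names here are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Inline-style fragments (spaces removed) that hide an element from visitors.
HIDING_HINTS = ("display:none", "visibility:hidden")


class AnchorCollector(HTMLParser):
    """Collects hrefs of anchors that are both external and inline-hidden."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        style = (attr.get("style") or "").replace(" ", "").lower()
        host = urlparse(href).netloc.lower()
        external = bool(host) and host != self.site_host
        hidden = any(hint in style for hint in HIDING_HINTS)
        if external and hidden:
            self.suspects.append(href)


def find_hidden_external_links(html, site_host):
    """Return hrefs of hidden anchors that leave the scanned site."""
    collector = AnchorCollector(site_host.lower())
    collector.feed(html)
    collector.close()
    return collector.suspects
```

The returned links could be added to the flagged excerpts or have their anchor text passed to the classifier.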

UI Injections

Injected banners, buttons, and promotional text blocks that render visually on the page but are out of place with the site's original content.

Content Pollution

Gambling-related articles or posts injected into the WordPress database — designed to boost SEO for gambling sites while damaging the host site's reputation.

Results

Scan Modes: General + WordPress
Detection Method: Context-aware, not keyword-only
Input Required: URL (no technical setup needed)

Tech Stack

Backend: Flask (Python)
HTML Parsing: BeautifulSoup
WP Integration: WP-JSON REST API
AI Model: NLP Classifier
Interface: Web App

Deliverables

AI classification model for gambling injection
General mode — HTML UI text extractor
WordPress mode — WP-JSON content scanner
Auto platform detection on URL input
Scan report with flagged content excerpts
Flask web interface for non-technical users
