AI & Data · Eklipse.gg

AI Labelling Dataset

An end-to-end data pipeline, from gathering raw audio clips and game frames to annotating emotion, loudness, topic, and sentiment, that supports AI model development at Eklipse.gg.

Data Gathering, Audio Labelling, Image Labelling, YOLO, NLP, Label Studio, Python

Role: AI & Data Intern
Period: 2025 – Present
Company: Eklipse.gg
Status: Ongoing

Overview

This work goes beyond labelling: it is an end-to-end pipeline from raw data collection to production-ready datasets. Each cycle starts with data gathering, moves through annotation and validation, and loops back to re-annotation whenever quality falls short of the required standard.

Two tracks run in parallel: audio data for the Just Chatting and Voice Command models, and image/frame data for the YOLO-based visual detection model. Both feed the broader AI ecosystem at Eklipse.gg.

Audio Data — Just Chatting & Voice Command

Audio datasets are collected from multiple streaming and short-form video platforms: Twitch clips, YouTube Shorts, and regular YouTube videos. Each clip ranges from 3 to 20 minutes in length. Collection focuses strictly on English-language content so that the trained models generalize across the accents, speaking styles, and conversational contexts found in English-language streams.

Data Gathering

Clips are collected from Twitch, YouTube Shorts, and YouTube, then selected based on audio quality, language, and relevance to the target model category (Just Chatting or Voice Command).

Annotation

Each audio segment is labelled across multiple dimensions: emotion (shouting, laughing, screaming, neutral, etc.), loudness level, topic (game discussion, reaction, commentary), and sentiment (positive, negative, neutral).

Validation & Re-annotation

Datasets go through inter-annotator validation. Inconsistent or ambiguous entries are flagged and returned for re-annotation with updated guidelines, maintaining high annotation reliability throughout the pipeline.
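One common way to quantify inter-annotator agreement is Cohen's kappa, which corrects raw agreement for chance. The sketch below is illustrative only: the page does not state which agreement metric the team uses, and the function names `cohen_kappa` and `flag_disagreements` are hypothetical. It assumes two annotators each assign exactly one label per segment.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same segments."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of segments")
    n = len(labels_a)
    # Observed agreement: fraction of segments where both labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def flag_disagreements(segment_ids, labels_a, labels_b):
    """Return the ids of segments whose labels differ (candidates for re-annotation)."""
    return [sid for sid, a, b in zip(segment_ids, labels_a, labels_b) if a != b]


annotator_a = ["positive", "negative", "neutral", "positive"]
annotator_b = ["positive", "negative", "positive", "positive"]
kappa = cohen_kappa(annotator_a, annotator_b)
to_review = flag_disagreements(["s1", "s2", "s3", "s4"], annotator_a, annotator_b)
```

A low kappa on a given dimension (for example sentiment) would suggest the guidelines for that dimension need tightening before the flagged segments are re-annotated.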

Label Categories

Emotion

Shouting, Laughing, Screaming, Excited, Neutral, Frustrated

Loudness

Quiet, Normal, Loud, Very Loud

Topic

Just Chatting, Game Commentary, Reaction, Voice Command

Sentiment

Positive, Negative, Neutral
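The label categories above can be encoded as a small schema so that malformed annotations are caught before validation. This is a minimal sketch, not the team's actual tooling: the record layout (`start`, `end`, and one value per dimension) and the function name `validate_segment` are assumptions made for illustration.

```python
# Allowed values per dimension, copied from the label categories on this page.
ALLOWED = {
    "emotion": {"Shouting", "Laughing", "Screaming", "Excited", "Neutral", "Frustrated"},
    "loudness": {"Quiet", "Normal", "Loud", "Very Loud"},
    "topic": {"Just Chatting", "Game Commentary", "Reaction", "Voice Command"},
    "sentiment": {"Positive", "Negative", "Neutral"},
}


def validate_segment(segment: dict) -> list[str]:
    """Return a list of problems found in one annotated segment (empty if valid)."""
    errors = []
    for field, allowed in ALLOWED.items():
        value = segment.get(field)
        if value is None:
            errors.append(f"missing {field}")
        elif value not in allowed:
            errors.append(f"invalid {field}: {value!r}")
    # Hypothetical timing fields: segment boundaries in seconds.
    start, end = segment.get("start"), segment.get("end")
    if start is None or end is None or not (0 <= start < end):
        errors.append("invalid time range")
    return errors


good = {"start": 0.0, "end": 4.5, "emotion": "Laughing",
        "loudness": "Loud", "topic": "Reaction", "sentiment": "Positive"}
bad = {"start": 5, "end": 2, "emotion": "Happy",
       "loudness": "Loud", "topic": "Reaction"}
print(validate_segment(good))
print(validate_segment(bad))
```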

Image / Frame Data — YOLO UI Detection

Running in parallel with the audio work, this track covers the collection and annotation of gameplay video frames. The dataset trains a YOLO model that detects in-game UI elements and visual context in real time: which game is being played, which weapons are in use, and which events are unfolding on screen.

Frame Extraction

Frames are extracted from gameplay videos across various game titles. Selection ensures sufficient scene variety — lobby screens, in-game moments, kill feeds, inventory views, and more.
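The page does not say how frames are chosen, so the following is only one plausible approach: sample a frame at a fixed time interval and leave scene-variety selection to a later manual pass. The function name `sample_frame_indices` and the fixed-interval strategy are assumptions. The returned indices could then be passed to whichever video decoder is in use.

```python
def sample_frame_indices(total_frames: int, fps: float, every_seconds: float) -> list[int]:
    """Indices of frames to keep when sampling one frame every `every_seconds`."""
    if total_frames < 0 or fps <= 0 or every_seconds <= 0:
        raise ValueError("total_frames must be >= 0; fps and every_seconds must be > 0")
    # Convert the time interval into a whole number of frames (at least 1).
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


# A 10-second video at 30 fps, sampled every 2 seconds.
indices = sample_frame_indices(total_frames=300, fps=30.0, every_seconds=2.0)
```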

Bounding Box Annotation

Each frame is labelled with bounding boxes around relevant elements: the game being played, the weapon currently in use, in-game events (kill, death, bomb plant, etc.), and other UI elements targeted by the detection model.

Quality Check

Datasets go through a review pass to verify bounding box accuracy and catch any missed or misclassified labels — especially for small or overlapping UI elements where precision matters most.
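A review pass like this is often supported by an intersection-over-union (IoU) check between the annotator's box and the reviewer's corrected box. The sketch below assumes boxes are given as `(x_min, y_min, x_max, y_max)` in pixels. The 0.5 threshold and the `needs_review` helper are hypothetical, not values taken from the project.

```python
def iou(box_a, box_b) -> float:
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def needs_review(annotated, reviewed, threshold: float = 0.5) -> bool:
    """Flag a box whose overlap with the reviewer's box falls below the threshold."""
    return iou(annotated, reviewed) < threshold


overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))
```

For small UI elements, a shift of a few pixels changes IoU sharply, which is one reason a stricter threshold may be appropriate for those classes.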

Detection Targets

Game Type

FPS, MOBA, Battle Royale, RPG

Weapon / Item

Rifle, Sniper, Pistol, Melee

Event

Kill, Death, Win / Lose, Objective

Tools & Workflow

Label Studio

Primary tool for audio annotation — timeline segmentation, multi-label per segment, and a review queue for validation passes.

CVAT / Roboflow

Bounding box annotation for image datasets, with direct export to YOLO-compatible formats ready for model training.
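For reference, a YOLO label file stores one line per object: a class index followed by the box centre, width, and height, all normalized by the image size. The annotation tools handle this export themselves. The sketch below only illustrates the conversion from pixel corners, and the helper name `to_yolo_line` and the class index used are hypothetical.

```python
def to_yolo_line(class_id: int, box, img_w: int, img_h: int) -> str:
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    # Centre and size, normalized to the [0, 1] range by image dimensions.
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A box covering the central quarter of a 1920x1080 frame, class index 2.
line = to_yolo_line(2, (480, 270, 1440, 810), 1920, 1080)
```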

Python Scripts

Automated pipeline for downloading clips via the Twitch API and Google API, normalizing metadata, and batch preprocessing before annotation.
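The metadata normalization step might look roughly like the sketch below. The raw field names (`id`, `url`, `duration`, `language`) are placeholders rather than the actual Twitch or Google API response fields, and `ClipMeta`, `normalize`, and `is_eligible` are hypothetical names. The duration bounds and the English-only filter come from the clip criteria described on this page.

```python
from dataclasses import dataclass

MIN_SECONDS = 3 * 60   # 3-minute lower bound stated for audio clips
MAX_SECONDS = 20 * 60  # 20-minute upper bound stated for audio clips


@dataclass(frozen=True)
class ClipMeta:
    clip_id: str
    source: str
    url: str
    duration_s: float
    language: str


def normalize(raw: dict, source: str) -> ClipMeta:
    """Map a raw metadata record (hypothetical field names) onto one common shape."""
    return ClipMeta(
        clip_id=str(raw["id"]),
        source=source.strip().lower(),
        url=raw["url"].strip(),
        duration_s=float(raw["duration"]),
        # Reduce tags such as "en-US" or "EN" to a two-letter lowercase code.
        language=(raw.get("language") or "").strip().lower()[:2],
    )


def is_eligible(clip: ClipMeta) -> bool:
    """Keep only English clips inside the 3 to 20 minute window."""
    return clip.language == "en" and MIN_SECONDS <= clip.duration_s <= MAX_SECONDS


raw = {"id": 123, "url": " https://example.com/c/123 ", "duration": "245.5", "language": "en-US"}
clip = normalize(raw, "Twitch")
```

Keeping every source in one record shape means the later annotation and tracking steps do not need to know which platform a clip came from.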

Spreadsheet Tracking

Annotation progress, review status, and re-annotation flags are tracked collaboratively via shared spreadsheets across the team.

Key Takeaways

Dataset Language: English (100% English audio coverage)
Clip Duration: 3–20 min per audio clip gathered
Data Tracks: Audio + Image in parallel

Role & Scope

Data type: Audio + Image
Sources: Twitch, YouTube, YouTube Shorts
Language: English
Model target: YOLO + NLP
Location: Remote · Jakarta

Deliverables

Annotated audio dataset (emotion, loudness, topic, sentiment)
YOLO-format image dataset (game, weapon, event detection)
Automated data collection pipeline (Twitch API + Google API)
Annotation validation & re-annotation logs
Topic clustering report for Just Chatting model
