Customer Profiling (Gacha Business Edition)
Learn customer profiling for gacha games using behavioral data, player segmentation, and clustering algorithms. Build player segmentation models to identify whales, sharks, and dolphins from spending patterns.
The Problem: Why Profile Players?
Customer profiling is essential for gacha game businesses to understand their player base, optimize monetization strategies, reduce churn, and maximize lifetime value (LTV). However, unlike SaaS businesses with subscription tiers, gacha games face unique challenges.
The Challenge: No Ground Truth Labels
In real-world scenarios, you don't have reliable labels for player segments. Players don't self-identify as "whales" or "dolphins" - these segments are derived from behavioral patterns observed through transactions and gameplay. The only reliable signal is what players actually do (spending, pass purchases, engagement), not what they say.
This simulator demonstrates how to build player profiles using behavioral data alone, identifying spending patterns that correlate with player types without relying on self-reported labels.
Who Is This For?
This guide is designed for:
- Game Developers & Product Managers: Building gacha games and needing to understand player monetization patterns
- Data Scientists & ML Engineers: Implementing player segmentation models using behavioral data
- Business Analysts: Analyzing player spending patterns and optimizing revenue strategies
- Game Designers: Understanding how different player types (whales, sharks, dolphins) behave and monetize
- Anyone Learning ML: Wanting to understand feature engineering, clustering algorithms, and model evaluation in a practical context
Random/Dummy Features
The simulator includes several features (totalRevenue, hoursSpent, age, engagementScore, promotionResponseRate)
that are randomly generated and do NOT affect simulation behavior. These are included for educational
purposes to demonstrate:
- Feature selection in clustering algorithms
- How algorithms handle noise vs signal
- The importance of choosing relevant behavioral features
- Why transaction-based features outperform demographic features
The clustering algorithm will attempt to use all features, but behavioral features (top-up count, pass purchases) should show high importance, while random features should show low importance.
How It Works
This simulator models a gacha game business with different player types and monetization strategies. Unlike SaaS businesses with subscription tiers, gacha games use a freemium model with multiple revenue streams: top-ups, monthly passes, and battle passes.
Implementation Guide
Data Collection: Player Table and Transaction Table
To implement player profiling in your gacha game, you need to collect data that matches the model structure. Here's the data shape your system should capture:
Player Table Schema
The player table represents the current state of each player (snapshot at each time point):
interface Player {
  playerId: string; // Unique identifier
  playerType: 'whale' | 'shark' | 'dolphin' | 'f2p'; // Derived from behavior (not stored initially)
  joinDate: Date; // When player registered
  dateChurned: Date | null; // When player quit (null if active)
  totalRevenue: number; // Cumulative revenue from this player
  // Pass subscriptions (current state)
  monthlyPassActive: boolean; // Currently subscribed to monthly pass
  battlePassActive: boolean; // Currently subscribed to battle pass
  // Behavioral features (for clustering - derived from transactions)
  // In practice, these are calculated from transaction history:
  topUpCount: number; // Number of top-up transactions
  avgTopUpAmount: number; // Average amount per top-up
  battlePassCount: number; // Number of battle pass purchases
  monthlyPassCount: number; // Number of monthly pass subscriptions
  monthsActive: number; // Time since join (engagement duration)
}
Transaction Table Schema
The transaction table records all player spending events (historical events):
interface Transaction {
  transactionId: string;
  playerId: string;
  type: 'topup' | 'battlepass' | 'monthlypass';
  amount: number; // Payment amount (real money)
  currency: number; // In-game currency received
  date: Date;
  monthIndex: number; // Month since start (0-based)
}
Feature Engineering: Behavioral Features from Transactions
Unlike SaaS businesses that can use subscription tiers as labels, gacha games must derive player segments from behavioral patterns. Here's how to engineer features from transaction data:
Behavioral Features (Signals)
- Top-up Count: COUNT(transactions WHERE type = 'topup') - Frequency of spending
- Average Top-up Amount: AVG(amount WHERE type = 'topup') - Spending intensity
- Battle Pass Count: COUNT(transactions WHERE type = 'battlepass') - Event-driven engagement
- Monthly Pass Count: COUNT(transactions WHERE type = 'monthlypass') - Subscription commitment
- Months Active: MONTHS_BETWEEN(joinDate, currentDate) - Engagement duration
These features should correlate with player types:
- Whales: High top-up count, high average amount, high pass counts
- Sharks: Moderate top-up, high monthly pass count, low battle pass count
- Dolphins: Low top-up count, occasional pass purchases
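The five behavioral features above can be derived from one player's transaction history. The following is a minimal sketch, not the simulator's actual code: `computeFeatures` and `monthsBetween` are hypothetical helper names, and counting only whole calendar months is an assumed reading of MONTHS_BETWEEN.

```typescript
type TransactionType = 'topup' | 'battlepass' | 'monthlypass';

interface Transaction {
  transactionId: string;
  playerId: string;
  type: TransactionType;
  amount: number;
  currency: number;
  date: Date;
  monthIndex: number;
}

interface BehavioralFeatures {
  topUpCount: number;
  avgTopUpAmount: number;
  battlePassCount: number;
  monthlyPassCount: number;
  monthsActive: number;
}

// Whole calendar months between two dates (assumed definition of MONTHS_BETWEEN).
function monthsBetween(start: Date, end: Date): number {
  const months =
    (end.getUTCFullYear() - start.getUTCFullYear()) * 12 +
    (end.getUTCMonth() - start.getUTCMonth());
  // The last month only counts once its day-of-month has been reached.
  const partial = end.getUTCDate() < start.getUTCDate() ? 1 : 0;
  return Math.max(0, months - partial);
}

// Expects the transactions of a single player.
function computeFeatures(
  playerTx: Transaction[],
  joinDate: Date,
  currentDate: Date
): BehavioralFeatures {
  const topUps = playerTx.filter((t) => t.type === 'topup');
  const topUpTotal = topUps.reduce((sum, t) => sum + t.amount, 0);
  return {
    topUpCount: topUps.length,
    // Players with no top-ups get 0 rather than NaN.
    avgTopUpAmount: topUps.length > 0 ? topUpTotal / topUps.length : 0,
    battlePassCount: playerTx.filter((t) => t.type === 'battlepass').length,
    monthlyPassCount: playerTx.filter((t) => t.type === 'monthlypass').length,
    monthsActive: monthsBetween(joinDate, currentDate),
  };
}
```

In a real pipeline you would likely compute these with a grouped query in your database rather than in application code; the shape of the result is the same.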
Why Not Use Revenue Directly?
Total revenue is an outcome, not a predictor. By the time you know a player's total revenue, you've already identified their segment. Behavioral features (top-up frequency, pass adoption patterns) are observable early and can predict future spending behavior.
Other feature engineering examples:
- Pass Consistency: Count of consecutive months with monthly pass
- Top-up Frequency: Average time between top-ups (a shorter gap means more frequent spending)
- Seasonal Patterns: Battle pass purchases per season
- Engagement Score: Weighted combination of login frequency, playtime, social features
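Two of these derived features can be sketched as follows. The function names are hypothetical, and "pass consistency" is interpreted here as the longest run of consecutive months with a monthly pass purchase, which is one reasonable reading rather than a confirmed definition.

```typescript
// Longest run of consecutive month indexes that contain a monthly pass purchase.
function longestPassStreak(passMonthIndexes: number[]): number {
  // De-duplicate, then sort ascending.
  const sorted = passMonthIndexes
    .filter((m, i, arr) => arr.indexOf(m) === i)
    .sort((a, b) => a - b);
  let best = 0;
  let current = 0;
  for (let i = 0; i < sorted.length; i++) {
    // Extend the streak when this month directly follows the previous one.
    current = i > 0 && sorted[i] === sorted[i - 1] + 1 ? current + 1 : 1;
    best = Math.max(best, current);
  }
  return best;
}

// Average gap in days between consecutive top-ups; null with fewer than two top-ups.
function avgDaysBetweenTopUps(topUpDates: Date[]): number | null {
  if (topUpDates.length < 2) return null;
  const times = topUpDates.map((d) => d.getTime()).sort((a, b) => a - b);
  const msPerDay = 24 * 60 * 60 * 1000;
  // The sum of consecutive gaps telescopes to (last - first).
  return (times[times.length - 1] - times[0]) / msPerDay / (times.length - 1);
}
```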
Simulation Process: Step-by-Step
The simulation follows a monthly cycle that models real gacha game operations:
1. Global Configuration
Set global parameters that apply to all players:
- Growth Rate: 4% per month (determines how many new players join)
- Player Type Probabilities: Whale (1%), Shark (5%), Dolphin (10%)
- Simulation Length: 12-24 months
- Seed: For reproducible results
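A configuration object and a seeded random number generator are enough to make runs reproducible. The sketch below is an assumption about how this could look: `SimulationConfig` and `createRng` are hypothetical names, and the generator is a simple linear congruential generator (Numerical Recipes constants), not necessarily what the simulator uses.

```typescript
interface SimulationConfig {
  growthRate: number; // 0.04 means 4% new players per month
  typeProbabilities: { whale: number; shark: number; dolphin: number };
  months: number; // simulation length
  seed: number; // same seed, same run
}

const exampleConfig: SimulationConfig = {
  growthRate: 0.04,
  typeProbabilities: { whale: 0.01, shark: 0.05, dolphin: 0.10 },
  months: 12,
  seed: 42,
};

// Linear congruential generator: state' = (a * state + c) mod 2^32.
// Adequate for a teaching simulation, not for cryptography.
function createRng(seed: number): () => number {
  const modulus = 4294967296; // 2^32
  let state = Math.abs(Math.floor(seed)) % modulus;
  return () => {
    state = (state * 1664525 + 1013904223) % modulus;
    return state / modulus; // uniform in [0, 1)
  };
}
```

Passing the generator into every sampling function, instead of calling `Math.random()`, is what keeps two runs with the same seed identical.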
2. Start Players (Initial Cohort)
Generate initial player base with:
- Random assignment to player types based on probabilities
- Important: Player type does NOT change over time (no upgrades/downgrades)
- Only paying players (Whale, Shark, Dolphin) are generated (F2P excluded)
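One way to assign player types is to draw a uniform number and walk the cumulative probabilities. Because F2P players are excluded, this sketch rescales the three probabilities so they cover the whole range; that normalization is an assumption, as are the names `samplePlayerType` and `createCohort`.

```typescript
type PayingType = 'whale' | 'shark' | 'dolphin';

interface NewPlayer {
  playerId: string;
  playerType: PayingType; // fixed for the player's lifetime
  joinDate: Date;
  dateChurned: Date | null;
}

function samplePlayerType(
  rng: () => number,
  probs: Record<PayingType, number>
): PayingType {
  const total = probs.whale + probs.shark + probs.dolphin;
  // Rescale the roll so the three paying types cover the full range.
  const u = rng() * total;
  if (u < probs.whale) return 'whale';
  if (u < probs.whale + probs.shark) return 'shark';
  return 'dolphin';
}

function createCohort(
  size: number,
  joinDate: Date,
  rng: () => number,
  probs: Record<PayingType, number>,
  idPrefix = 'p'
): NewPlayer[] {
  const cohort: NewPlayer[] = [];
  for (let i = 0; i < size; i++) {
    cohort.push({
      playerId: `${idPrefix}-${i}`,
      playerType: samplePlayerType(rng, probs),
      joinDate,
      dateChurned: null,
    });
  }
  return cohort;
}
```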
3. Start Transaction (Monthly Cycle)
For each active player each month:
- Monthly Pass Check:
  - Sample from player type's monthlyPassProbability range
  - If purchased, generate transaction and activate pass
  - Pass provides daily login rewards (not modeled in detail)
- Battle Pass Check:
  - Sample from player type's battlePassProbability range
  - If purchased, generate transaction and activate pass
  - Pass provides tiered rewards (not modeled in detail)
- Top-up Check:
  - Sample from player type's topUpProbability range
  - If top-up occurs, sample amount from topUpAmount range
  - Generate transaction record
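The three checks can be sketched as one function per player per month. The field names follow the ranges named above, but everything else is an assumption: the pass prices, the numbers in `exampleWhale`, and the helper names are made up for illustration and are not the simulator's values.

```typescript
type TxType = 'topup' | 'battlepass' | 'monthlypass';

interface NumRange { min: number; max: number; }

interface TypeBehavior {
  monthlyPassProbability: NumRange;
  battlePassProbability: NumRange;
  topUpProbability: NumRange;
  topUpAmount: NumRange;
}

interface PassPrices { monthlyPass: number; battlePass: number; }

interface MonthlyTx { playerId: string; type: TxType; amount: number; monthIndex: number; }

// Hypothetical whale settings, for illustration only.
const exampleWhale: TypeBehavior = {
  monthlyPassProbability: { min: 0.8, max: 0.95 },
  battlePassProbability: { min: 0.7, max: 0.9 },
  topUpProbability: { min: 0.6, max: 0.9 },
  topUpAmount: { min: 50, max: 500 },
};

const sampleRange = (r: NumRange, rng: () => number): number =>
  r.min + (r.max - r.min) * rng();

function simulatePlayerMonth(
  playerId: string,
  behavior: TypeBehavior,
  prices: PassPrices,
  monthIndex: number,
  rng: () => number
): MonthlyTx[] {
  const out: MonthlyTx[] = [];
  // A check passes when a uniform roll falls below this month's sampled probability.
  const passes = (p: NumRange): boolean => rng() < sampleRange(p, rng);

  if (passes(behavior.monthlyPassProbability)) {
    out.push({ playerId, type: 'monthlypass', amount: prices.monthlyPass, monthIndex });
  }
  if (passes(behavior.battlePassProbability)) {
    out.push({ playerId, type: 'battlepass', amount: prices.battlePass, monthIndex });
  }
  if (passes(behavior.topUpProbability)) {
    const amount = sampleRange(behavior.topUpAmount, rng);
    out.push({ playerId, type: 'topup', amount, monthIndex });
  }
  return out;
}
```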
4. Apply Player Churn
After processing all transactions:
- Calculate Churn Probability:
  - Base rate from player type config (Whale: 5-10%, Shark: 1-2%, Dolphin: 2-5%)
  - Sample from range for this month
- Process Churn:
  - Set dateChurned for players who churn
  - Churned players no longer generate transactions
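The churn step can be sketched as below, using the churn ranges quoted above. The function name `applyChurn` and the choice to mutate players in place are assumptions made for brevity.

```typescript
type PayingType = 'whale' | 'shark' | 'dolphin';

interface NumRange { min: number; max: number; }

interface ActivePlayer {
  playerId: string;
  playerType: PayingType;
  dateChurned: Date | null;
}

// Monthly churn ranges per player type, as listed in this step.
const churnRanges: Record<PayingType, NumRange> = {
  whale: { min: 0.05, max: 0.10 },
  shark: { min: 0.01, max: 0.02 },
  dolphin: { min: 0.02, max: 0.05 },
};

// Marks churned players in place and returns the ones that churned this month.
function applyChurn(
  players: ActivePlayer[],
  monthEnd: Date,
  rng: () => number
): ActivePlayer[] {
  const churned: ActivePlayer[] = [];
  for (const player of players) {
    if (player.dateChurned !== null) continue; // already gone
    const range = churnRanges[player.playerType];
    // Sample this month's churn probability, then roll against it.
    const churnProbability = range.min + (range.max - range.min) * rng();
    if (rng() < churnProbability) {
      player.dateChurned = monthEnd;
      churned.push(player);
    }
  }
  return churned;
}
```

The transaction step should then skip any player whose `dateChurned` is set.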
5. End Month
Calculate monthly metrics:
- Revenue by source (top-ups, monthly pass, battle pass)
- Revenue by player type
- Active player counts by type
- New players, churned players
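The revenue metrics are a straightforward aggregation over the month's transactions. This sketch assumes each transaction has already been joined with its player's type (the `playerType` field on `TxRecord` is that assumption); player counts would come from the player table in the same way.

```typescript
type PayingType = 'whale' | 'shark' | 'dolphin';
type TxType = 'topup' | 'battlepass' | 'monthlypass';

interface TxRecord {
  playerType: PayingType; // assumed to be joined in from the player table
  type: TxType;
  amount: number;
  monthIndex: number;
}

interface MonthlyMetrics {
  revenueBySource: Record<TxType, number>;
  revenueByPlayerType: Record<PayingType, number>;
  totalRevenue: number;
}

function summarizeMonth(records: TxRecord[], monthIndex: number): MonthlyMetrics {
  const metrics: MonthlyMetrics = {
    revenueBySource: { topup: 0, battlepass: 0, monthlypass: 0 },
    revenueByPlayerType: { whale: 0, shark: 0, dolphin: 0 },
    totalRevenue: 0,
  };
  for (const r of records) {
    if (r.monthIndex !== monthIndex) continue; // only this month's events
    metrics.revenueBySource[r.type] += r.amount;
    metrics.revenueByPlayerType[r.playerType] += r.amount;
    metrics.totalRevenue += r.amount;
  }
  return metrics;
}
```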
6. Start New Month
Repeat steps 3-5 for the next month, with:
- New players added based on growth rate
- Existing players continuing their behavior patterns
- Cumulative metrics accumulating over time
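As a rough illustration of how the growth rate drives new players each month, the sketch below applies the rate to the previous month's player count and rounds to a whole number. Both choices are assumptions, and churn is deliberately left out to keep the arithmetic visible.

```typescript
// New players joining this month, given last month's active count.
function newPlayersForMonth(activePlayers: number, growthRate: number): number {
  return Math.round(activePlayers * growthRate);
}

// Player count at the start and after each month, ignoring churn.
function projectPlayerCounts(
  initialPlayers: number,
  growthRate: number,
  months: number
): number[] {
  const counts = [initialPlayers];
  for (let m = 1; m <= months; m++) {
    const previous = counts[m - 1];
    counts.push(previous + newPlayersForMonth(previous, growthRate));
  }
  return counts;
}
```

With 1,000 starting players and 4% growth, the first two months add 40 and then 42 players.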
Segmentation Algorithm
The simulator uses k-means clustering to segment players based on behavioral features derived from transactions. In practice, you should also weigh the alternatives and caveats below.
Algorithm Selection
- K-Means (used here): Fast, simple, works well with spherical clusters. We use k=3 to match the three paying player types (Whale, Shark, Dolphin).
- Gaussian Mixture Models (GMM): Better for non-spherical clusters, provides probability distributions
- Hierarchical Clustering (HCA): No need to specify k in advance, creates dendrograms for analysis
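For readers who want to see what k-means does, here is a minimal sketch of Lloyd's algorithm with z-score standardization. It is not the simulator's implementation: seeding the centroids with the first k points is a simplifying assumption (k-means++ or random restarts are more common), and `Vector`, `standardize`, and `kMeans` are hypothetical names.

```typescript
type Vector = number[];

function squaredDistance(a: Vector, b: Vector): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
  return sum;
}

// Z-score each column so large-valued features (such as avgTopUpAmount)
// do not dominate the distance calculation.
function standardize(points: Vector[]): Vector[] {
  const dims = points[0].length;
  const means: number[] = [];
  const stds: number[] = [];
  for (let d = 0; d < dims; d++) {
    const mean = points.reduce((s, p) => s + p[d], 0) / points.length;
    const variance =
      points.reduce((s, p) => s + (p[d] - mean) * (p[d] - mean), 0) / points.length;
    means.push(mean);
    stds.push(Math.sqrt(variance) || 1); // constant columns map to 0
  }
  return points.map((p) => p.map((v, d) => (v - means[d]) / stds[d]));
}

function nearestCentroid(point: Vector, centroids: Vector[]): number {
  let best = 0;
  for (let c = 1; c < centroids.length; c++) {
    if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
      best = c;
    }
  }
  return best;
}

function kMeans(
  points: Vector[],
  k: number,
  maxIterations = 100
): { labels: number[]; centroids: Vector[] } {
  // Assumption: the first k points seed the centroids.
  let centroids = points.slice(0, k).map((p) => p.slice());
  let labels: number[] = points.map(() => -1);
  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: attach every point to its nearest centroid.
    const next = points.map((p) => nearestCentroid(p, centroids));
    const changed = next.some((label, i) => label !== labels[i]);
    labels = next;
    if (!changed) break; // converged
    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old; // leave empty clusters in place
      return old.map((_, d) => members.reduce((s, p) => s + p[d], 0) / members.length);
    });
  }
  return { labels, centroids };
}
```

A typical call would be `kMeans(standardize(featureRows), 3)`, where each row holds one player's feature values in a fixed order.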
The K=3 "Cheat"
In this simulator, we cheat by setting k=3 to match the number of paying player types (Whale, Shark, Dolphin).
In practice, you don't know how many segments exist in advance. You would need to:
- Use the elbow method to find optimal k
- Use silhouette analysis to evaluate cluster quality
- Use domain knowledge to validate segment count (e.g., industry standard: 3-5 segments)
- Consider using HCA, which doesn't require pre-specifying k. GMM still needs a component count, but you can choose it with an information criterion such as BIC
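Silhouette analysis can be sketched as follows. For each point it compares the mean distance to its own cluster, a(i), with the mean distance to the nearest other cluster, b(i), and averages (b - a) / max(a, b) over all points. `silhouetteScore` is a hypothetical helper; you would run your clustering for several values of k and prefer the k with the highest score.

```typescript
function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
  return Math.sqrt(sum);
}

// Mean silhouette over all points; ranges from -1 (poor) to 1 (well separated).
function silhouetteScore(points: number[][], labels: number[]): number {
  const clusterIds = labels.filter((l, i) => labels.indexOf(l) === i);
  if (clusterIds.length < 2) throw new Error('silhouette needs at least two clusters');
  let total = 0;
  for (let i = 0; i < points.length; i++) {
    let own = 0; // a(i): mean distance to the other members of i's cluster
    let ownCount = 0;
    let nearestOther = Infinity; // b(i): smallest mean distance to another cluster
    for (const c of clusterIds) {
      let sum = 0;
      let count = 0;
      for (let j = 0; j < points.length; j++) {
        if (j === i || labels[j] !== c) continue;
        sum += euclidean(points[i], points[j]);
        count++;
      }
      if (c === labels[i]) {
        own = count > 0 ? sum / count : 0;
        ownCount = count;
      } else if (count > 0) {
        nearestOther = Math.min(nearestOther, sum / count);
      }
    }
    const denominator = Math.max(own, nearestOther);
    // Singleton clusters and zero denominators contribute 0 by convention.
    if (ownCount > 0 && denominator > 0) total += (nearestOther - own) / denominator;
  }
  return total / points.length;
}
```

This version is quadratic in the number of players, so for large player bases you would score a random sample instead.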
Feature Selection
The algorithm evaluates all features but should identify:
- Signal Features (high importance): topUpCount, avgTopUpAmount, battlePassCount, monthlyPassCount, monthsActive - These are behavioral features derived from transactions
- Noise Features (low importance): totalRevenue, hoursSpent, age, engagementScore, promotionResponseRate - These are random/dummy features that don't correlate with player type
This demonstrates the importance of feature selection in ML models - transaction-based behavioral features should outperform demographic or engagement metrics for predicting spending segments.
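K-means does not report feature importance by itself, and this guide does not say how the simulator scores it. One simple proxy, shown here as an assumption, is the share of each feature's variance that is explained by the cluster assignment (between-cluster sum of squares divided by total sum of squares). Signal features should score near 1 and noise features near 0. The names `varianceExplained` and `rankFeatures` are hypothetical.

```typescript
// Fraction of a feature's variance explained by the cluster labels (0 to 1).
function varianceExplained(values: number[], labels: number[]): number {
  const mean = values.reduce((s, v) => s + v, 0) / values.length;
  const sums: { [label: string]: { sum: number; count: number } } = {};
  let totalSS = 0;
  values.forEach((v, i) => {
    totalSS += (v - mean) * (v - mean);
    const key = String(labels[i]);
    if (!sums[key]) sums[key] = { sum: 0, count: 0 };
    sums[key].sum += v;
    sums[key].count += 1;
  });
  let betweenSS = 0;
  for (const key of Object.keys(sums)) {
    const clusterMean = sums[key].sum / sums[key].count;
    betweenSS += sums[key].count * (clusterMean - mean) * (clusterMean - mean);
  }
  return totalSS > 0 ? betweenSS / totalSS : 0;
}

// Scores every feature in the rows and sorts from most to least informative.
function rankFeatures(
  rows: { [feature: string]: number }[],
  labels: number[]
): { feature: string; score: number }[] {
  return Object.keys(rows[0])
    .map((feature) => ({
      feature,
      score: varianceExplained(rows.map((r) => r[feature]), labels),
    }))
    .sort((a, b) => b.score - a.score);
}
```

Keep in mind that a feature used to build the clusters will tend to score higher than one that was held out, so treat the ranking as a diagnostic rather than proof.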
Why Behavioral Features Work
Behavioral features from transactions are superior because:
- Observable Early: You can calculate top-up frequency and pass adoption patterns within the first few months
- Directly Correlate: Spending behavior directly maps to player type definitions
- Actionable: You can use these patterns to predict future spending and optimize marketing
- No Self-Reporting Bias: Derived from actual transactions, not surveys or demographics
This simulator provides a complete, production-ready reference implementation for:
- Behavioral data collection from transactions
- Feature engineering for player segmentation
- Clustering algorithm evaluation
- Confusion matrix analysis
- Feature importance assessment (signal vs noise)
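Because the simulation knows each player's true type, cluster assignments can be checked against it with a confusion matrix. The sketch below is an illustration under assumptions: `confusionMatrix` and `purity` are hypothetical names, and purity (the share of players that fall in their cluster's majority type) is one common summary, not necessarily the one the simulator reports.

```typescript
type PlayerType = 'whale' | 'shark' | 'dolphin';

// Rows are true player types, columns are cluster ids 0..k-1.
function confusionMatrix(
  actual: PlayerType[],
  clusters: number[],
  k: number
): Record<PlayerType, number[]> {
  const zeros = (): number[] => {
    const row: number[] = [];
    for (let i = 0; i < k; i++) row.push(0);
    return row;
  };
  const matrix: Record<PlayerType, number[]> = {
    whale: zeros(),
    shark: zeros(),
    dolphin: zeros(),
  };
  actual.forEach((type, i) => {
    matrix[type][clusters[i]] += 1;
  });
  return matrix;
}

// Share of players that belong to the majority type of their cluster.
function purity(matrix: Record<PlayerType, number[]>, k: number): number {
  let majoritySum = 0;
  let total = 0;
  for (let c = 0; c < k; c++) {
    const column = [matrix.whale[c], matrix.shark[c], matrix.dolphin[c]];
    majoritySum += Math.max(column[0], column[1], column[2]);
    total += column[0] + column[1] + column[2];
  }
  return total > 0 ? majoritySum / total : 0;
}
```

Cluster ids are arbitrary, so read the matrix by columns: a good clustering has one dominant type per column, whichever id it happens to get.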