Customer Profiling (SaaS Edition)
Learn customer profiling for SaaS businesses using behavioral data, clustering algorithms, and feature engineering. Build user segmentation models without ground truth labels.
The Problem: Why Profile Customers?
Customer profiling is essential for SaaS businesses to understand their user base, optimize pricing, reduce churn, and improve retention. However, traditional approaches face significant challenges:
The Challenge: No Ground Truth
In real-world scenarios, you don't have reliable labels for customer segments. People lie about their demographics, industry classifications are self-reported and often inaccurate, and survey responses are biased. The only reliable signal comes from user behavior patterns - what customers actually do, not what they say.
This simulator demonstrates how to build customer profiles using behavioral data alone, without relying on self-reported labels.
Who Is This For?
This guide is designed for:
- SaaS Founders & Product Managers: Building subscription businesses and needing to understand their customer segments
- Data Scientists & ML Engineers: Implementing customer segmentation models using behavioral data and clustering algorithms
- Business Analysts: Analyzing customer behavior patterns and optimizing pricing strategies
- Growth Marketers: Understanding customer churn patterns and retention strategies
- Anyone Learning ML: Wanting to understand feature engineering, clustering algorithms, and model evaluation in a practical SaaS context
Experimental/Dummy Features
The simulator includes several engagement fields (transactions, gmv, hoursSpent, couponsClaimed)
that are randomly generated and do NOT affect simulation behavior. These are included for educational
purposes to demonstrate:
- Feature selection in clustering algorithms
- How algorithms handle noise vs signal
- The importance of choosing relevant features
- False positives in ML model evaluation
The clustering algorithm will attempt to use all features, but noise features should show low correlation with actual customer segments.
How It Works
This simulator models a SaaS business with credit-based subscription tiers. The simulation parameters are designed to reflect real-world customer behavior patterns.
Implementation Guide
Data Collection: User Table and Transaction Table
To implement customer profiling in your SaaS business, you need to collect data that matches the model structure. Here's the data shape your system should capture:
User Table Schema
The user table represents the current state of each customer (snapshot at each time point):
interface User {
  userId: string;                 // Unique identifier
  tier: 'free' | 'basic' | 'pro'; // Subscription tier
  dateJoined: Date;               // When user signed up
  dateCancelled: Date | null;     // When user cancelled (null if active)
  creditsExpired: number;         // Cumulative unused credits
  industry: string;               // Industry segment (if available)
  income: number;                 // Income level (if available)
  // Behavioral features (collect from actual usage)
  transactions: number;           // Transaction count
  gmv: number;                    // Gross merchandise value
  hoursSpent: number;             // Time spent in platform
  couponsClaimed: number;         // Engagement metrics
}
Transaction Table Schema
The transaction table records monthly activity for all users (historical events):
interface Transaction {
  transactionId: string;
  userId: string;
  type: 'subscription' | 'upgrade' | 'downgrade' | 'refund' | 'credit_usage';
  tier: 'free' | 'basic' | 'pro';
  amount: number;           // Payment amount
  creditsUsed: number;      // Credits consumed this period
  creditsAllocated: number; // Credits allocated this period
  date: Date;
  monthIndex: number;       // Month since start (0-based)
}
Feature Engineering: Credits Expired
If your business doesn't have direct behavioral signals, you can engineer features from transaction data. For example:
Credits Expired = creditsAllocated - creditsUsed (accumulated over time)
This feature is particularly valuable because it flags high-margin customers: users who pay for premium tiers but don't fully use their plan. In this simulator, such customers are modeled as more profitable and less likely to churn.
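The accumulation can be sketched as a small aggregation over transaction rows. This is a minimal sketch, not the simulator's own code: `CreditRow` and `cumulativeCreditsExpired` are illustrative names, and clamping over-consumption to zero is an assumption added here.

```typescript
// Trimmed-down view of a transaction row (hypothetical name; the fields
// follow the Transaction schema in this guide).
interface CreditRow {
  userId: string;
  creditsAllocated: number;
  creditsUsed: number;
}

// Sum (creditsAllocated - creditsUsed) per user across all rows.
// Assumption: if a user consumes more than allocated in a period, that
// period contributes 0 instead of a negative amount.
function cumulativeCreditsExpired(rows: CreditRow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    const expired = Math.max(0, row.creditsAllocated - row.creditsUsed);
    totals.set(row.userId, (totals.get(row.userId) ?? 0) + expired);
  }
  return totals;
}

// Usage: two months for one user, one month for another.
const expiredByUser = cumulativeCreditsExpired([
  { userId: 'u1', creditsAllocated: 100, creditsUsed: 70 },
  { userId: 'u1', creditsAllocated: 100, creditsUsed: 90 },
  { userId: 'u2', creditsAllocated: 50, creditsUsed: 50 },
]);
console.log(expiredByUser.get('u1'), expiredByUser.get('u2'));
```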
Other feature engineering examples:
- Usage Intensity: creditsUsed / creditsAllocated (percentage utilization)
- Payment Consistency: Count of consecutive months with payments
- Growth Rate: Month-over-month change in usage
- Engagement Score: Weighted combination of multiple behavioral signals
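The first three features in this list could be computed per user roughly as follows. The `MonthlyUsage` shape and the function names are assumptions for illustration. "Payment consistency" is interpreted here as the longest streak of consecutive paid months, and "growth rate" as the average relative month-over-month change in credits used. Other definitions are equally reasonable.

```typescript
// One month of activity for a single user (hypothetical shape).
interface MonthlyUsage {
  monthIndex: number;
  amount: number; // payment received this month
  creditsUsed: number;
  creditsAllocated: number;
}

// Usage intensity: share of allocated credits actually consumed (0..1).
function usageIntensity(months: MonthlyUsage[]): number {
  let used = 0;
  let allocated = 0;
  for (const m of months) {
    used += m.creditsUsed;
    allocated += m.creditsAllocated;
  }
  return allocated === 0 ? 0 : used / allocated;
}

// Payment consistency: longest run of consecutive months with a payment > 0.
function paymentConsistency(months: MonthlyUsage[]): number {
  const sorted = [...months].sort((a, b) => a.monthIndex - b.monthIndex);
  let best = 0;
  let run = 0;
  let prev: number | null = null;
  for (const m of sorted) {
    if (m.amount > 0) {
      run = prev !== null && m.monthIndex === prev + 1 ? run + 1 : 1;
      prev = m.monthIndex;
      best = Math.max(best, run);
    } else {
      run = 0;
      prev = null;
    }
  }
  return best;
}

// Growth rate: mean relative month-over-month change in creditsUsed.
// Months that follow a zero-usage month are skipped to avoid dividing by 0.
function usageGrowthRate(months: MonthlyUsage[]): number {
  const sorted = [...months].sort((a, b) => a.monthIndex - b.monthIndex);
  let sum = 0;
  let n = 0;
  for (let i = 1; i < sorted.length; i++) {
    const prevUsed = sorted[i - 1].creditsUsed;
    if (prevUsed > 0) {
      sum += (sorted[i].creditsUsed - prevUsed) / prevUsed;
      n++;
    }
  }
  return n === 0 ? 0 : sum / n;
}
```

An engagement score would then be a weighted sum of such features, with the weights chosen (or learned) for your product.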
Simulation Process: Step-by-Step
The simulation follows a monthly cycle that models real SaaS business operations:
1. Global Configuration
Set global parameters that apply to all users:
- Growth Rate: 3-5% per month (determines how many new users join)
- Simulation Length: 12-24 months
- Seed: For reproducible results
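A seed only gives reproducible results if every random draw goes through a seeded generator instead of `Math.random()`. The sketch below pairs a hypothetical `SimulationConfig` type with mulberry32, a small public-domain 32-bit PRNG. The simulator's actual generator is not documented here, so treat this as one possible choice.

```typescript
// Hypothetical config shape mirroring the parameters listed above.
interface SimulationConfig {
  monthlyGrowthRate: number; // e.g. 0.03 to 0.05
  months: number; // e.g. 12 to 24
  seed: number;
}

// mulberry32: returns a function producing floats in [0, 1).
// The same seed always yields the same sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const simConfig: SimulationConfig = { monthlyGrowthRate: 0.04, months: 12, seed: 42 };
const seededRand = mulberry32(simConfig.seed);
console.log(seededRand(), seededRand()); // identical on every run with seed 42
```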
2. Start Users (Initial Cohort)
Generate initial user base with:
- Random assignment to industries based on distribution
- Tier assignment based on industry-specific subscriptionMix probabilities
- Income sampled from industry-specific ranges
- Important: Tier and industry do NOT change over time (no upgrades, downgrades, or industry transitions)
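Generating the initial cohort comes down to weighted sampling. The following is a sketch under assumptions: `IndustryConfig`, `weightedPick`, and `createUser` are hypothetical names, income is drawn uniformly from the range, and `rand` is any function returning a float in [0, 1) (ideally a seeded one).

```typescript
type Tier = 'free' | 'basic' | 'pro';

// Hypothetical per-industry configuration.
interface IndustryConfig {
  share: number; // relative share of users in this industry
  subscriptionMix: Record<Tier, number>; // tier weights
  incomeRange: [number, number];
}

interface NewUser {
  userId: string;
  industry: string;
  tier: Tier;
  income: number;
}

// Pick a key with probability proportional to its weight.
function weightedPick<K extends string>(weights: Record<K, number>, rand: () => number): K {
  const keys = Object.keys(weights) as K[];
  let total = 0;
  for (const k of keys) total += weights[k];
  let r = rand() * total;
  for (const k of keys) {
    r -= weights[k];
    if (r < 0) return k;
  }
  return keys[keys.length - 1]; // guard against floating-point rounding
}

// Industry and tier are fixed at creation, matching the rule above that
// they never change during the simulation.
function createUser(
  id: number,
  industries: Record<string, IndustryConfig>,
  rand: () => number,
): NewUser {
  const shares: Record<string, number> = {};
  for (const key of Object.keys(industries)) shares[key] = industries[key].share;
  const industry = weightedPick(shares, rand);
  const cfg = industries[industry];
  const [lo, hi] = cfg.incomeRange;
  return {
    userId: `user-${id}`,
    industry,
    tier: weightedPick(cfg.subscriptionMix, rand),
    income: lo + rand() * (hi - lo),
  };
}
```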
3. Add User Random Features
Generate dummy/experimental features:
- transactions, gmv, hoursSpent, couponsClaimed
- These are randomly generated and do NOT affect behavior
- Included to demonstrate feature selection in ML algorithms
4. Start Transaction (Monthly Cycle)
For each active user each month:
- Calculate Credit Consumption:
  - Sample from industry-specific creditConsumption range (e.g., 70-85% for Construction)
  - creditsUsed = creditsAllocated × consumptionPercent
  - creditsExpired = creditsAllocated - creditsUsed
- Update User State:
  - Accumulate creditsExpired (cumulative over lifetime)
  - Generate transaction record
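The monthly credit step for a single user might look like the sketch below. The names `ActiveUser` and `consumeCredits` are illustrative. Sampling the consumption percentage uniformly within the range is an assumption, since the guide only says it is sampled from an industry-specific range.

```typescript
// Minimal mutable user state for this step (hypothetical shape).
interface ActiveUser {
  userId: string;
  creditsExpired: number; // cumulative over the user's lifetime
}

interface MonthlyCreditResult {
  creditsUsed: number;
  creditsExpired: number; // expired this month only
}

// consumptionRange is a pair of fractions, e.g. [0.70, 0.85] for 70-85%.
// rand is any function returning a float in [0, 1).
function consumeCredits(
  user: ActiveUser,
  creditsAllocated: number,
  consumptionRange: [number, number],
  rand: () => number,
): MonthlyCreditResult {
  const [lo, hi] = consumptionRange;
  const consumptionPercent = lo + rand() * (hi - lo);
  const creditsUsed = creditsAllocated * consumptionPercent;
  const creditsExpired = creditsAllocated - creditsUsed;
  user.creditsExpired += creditsExpired; // update cumulative state
  return { creditsUsed, creditsExpired };
}
```

The returned values are what you would write into the month's transaction record as `creditsUsed` and `creditsAllocated - creditsUsed`.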
5. Apply User Churn or Behavior Differences
After processing all transactions:
- Calculate Churn Probability:
  - Base rate from industry config
  - Multipliers for:
    - High credits expired → lower churn (high-margin customer)
    - Low activity → higher churn
    - Low income → higher churn
- Process Churn:
  - Set dateCancelled for users who churn
  - Churned users no longer generate transactions
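One way to express the churn step is a base rate scaled by multipliers, followed by a random draw. All thresholds and multiplier values in `ChurnRules` are placeholders you would supply yourself. The guide gives only the direction of each effect, not its size, and the type and function names here are hypothetical.

```typescript
interface ChurnInputs {
  baseRate: number; // monthly base churn rate from the industry config
  creditsExpired: number; // cumulative
  creditsUsedLastMonth: number; // stand-in for "activity" (an assumption)
  income: number;
}

// Placeholder rule set: thresholds and multipliers are not from the guide.
interface ChurnRules {
  highExpiredThreshold: number;
  highExpiredMultiplier: number; // below 1: lowers churn
  lowActivityThreshold: number;
  lowActivityMultiplier: number; // above 1: raises churn
  lowIncomeThreshold: number;
  lowIncomeMultiplier: number; // above 1: raises churn
}

function churnProbability(u: ChurnInputs, rules: ChurnRules): number {
  let p = u.baseRate;
  if (u.creditsExpired >= rules.highExpiredThreshold) p *= rules.highExpiredMultiplier;
  if (u.creditsUsedLastMonth < rules.lowActivityThreshold) p *= rules.lowActivityMultiplier;
  if (u.income < rules.lowIncomeThreshold) p *= rules.lowIncomeMultiplier;
  return Math.min(1, Math.max(0, p)); // keep it a valid probability
}

// Marks the user as cancelled with probability p. Returns true on churn.
// Users who already cancelled are left untouched.
function applyChurn(
  user: { dateCancelled: Date | null },
  p: number,
  now: Date,
  rand: () => number,
): boolean {
  if (user.dateCancelled === null && rand() < p) {
    user.dateCancelled = now;
    return true;
  }
  return false;
}
```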
6. End Month
Calculate monthly metrics:
- Revenue (subscription payments)
- Costs (credit usage × credit value)
- Profit (revenue - costs)
- New users, churned users
- Credits expired distribution
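The financial part of the month-end summary is a straightforward reduction over that month's transactions. In this sketch `creditValue` is taken to mean the cost to the business of one consumed credit, which is an interpretation of "credit value" above. `MonthRow` and `monthlyFinancials` are illustrative names.

```typescript
// The two transaction fields needed for the financial metrics.
interface MonthRow {
  amount: number; // subscription payment
  creditsUsed: number;
}

interface MonthlyFinancials {
  revenue: number;
  costs: number;
  profit: number;
}

function monthlyFinancials(rows: MonthRow[], creditValue: number): MonthlyFinancials {
  let revenue = 0;
  let creditsUsed = 0;
  for (const row of rows) {
    revenue += row.amount;
    creditsUsed += row.creditsUsed;
  }
  const costs = creditsUsed * creditValue;
  return { revenue, costs, profit: revenue - costs };
}
```

Because unused credits cost nothing to serve, users with high expired credits raise profit without raising costs, which is why `creditsExpired` works as a margin signal.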
7. Start New Month
Repeat steps 4-6 for the next month, with:
- New users added based on growth rate
- Existing users continuing their behavior patterns
- Cumulative metrics accumulating over time
Segmentation Algorithm
The simulator uses k-means clustering to segment users based on behavioral features. However, in practice, you should consider:
Algorithm Selection
- K-Means (used here): Fast, simple, works well with spherical clusters
- Gaussian Mixture Models (GMM): Better for non-spherical clusters, gives each user a membership probability per cluster instead of a hard assignment
- Hierarchical Clustering (HCA): No need to specify k in advance, creates dendrograms for analysis
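For reference, k-means itself is short enough to write out. The version below is a plain implementation of Lloyd's algorithm, not the simulator's code. It assumes the caller supplies the initial centroids (for example, k randomly chosen users) and has already standardized the features so that large-valued columns such as income do not dominate the distance.

```typescript
type Point = number[];

function squaredDistance(a: Point, b: Point): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Index of the centroid closest to p.
function nearestCentroid(p: Point, centroids: Point[]): number {
  let best = 0;
  let bestDist = Infinity;
  centroids.forEach((c, idx) => {
    const d = squaredDistance(p, c);
    if (d < bestDist) {
      bestDist = d;
      best = idx;
    }
  });
  return best;
}

// Lloyd's algorithm: alternate assignment and centroid update until the
// assignments stop changing or maxIterations is reached.
function kMeans(
  points: Point[],
  initialCentroids: Point[],
  maxIterations = 100,
): { labels: number[]; centroids: Point[] } {
  let centroids = initialCentroids.map((c) => [...c]);
  let labels: number[] = new Array(points.length).fill(-1);
  for (let iter = 0; iter < maxIterations; iter++) {
    const next = points.map((p) => nearestCentroid(p, centroids));
    const changed = next.some((label, i) => label !== labels[i]);
    labels = next;
    if (!changed) break;
    // Recompute each centroid as the mean of its assigned points.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => next[i] === c);
      if (members.length === 0) return old; // leave empty clusters in place
      return old.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return { labels, centroids };
}
```

K-means is sensitive to initialization, so in practice you would run it from several starting points and keep the run with the lowest within-cluster sum of squares.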
The K=4 "Cheat"
In this simulator, we cheat by setting k=4 to match the number of industries (Construction, Healthcare, Software, Student).
In practice, you don't know how many segments exist in advance. You would need to:
- Use the elbow method to find optimal k
- Use silhouette analysis to evaluate cluster quality
- Use domain knowledge to validate segment count
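- Consider HCA, which builds the full dendrogram first and lets you choose where to cut it afterwards; GMM still needs a component count, but you can fit several values and compare them with an information criterion such as BIC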
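Silhouette analysis can be sketched in a few lines. For each point, a is the mean distance to the other points in its own cluster and b is the lowest mean distance to any other cluster. The point's score is (b - a) / max(a, b), and the overall score is the mean across points. The function name and the use of Euclidean distance are choices made for this example. Points in singleton clusters are scored 0, which is the usual convention.

```typescript
function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Mean silhouette over all points; ranges from -1 (bad) to 1 (well separated).
// This is O(n^2) in the number of points, so sample large datasets first.
function silhouetteScore(points: number[][], labels: number[]): number {
  const clusters = Array.from(new Set(labels));
  let total = 0;
  points.forEach((p, i) => {
    // Sum of distances and point counts from p to each cluster (excluding p).
    const sums = new Map<number, number>();
    const counts = new Map<number, number>();
    points.forEach((q, j) => {
      if (i === j) return;
      const c = labels[j];
      sums.set(c, (sums.get(c) ?? 0) + euclidean(p, q));
      counts.set(c, (counts.get(c) ?? 0) + 1);
    });
    const own = labels[i];
    const ownCount = counts.get(own) ?? 0;
    if (ownCount === 0) return; // singleton cluster contributes 0
    const a = (sums.get(own) ?? 0) / ownCount;
    let b = Infinity;
    for (const c of clusters) {
      if (c === own) continue;
      const n = counts.get(c) ?? 0;
      if (n > 0) b = Math.min(b, (sums.get(c) ?? 0) / n);
    }
    if (b === Infinity) return; // only one cluster present
    const denom = Math.max(a, b);
    if (denom > 0) total += (b - a) / denom;
  });
  return points.length === 0 ? 0 : total / points.length;
}
```

To choose k, you would cluster the same data for a range of candidate values (say 2 through 8), compute the score for each labeling, and prefer the k with the highest score, checking the result against domain knowledge.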
Feature Selection
The algorithm evaluates all features but should identify:
- Signal Features: income, creditsExpired (high importance)
- Noise Features: transactions, gmv, hoursSpent, couponsClaimed (low importance)
This demonstrates the importance of feature selection in ML models - including irrelevant features can reduce model performance.
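K-means does not report feature importance on its own, so a separate measure is needed. One simple option, shown here as an assumption and not as the simulator's method, is eta-squared: the share of a feature's variance that lies between groups. Values near 1 mean the feature separates the groups well, and values near 0 suggest noise.

```typescript
// Eta-squared for one feature: SS_between / SS_total.
// values[i] is the feature value for user i; labels[i] is that user's
// cluster (or, in the simulator, the known industry).
function etaSquared(values: number[], labels: number[]): number {
  if (values.length === 0) return 0;
  const mean = values.reduce((s, v) => s + v, 0) / values.length;

  // Per-group sum and count.
  const groups = new Map<number, { sum: number; n: number }>();
  values.forEach((v, i) => {
    const g = groups.get(labels[i]) ?? { sum: 0, n: 0 };
    g.sum += v;
    g.n += 1;
    groups.set(labels[i], g);
  });

  let ssTotal = 0;
  for (const v of values) ssTotal += (v - mean) * (v - mean);

  let ssBetween = 0;
  groups.forEach((g) => {
    const d = g.sum / g.n - mean;
    ssBetween += g.n * d * d;
  });

  return ssTotal === 0 ? 0 : ssBetween / ssTotal;
}

// Usage: a feature that tracks the groups versus one that does not.
const groupLabels = [0, 0, 1, 1];
console.log(etaSquared([1, 1, 9, 9], groupLabels)); // separates the groups
console.log(etaSquared([1, 9, 1, 9], groupLabels)); // unrelated to the groups
```

Ranking features by this score and re-clustering with only the top few is a quick way to check whether dropping the noise columns improves the segments.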
This simulator provides a complete, production-ready reference implementation for:
- Behavioral data collection
- Feature engineering
- Clustering algorithm evaluation
- Confusion matrix analysis
- Feature importance assessment
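Because the simulator knows each user's true industry, cluster quality can be checked with a confusion matrix: a cross-tabulation of industry against assigned cluster. The sketch below is illustrative and the function names are hypothetical. Cluster ids are arbitrary, so it also computes purity, the fraction of users who belong to the majority industry of their cluster.

```typescript
// Rows: industry. Columns: cluster id. Cells: number of users.
function crossTab(industries: string[], clusters: number[]): Map<string, Map<number, number>> {
  const table = new Map<string, Map<number, number>>();
  industries.forEach((industry, i) => {
    const row = table.get(industry) ?? new Map<number, number>();
    row.set(clusters[i], (row.get(clusters[i]) ?? 0) + 1);
    table.set(industry, row);
  });
  return table;
}

// Purity: sum over clusters of the largest industry count, divided by the
// total number of users. 1 means every cluster holds a single industry.
function purity(industries: string[], clusters: number[]): number {
  if (industries.length === 0) return 0;
  const byCluster = new Map<number, Map<string, number>>();
  industries.forEach((industry, i) => {
    const counts = byCluster.get(clusters[i]) ?? new Map<string, number>();
    counts.set(industry, (counts.get(industry) ?? 0) + 1);
    byCluster.set(clusters[i], counts);
  });
  let majority = 0;
  byCluster.forEach((counts) => {
    let best = 0;
    counts.forEach((n) => {
      if (n > best) best = n;
    });
    majority += best;
  });
  return majority / industries.length;
}
```

With real customers there is no true industry column to compare against, so this check is only available in simulation or against a small hand-labeled sample.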