SegMark Engine: Hybrid RFM-XGBoost Customer Segmentation for D2C Growth

By Adarsh Keshri

7/15/2025, 10:30:00 AM


Introduction

Customer acquisition costs are rising fast. For D2C brands, the path to sustainable growth runs through retention, not acquisition. The SegMark Engine is Cosminder's answer—a hybrid segmentation and retargeting system that combines classical RFM analysis with predictive XGBoost modeling to identify, score, and activate high-value customer segments at scale.

Built on 120,000 transaction and behavioral logs from a D2C skincare brand, SegMark delivered a 64.7% uplift in repeat purchase rate and ₹22L in incremental net revenue within six weeks.

The D2C Retention Problem

The Direct-to-Consumer model gives brands direct access to first-party data—transactions, browsing behavior, engagement signals. But most brands underutilize this asset, relying on generic campaigns that treat every customer the same.

As competition intensifies and CAC climbs, the math is clear:

  • Acquiring new customers is getting more expensive every quarter.
  • Retaining existing ones is commonly estimated to be 5–7x more cost-efficient.
  • Personalized outreach drives measurably higher conversion rates.

SegMark was built to turn raw customer data into a precision retention engine.

Phase 1: RFM Scoring — Understanding Past Behavior

RFM analysis segments customers along three behavioral dimensions:

  • Recency (R): Days since last purchase. Recent buyers are more engaged and responsive.
  • Frequency (F): Total number of orders. Repeat buyers signal loyalty and satisfaction.
  • Monetary (M): Total lifetime spend. High spenders warrant premium attention.

Each customer is scored on a 1–5 scale using quintile binning on each of the three dimensions, with recency inverted so that the most recent buyers score 5. This produces intuitive segments like Champions (R=5, F=5, M=5), Potential Loyalists (recent but infrequent), and At-Risk (once-valuable customers who've gone quiet).
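The quintile scoring described above can be sketched in plain Python. This is a minimal illustration, not the production pipeline: the customer rows are made up, `quintile_scores` is a hypothetical helper name, and ties are broken by input order (a library routine such as `pandas.qcut` would handle the binning in practice).

```python
def quintile_scores(values, higher_is_better=True):
    """Map each value to a 1-5 score using rank-based, equal-sized bins."""
    n = len(values)
    # Order customer indices from worst to best on this dimension.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank * 5 // n + 1  # 1 = bottom quintile, 5 = top quintile
    return scores


# Hypothetical aggregates: (days since last order, order count, lifetime spend in INR)
customers = {
    "c1": (3, 12, 42000),
    "c2": (40, 2, 3100),
    "c3": (10, 6, 18500),
    "c4": (200, 1, 900),
    "c5": (75, 4, 7600),
}

ids = list(customers)
# Recency is inverted: fewer days since the last order earns a higher score.
r = quintile_scores([customers[c][0] for c in ids], higher_is_better=False)
f = quintile_scores([customers[c][1] for c in ids])
m = quintile_scores([customers[c][2] for c in ids])

rfm = {c: (r[i], f[i], m[i]) for i, c in enumerate(ids)}
print(rfm["c1"])  # (5, 5, 5), a Champion
print(rfm["c4"])  # (1, 1, 1)
```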

RFM gives a clean snapshot of historical value—but it's backward-looking. It tells you who was valuable, not who will be.

Phase 2: XGBoost — Predicting Future Behavior

To shift from descriptive to predictive analytics, SegMark layers an XGBoost classifier on top of the RFM scores. XGBoost (Extreme Gradient Boosting) is an ensemble method that builds decision trees sequentially, with each tree correcting the errors of its predecessors.
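To make "each tree correcting the errors of its predecessors" concrete, here is a toy version of the idea: plain gradient boosting with one-split trees (stumps) fitted to squared-error residuals. It is deliberately not XGBoost, which adds regularization, second-order gradient information, and a proper classification loss. The data and the function names are invented for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a stump) to the current residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, best_t, best_left, best_right = best
    return lambda x: best_left if x <= best_t else best_right


def boost(xs, ys, n_trees=20, learning_rate=0.3):
    """Add stumps one at a time, each trained on what the ensemble still gets wrong."""
    base = sum(ys) / len(ys)
    trees = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)


def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: days since last order vs. whether the customer repurchased.
days_since_last_order = [2, 5, 9, 14, 30, 45, 60, 90]
repurchased = [1, 1, 1, 0, 1, 0, 0, 0]

mean_label = sum(repurchased) / len(repurchased)
model = boost(days_since_last_order, repurchased)
mse_before = mse(lambda x: mean_label, days_since_last_order, repurchased)
mse_after = mse(model, days_since_last_order, repurchased)
print(f"training MSE: {mse_before:.3f} -> {mse_after:.3f}")
```

Each round can only lower the training error, because every new stump is fitted to the remaining residuals and added with a small step size.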

The predictive target: will the customer make a repeat purchase within 30 days?
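One way to build that label is sketched below. It assumes a snapshot date: features are computed from orders up to the snapshot, and the label looks only at orders after it, which avoids leaking the outcome into the inputs. The source does not say how the 30-day window is anchored, so the snapshot convention, the function name, and the sample dates are all assumptions.

```python
from datetime import date, timedelta


def repeat_within_window(order_dates, snapshot, horizon_days=30):
    """Return 1 if any order falls in (snapshot, snapshot + horizon_days], else 0."""
    window_end = snapshot + timedelta(days=horizon_days)
    return int(any(snapshot < d <= window_end for d in order_dates))


snapshot = date(2025, 5, 1)  # hypothetical cut-off between features and label

# Hypothetical order histories.
orders = {
    "c1": [date(2025, 3, 2), date(2025, 4, 20), date(2025, 5, 12)],  # buys inside the window
    "c2": [date(2025, 1, 5), date(2025, 6, 15)],                     # next order comes too late
    "c3": [date(2025, 4, 30)],                                       # no order after the snapshot
}

labels = {c: repeat_within_window(d, snapshot) for c, d in orders.items()}
print(labels)  # {'c1': 1, 'c2': 0, 'c3': 0}
```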

Feature Engineering

The RFM scores serve as expert-engineered features—distilling complex transactional history into three high-signal numerical inputs. These are augmented with behavioral features:

  • Average page views and time on site per session
  • Total add-to-cart events (a strong intent signal)
  • Most viewed product category (one-hot encoded)

This hybrid approach—classical marketing framework plus modern ML—injects domain expertise directly into the model.
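As a rough sketch, one model input row could be assembled like this. The category list, the function name, and the feature order are assumptions made for illustration; the source does not list the actual product categories.

```python
# Hypothetical category vocabulary for the one-hot encoding.
CATEGORIES = ["cleanser", "moisturizer", "serum", "sunscreen"]


def build_features(rfm, avg_page_views, avg_session_seconds, add_to_cart_events, top_category):
    """Concatenate RFM scores, behavioral signals, and a one-hot category vector."""
    r, f, m = rfm
    # Exactly one position is 1; an unseen category yields all zeros.
    one_hot = [1 if top_category == c else 0 for c in CATEGORIES]
    return [r, f, m, avg_page_views, avg_session_seconds, add_to_cart_events, *one_hot]


row = build_features(
    rfm=(5, 4, 3),
    avg_page_views=6.5,
    avg_session_seconds=210.0,
    add_to_cart_events=4,
    top_category="serum",
)
print(row)  # [5, 4, 3, 6.5, 210.0, 4, 0, 0, 1, 0]
```

No feature scaling is applied here, since tree-based models split on thresholds and are insensitive to the scale of each input.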

Model Performance

The trained XGBoost classifier achieved strong, balanced results on the holdout test set:

| Metric | Score |
|--------|-------|
| Accuracy | 92.15% |
| Precision | 88.32% |
| Recall | 85.11% |
| F1-Score | 86.68% |

At 88.32% precision, only about 12% of the customers the model flags as likely repeat buyers fail to convert, which keeps wasted spend low. At 85.11% recall, the model captures roughly 85% of the customers who actually go on to repurchase.
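For readers who want the definitions behind these four metrics, here is a small helper that derives them from confusion-matrix counts. The counts in the example are invented round numbers, not the study's confusion matrix, which the source does not report.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # of customers flagged, the share that repurchased
    recall = tp / (tp + fn)     # of customers who repurchased, the share flagged
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=80, fp=20, fn=20, tn=380)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```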

Predictive Segments: From Scores to Strategy

The model outputs a probability score (0.0–1.0) for each customer. Combined with historical RFM data, this creates four actionable segments:

  1. Prime for Upsell — Probability > 0.85 AND high monetary score. The brand's best customers, highly likely to buy again. Strategy: early access, premium product launches, exclusive loyalty rewards.

  2. Nurture to Loyalty — Probability 0.6–0.84. Engaged and interested, but need a gentle push. Strategy: personalized recommendations, free shipping incentives, sample-with-purchase offers.

  3. Win-Back Opportunity — Low recency score AND probability 0.4–0.59. Once-active customers the model still sees potential in. Strategy: multi-channel (email + SMS) win-back with 25% off, 48-hour urgency.

  4. Monitor & Maintain — Probability < 0.4. Low predicted engagement. Strategy: low-cost automated newsletter only. No active spend.
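The four rules above can be expressed as a small function. Several details are assumptions rather than documented behavior: "high monetary" is read as M ≥ 4, "low recency" as R ≤ 2, the probability bands are treated as continuous (so 0.845 counts as Nurture to Loyalty), and customers who miss a secondary condition fall through to the next rule.

```python
def assign_segment(probability, recency_score, monetary_score):
    """Map a repeat-purchase probability plus RFM scores to one of four segments."""
    # Assumption: "high monetary score" means a monetary score of 4 or 5.
    if probability > 0.85 and monetary_score >= 4:
        return "Prime for Upsell"
    # Assumption: high-probability customers without high spend land here too.
    if probability >= 0.6:
        return "Nurture to Loyalty"
    # Assumption: "low recency score" means a recency score of 1 or 2.
    if probability >= 0.4 and recency_score <= 2:
        return "Win-Back Opportunity"
    # Everything else, including mid-probability customers who bought recently.
    return "Monitor & Maintain"


print(assign_segment(0.92, recency_score=5, monetary_score=5))  # Prime for Upsell
print(assign_segment(0.70, recency_score=3, monetary_score=3))  # Nurture to Loyalty
print(assign_segment(0.50, recency_score=1, monetary_score=4))  # Win-Back Opportunity
print(assign_segment(0.20, recency_score=1, monetary_score=1))  # Monitor & Maintain
```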

Activation: From Python to Klaviyo

A model sitting in a notebook generates zero revenue. The SegMark pipeline pushes predictions directly into Klaviyo via API:

  1. Profile Update: Each customer's repeat_purchase_score is written as a custom property via Klaviyo's Profile API.
  2. Dynamic Segments: Segments are defined by rules on this property—they update automatically as scores change.
  3. Personalized Flows: Each segment triggers tailored email and SMS campaigns with segment-specific offers, copy, and urgency.

This creates a living system—not a one-time analysis. As customer behavior evolves, the model rescores, segments shift, and marketing flows adapt automatically.
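The profile update in step 1 might be prepared along these lines. The sketch only builds the request body and sends nothing. The body layout follows the JSON:API style that Klaviyo's profile endpoints use, to the best of my knowledge; treat it as an assumption and confirm the exact shape, endpoint, and headers against the current Klaviyo API reference. `profile_payload` is a hypothetical helper name.

```python
import json


def profile_payload(email, score):
    """Build a request body that stores the model score as a custom profile property."""
    return {
        "data": {
            "type": "profile",
            "attributes": {
                "email": email,
                # Custom properties live under "properties" (assumed layout).
                "properties": {"repeat_purchase_score": round(score, 4)},
            },
        }
    }


body = profile_payload("customer@example.com", 0.87312)
print(json.dumps(body, indent=2))
```

Keeping the score as a single numeric property means the segment rules stay in Klaviyo, so thresholds can be tuned there without redeploying the scoring job.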

Results: Measured with Control Groups

To establish causal impact (not just correlation), 10% of each targetable segment was held out as a randomized control group receiving only generic communications. Results after six weeks:

| Metric | Baseline (Control) | Post-SegMark (Treatment) | Relative Uplift |
|--------|-------------------|-------------------------|-----------------|
| Repeat Purchase Rate | 17.3% | 28.5% | +64.7% |
| Average Order Value | ₹3,200 | ₹3,550 | +10.9% |
| Incremental Net Revenue | n/a | n/a | +₹22,00,000 |

The ₹22L figure is incremental revenue—additional revenue generated above what the control group baseline would have produced. This is the true, isolated impact of personalized segmentation.
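The holdout design and the uplift arithmetic can be sketched as follows. The seeded shuffle, the function names, and the customer IDs are assumptions; the source states only that 10% of each targetable segment was randomly held out, so in practice the split would be run once per segment.

```python
import random


def split_holdout(customer_ids, holdout_share=0.10, seed=42):
    """Randomly reserve a share of customers as a control group."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    n_control = round(len(shuffled) * holdout_share)
    control = set(shuffled[:n_control])
    treatment = set(shuffled[n_control:])
    return treatment, control


def relative_uplift(control_value, treatment_value):
    """Relative change of the treatment metric over the control metric."""
    return (treatment_value - control_value) / control_value


treatment, control = split_holdout(range(1000))
print(len(treatment), len(control))  # 900 100

# Reproduce the uplift column from the reported control and treatment values.
print(f"{relative_uplift(17.3, 28.5):.1%}")  # 64.7%
print(f"{relative_uplift(3200, 3550):.1%}")  # 10.9%
```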

Future Scope

The SegMark architecture—data pipeline → predictive model → marketing automation—is a reusable foundation. Planned extensions include:

  • CLV Prediction: Shifting from binary classification to regression to forecast 12-month customer lifetime value.
  • High-Value Churn Detection: Predicting churn risk specifically among top-tier customers, enabling high-touch retention interventions.
  • Product Recommendation Engine: Training on purchase sequences to predict the next most likely product per customer.
  • Expanded Feature Sets: Integrating zero-party data (skin concern quizzes, preference centers), customer service interactions, and social media engagement signals.

Final Thoughts

SegMark illustrates that data is not a byproduct of commerce; it is the central asset around which competitive strategy is built. The combination of RFM's intuitive behavioral framework with XGBoost's predictive power creates a segmentation engine that is both interpretable for marketers and performant for data scientists.

The result is a growth flywheel: every customer interaction generates new data, which refines predictions, which drives more effective marketing, which generates more engagement. This continuous loop is what separates brands that scale from brands that stall.

Disclaimer: This case study is based on Cosminder's SegMark Engine implementation for a D2C skincare brand. For technical details, model code, or partnership inquiries, contact support@cosminder.com.


A data-driven approach to turning customer behavior into revenue
