Data Management

Welcome back! In the previous chapters, we explored the Context object for individual visitors, the ExperienceManager for A/B tests, and the FeatureManager for feature flags. You might have noticed a pattern: when we asked the SDK to make a decision (like "which variation should user123 see?" or "is dark-mode enabled?"), the managers didn't just guess. They needed information!

But where does all this information live? How does the SDK know about your specific experiments, features, targeting rules, and which variation 'user123' was assigned to last time?

The Problem: Where Does Everything Live?

Imagine you're building a complex application. You have user settings, product information, application configuration, and maybe temporary data about the user's current session. Where do you keep all this data so different parts of your application can access it when needed? You probably use a database or some central configuration store.

The Convert SDK faces a similar challenge. It needs a central place to:

Store Project Setup: Keep track of all the experiences, features, audiences, goals, etc., defined in your Convert account for your specific project.
Fetch Updates: Get the latest version of this project setup from the Convert API or use data you provide directly.
Remember Visitor Decisions: If 'user123' is assigned to 'variation-B' of the headline test, the SDK needs to remember this so they see the same headline consistently.
Store Visitor Segments: Keep track of visitor properties or segments that might affect targeting.
Provide Data Access: Allow other managers (like ExperienceManager and FeatureManager) to easily retrieve the specific information they need.

How does the SDK manage this central pool of information?

What is `DataManager`? The Project's Librarian

Meet the DataManager! Think of it as the central librarian or database interface for your Convert SDK project. It's responsible for holding and managing all the essential information.

Its key responsibilities are:

The Bookshelf (Project Configuration): It holds the entire configuration downloaded from Convert (or provided by you), including details about all your experiences, features, audiences, goals, etc.
The Index Cards (Visitor State): It keeps track of visitor-specific information, most importantly:
- Which variation a visitor has been assigned to for each experience (bucketing decisions).
- Any relevant visitor segments or properties.
Fetching New Books (Data Retrieval): It works with the ApiManager to fetch the latest project configuration from Convert's servers.
Checking Books Out (Providing Data): It offers methods for other managers to look up specific pieces of information (e.g., "get me the details for the 'headline-test' experience" or "find the goal with ID '12345'").
Using a Filing System (Data Persistence): It can optionally use a DataStoreManager (which you can configure) to save and retrieve visitor bucketing information so that decisions persist across sessions or requests.

In short, DataManager is the single source of truth within the SDK for both the overall project setup and the state of individual visitors.

How it's Used (Mostly Behind the Scenes)

As a developer using the SDK, you will very rarely interact directly with the DataManager. It's primarily an internal component used by other managers.

Remember how running an experience on a Context delegated to the ExperienceManager's variation selection, which in turn delegated further? That final delegation step usually lands here, at the DataManager.

Let's revisit the flow when you ask for an experience variation:

You call the Context's run experience method with 'headline-test'.
The Context calls the ExperienceManager's select variation method.
The ExperienceManager calls the DataManager's getBucketing('user123', 'headline-test', ...) to handle the entire process.

Similarly, when the FeatureManager needs to know if a feature is enabled, it asks the DataManager to perform the necessary bucketing checks across the relevant experiences by calling getBucketing.
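
The delegation chain above can be sketched as follows. This is a minimal illustration, not the SDK's code: the Sketch-prefixed classes, the method names runExperience and selectVariation, and the stubbed return value are all assumptions made for the example.

```typescript
// Hypothetical, simplified stand-ins for the SDK classes. Names and
// signatures are assumptions made for illustration only.
interface SketchVariation {
  experienceKey: string;
  variationKey: string;
}

class SketchDataManager {
  // Stub: the real method checks rules, stored decisions, and bucketing.
  getBucketing(visitorId: string, experienceKey: string): SketchVariation {
    return { experienceKey, variationKey: 'variation-for-' + visitorId };
  }
}

class SketchExperienceManager {
  private dataManager: SketchDataManager;
  constructor(dataManager: SketchDataManager) {
    this.dataManager = dataManager;
  }
  selectVariation(visitorId: string, experienceKey: string): SketchVariation {
    // No decision logic here: the call is forwarded to the DataManager.
    return this.dataManager.getBucketing(visitorId, experienceKey);
  }
}

class SketchContext {
  private visitorId: string;
  private experienceManager: SketchExperienceManager;
  constructor(visitorId: string, experienceManager: SketchExperienceManager) {
    this.visitorId = visitorId;
    this.experienceManager = experienceManager;
  }
  runExperience(experienceKey: string): SketchVariation {
    return this.experienceManager.selectVariation(this.visitorId, experienceKey);
  }
}

const sketchContext = new SketchContext(
  'user123',
  new SketchExperienceManager(new SketchDataManager())
);
const delegated = sketchContext.runExperience('headline-test');
console.log(delegated.experienceKey, delegated.variationKey);
```

Each layer adds only what it owns (the Context supplies the visitor ID, the ExperienceManager supplies the entry point), which is why the decision logic can live in one place.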

Even fetching basic information relies on the DataManager. For example, inside the ExperienceManager:

To get details of one experience, the ExperienceManager asks the DataManager to find the entity named by a given key in the 'experiences' list, using a method like getEntity(key, 'experiences').
To get a list of all experiences, it asks the DataManager for the entire list of 'experiences', using a method like getEntitiesList('experiences').

The ExperienceManager doesn't store the experience data itself; it asks the DataManager for it using methods like getEntity and getEntitiesList.

So, while you don't call DataManager methods directly, understanding its role is key to understanding how the SDK manages data and makes decisions.

Under the Hood: How `DataManager` Works

Let's peek inside the library.

1. Storing Project Configuration:

When the SDK initializes (via Core), the ApiManager fetches the configuration data (a large JSON object) from Convert's servers, provided an sdkKey is being used.
This data is passed to the DataManager and stored internally.
The DataManager provides helper methods to navigate this large structure efficiently:
- getEntitiesList('experiences'): Returns the array of all experience objects.
- getEntity('headline-test', 'experiences'): Finds and returns the specific experience with the key 'headline-test'.
- getEntityById('100567', 'goals'): Finds and returns the goal with the ID '100567'.

2. Storing Visitor State:

Decisions like "user123 gets variation-B" need to be remembered.
The DataManager uses an internal, in-memory cache to store these decisions for the current session or request. The structure looks like: { 'account-project-user123': { bucketing: { 'exp1_id': 'varB_id' }, segments: {...} } }
Persistence (Optional): To remember decisions across page loads, visits, or HTTP requests, the DataManager can use a DataStoreManager. This is a wrapper around a data store object that you might provide in the SDK configuration. The underlying store could be browser cookies, localStorage, sessionStorage, Redis, Memcached, a file-based cache, or even a custom backend storage system you implement.
- When DataManager stores a decision (using putData), it updates its internal cache and tells the DataStoreManager to save it persistently.
- When DataManager needs a decision (using getData), it checks its internal cache first, and if not found, asks the DataStoreManager to retrieve it from the persistent store.
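
To make the cache-then-store lookup concrete, here is a minimal sketch. It assumes a pluggable data store exposes get and set methods keyed by a string; the interface name, the in-memory stand-in, and the helper functions rememberDecision and recallDecision are hypothetical and only mirror the behavior described above.

```typescript
// Assumed shape of one visitor's stored state, mirroring the structure above.
interface SketchVisitorState {
  bucketing?: Record<string, string>; // experience ID -> variation ID
  segments?: Record<string, unknown>;
}

// Assumed minimal contract for a pluggable data store: get and set by key.
interface SketchDataStore {
  get(key: string): SketchVisitorState | undefined;
  set(key: string, value: SketchVisitorState): void;
}

// A toy in-memory store standing in for cookies, localStorage, Redis, etc.
class SketchMemoryStore implements SketchDataStore {
  private records: Record<string, SketchVisitorState> = {};
  get(key: string): SketchVisitorState | undefined {
    return this.records[key];
  }
  set(key: string, value: SketchVisitorState): void {
    this.records[key] = value;
  }
}

const sketchCache: Record<string, SketchVisitorState> = {};
const sketchStore: SketchDataStore = new SketchMemoryStore();

// Write path: update the in-memory cache, then persist.
function rememberDecision(key: string, state: SketchVisitorState): void {
  sketchCache[key] = state;
  sketchStore.set(key, state);
}

// Read path: cache first, persistent store as the fallback.
function recallDecision(key: string): SketchVisitorState | undefined {
  return sketchCache[key] ?? sketchStore.get(key);
}

rememberDecision('account-project-user123', { bucketing: { exp1_id: 'varB_id' } });
delete sketchCache['account-project-user123']; // simulate a later request with a cold cache
const recalled = recallDecision('account-project-user123');
console.log(recalled?.bucketing);
```

Because the store is consulted only on a cache miss, a request that has already seen the visitor never pays the cost of the slower persistent lookup.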

graph LR
    DM[DataManager]
    cache["Internal Cache (In-Memory)"]
    dsm["DataStoreManager (Optional)"]
    Store[(Persistent Store <br/> e.g., Cookies, localStorage, Redis)]

    subgraph DataManager Scope
        DM -- Reads/Writes --> cache
        DM -- Reads/Writes --> dsm
    end

    subgraph External Persistence
        dsm -- Reads/Writes --> Store
    end

    style Store fill:#eee,stroke:#333,stroke-width:1px

3. Orchestrating Bucketing (getBucketing):

This is where the DataManager truly acts as the central coordinator. When the ExperienceManager's select variation method calls DataManager's getBucketing('user123', 'headline-test', attributes):

Check Cache/Store: DataManager first calls its getData('user123') method. This checks the internal in-memory cache. If not found there, it asks the DataStoreManager (if configured) to check the persistent store for a previously saved decision for 'headline-test' for 'user123'.
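Return Cached Decision (If Found): If a valid, previous decision is found, DataManager reuses it rather than bucketing the visitor again, then retrieves the corresponding variation details and returns them. This is the fast path, although (as "The Cadence Filter in DataManager" later in this chapter explains) audiences that are not permanent still have their rules re-checked for an already-bucketed visitor.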
Fetch Experience Details: If no decision is cached, DataManager retrieves the full definition of the 'headline-test' experience (variations, rules, traffic split) from its stored project configuration.
Check Targeting Rules: It uses the RuleManager to evaluate the experience's targeting rules (audiences, locations) against the visitor's properties (visitorProperties, locationProperties).
Rules Fail: If the visitor doesn't meet the targeting criteria, DataManager returns a RuleError.
Perform Bucketing: If the rules pass, DataManager calls the BucketingManager. It provides the visitor ID ('user123') and the traffic allocation defined for the variations of 'headline-test'. The BucketingManager calculates which variation ID the visitor falls into.
Store New Decision: DataManager receives the chosen variation ID from the BucketingManager. It then calls its putData('user123', { bucketing: { 'headline-test-id': 'chosen-variation-id' } }) method. This saves the decision to the internal cache and also tells the DataStoreManager (if present) to save it persistently.
Return New Decision: Finally, DataManager retrieves the full details of the chosen variation and returns the BucketedVariation object.

Here's a sequence diagram illustrating this flow:

sequenceDiagram
    participant EM as ExperienceManager
    participant DM as DataManager
    participant Cache as Internal Cache
    participant Store as DataStore (Optional)
    participant RM as RuleManager
    participant BM as BucketingManager

    EM->>+DM: getBucketing('user123', 'headline-test', ...)
    DM->>+Cache: Get decision for 'user123'/'headline-test'?
    Cache-->>-DM: Not found
    DM->>+Store: Get decision for 'user123'/'headline-test'?
    Store-->>-DM: Not found
    DM->>DM: Get 'headline-test' definition (rules, variations)
    DM->>+RM: Check rules match visitor properties?
    RM-->>-DM: Rules Pass
    DM->>+BM: Calculate variation for 'user123' (based on traffic split)
    BM-->>-DM: variationId = 'variation-B-id'
    DM->>+Cache: Store decision: 'user123'/'headline-test' -> 'variation-B-id'
    Cache-->>-DM: OK
    DM->>+Store: Store decision: 'user123'/'headline-test' -> 'variation-B-id'
    Store-->>-DM: OK
    DM->>DM: Get full details for 'variation-B-id'
    DM-->>-EM: Return BucketedVariation object for 'variation-B'

How It Works: DataManager Implementation

Let's look at simplified descriptions of the key internal operations.

1. Construction and Setup

When the DataManager is constructed, it:

Stores references to the managers it needs to collaborate with (RuleManager, BucketingManager, EventManager, ApiManager, LoggerManager).
Initializes the internal in-memory cache for bucketed visitors.
Stores the main project configuration data if provided initially.
Sets up the DataStoreManager if a data store is configured for persistence.

2. Accessing Project Data

The DataManager provides structured methods to access the project configuration:

A setter/getter for the entire project configuration data, with validation.
getEntitiesList(entityType) -- returns the full array for a given entity type (e.g., all experiences, all goals).
getEntity(key, entityType) -- finds a single entity by its key within a given entity type.
getEntityById(id, entityType) -- finds a single entity by its ID within a given entity type.

Internally, both getEntity and getEntityById use a shared helper that iterates through the list from getEntitiesList and matches on the specified field.
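
A minimal sketch of that shared-helper pattern is shown below. The method names getEntitiesList, getEntity, and getEntityById come from the text; the helper name findEntityByField, the trimmed-down entity shape, and the sample data (including the 'purchase' goal key) are assumptions for illustration.

```typescript
// Hypothetical, trimmed-down entity shape; real project data has more fields.
interface SketchEntity {
  id: string;
  key: string;
}

// Sample configuration, invented for this example.
const sketchConfig: Record<string, SketchEntity[]> = {
  experiences: [
    { id: '1001', key: 'headline-test' },
    { id: '1002', key: 'pricing-test' },
  ],
  goals: [{ id: '100567', key: 'purchase' }],
};

// Returns the full array for an entity type, or an empty array if absent.
function getEntitiesList(entityType: string): SketchEntity[] {
  return sketchConfig[entityType] || [];
}

// Shared helper: linear scan that matches on the requested field.
function findEntityByField(
  value: string,
  entityType: string,
  field: 'id' | 'key'
): SketchEntity | undefined {
  for (const entity of getEntitiesList(entityType)) {
    if (entity[field] === value) return entity;
  }
  return undefined;
}

function getEntity(key: string, entityType: string): SketchEntity | undefined {
  return findEntityByField(key, entityType, 'key');
}

function getEntityById(id: string, entityType: string): SketchEntity | undefined {
  return findEntityByField(id, entityType, 'id');
}

console.log(getEntity('headline-test', 'experiences'));
console.log(getEntityById('100567', 'goals'));
```

Routing both lookups through one helper keeps the matching logic in a single place, so the two public methods differ only in the field they pass.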

3. Managing Visitor Data

The DataManager manages visitor-specific decisions through two core methods, putData and getData, plus a key-building helper, getStoreKey:

putData(visitorId, newData) -- Merges new data with existing data for the visitor, updates the internal in-memory cache, optionally enforces a cache size limit (evicting the oldest entry when exceeded), and if a DataStoreManager is configured, persists the updated data to the external store.
getData(visitorId) -- Checks the in-memory cache first, then the persistent store (if configured), and merges the results. The in-memory cache takes priority.
getStoreKey(visitorId) -- Builds a unique storage key by combining the account ID, project ID, and visitor ID.
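
The merge, eviction, and priority rules described above can be sketched like this. It is an illustration under stated assumptions: the placeholder account and project IDs, the tiny cache limit, the one-level merge in mergeState, and the Map used in place of a real DataStoreManager are all hypothetical.

```typescript
interface SketchState {
  bucketing?: Record<string, string>;
  segments?: Record<string, string>;
}

const ACCOUNT_ID = 'account'; // placeholder IDs for the example
const PROJECT_ID = 'project';
const MAX_CACHED_VISITORS = 2; // hypothetical limit, kept tiny for the demo

// A Map iterates in insertion order, so its first key is the oldest entry.
const visitorCache = new Map<string, SketchState>();
const persistentStore = new Map<string, SketchState>(); // stand-in for a DataStoreManager

function getStoreKey(visitorId: string): string {
  return ACCOUNT_ID + '-' + PROJECT_ID + '-' + visitorId;
}

// Merge one level deep so a new bucketing entry does not erase older ones.
function mergeState(base: SketchState, extra: SketchState): SketchState {
  return {
    bucketing: { ...base.bucketing, ...extra.bucketing },
    segments: { ...base.segments, ...extra.segments },
  };
}

function getData(visitorId: string): SketchState {
  const key = getStoreKey(visitorId);
  const persisted = persistentStore.get(key) || {};
  const cached = visitorCache.get(key) || {};
  return mergeState(persisted, cached); // cached values win on conflict
}

function putData(visitorId: string, newData: SketchState): void {
  const key = getStoreKey(visitorId);
  const merged = mergeState(getData(visitorId), newData);
  visitorCache.set(key, merged);
  if (visitorCache.size > MAX_CACHED_VISITORS) {
    // Evict the oldest inserted entry from the in-memory cache only.
    const oldestKey = visitorCache.keys().next().value;
    if (oldestKey !== undefined) visitorCache.delete(oldestKey);
  }
  persistentStore.set(key, merged);
}

putData('user1', { bucketing: { exp1: 'varA' } });
putData('user1', { bucketing: { exp2: 'varB' } });
putData('user2', { bucketing: { exp1: 'varB' } });
putData('user3', { bucketing: { exp1: 'varA' } }); // pushes user1 out of the cache
console.log(getData('user1').bucketing);
```

In this sketch an evicted visitor is still recoverable because eviction touches only the in-memory cache; without a persistent store the evicted decisions would be lost.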

4. Orchestrating Bucketing

The core getBucketing method coordinates the entire process:

It delegates to an internal helper that accepts lookup by either key or ID.
The helper first checks targeting rules (which also checks for cached bucketing decisions internally) via matchRulesByField.
If a RuleError is returned, that error is propagated back to the caller.
If the rules pass, it delegates to an internal _retrieveBucketing method.
_retrieveBucketing checks whether a stored decision already exists. If so, it returns that. Otherwise, it builds variation buckets (mapping variation IDs to traffic allocations), asks the BucketingManager to pick a variation for the visitor, stores the new decision via putData, optionally enqueues a tracking event via the ApiManager, and returns the full BucketedVariation object.
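
The steps above can be sketched end to end as follows. This is a simplified illustration, not the SDK's implementation: targeting is reduced to exact attribute matches, the toy hash stands in for the BucketingManager, the 'RuleError' string stands in for the SDK's RuleError, and event tracking and the audience cadence filter are left out. All type and function names other than getBucketing are hypothetical.

```typescript
type SketchAttributes = Record<string, string>;

interface SketchVariationDef {
  id: string;
  key: string;
  trafficAllocation: number; // percentage of traffic, 0-100
}

interface SketchExperienceDef {
  id: string;
  key: string;
  requiredAttributes: SketchAttributes; // simplified targeting: exact matches only
  variations: SketchVariationDef[];
}

const RULE_ERROR = 'RuleError'; // stand-in for the SDK's RuleError value
const storedDecisions: Record<string, Record<string, string>> = {};

// Placeholder for the BucketingManager: a toy deterministic hash into 0-99.
function toyBucketValue(input: string): number {
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    hash = (hash * 31 + input.charCodeAt(i)) % 10000;
  }
  return hash % 100;
}

function rulesMatch(exp: SketchExperienceDef, attrs: SketchAttributes): boolean {
  return Object.keys(exp.requiredAttributes).every(
    (attrName) => attrs[attrName] === exp.requiredAttributes[attrName]
  );
}

function retrieveBucketing(
  visitorId: string,
  exp: SketchExperienceDef
): SketchVariationDef | null {
  const decisions = storedDecisions[visitorId] || {};
  let variationId: string | undefined = decisions[exp.id];
  if (variationId === undefined) {
    // Walk the cumulative allocations until the visitor's value fits a bucket.
    const bucketValue = toyBucketValue(visitorId + exp.id);
    let upperBound = 0;
    for (const variation of exp.variations) {
      upperBound += variation.trafficAllocation;
      if (bucketValue < upperBound) {
        variationId = variation.id;
        break;
      }
    }
    if (variationId === undefined) return null; // outside allocated traffic
    storedDecisions[visitorId] = { ...decisions, [exp.id]: variationId };
  }
  const chosenId = variationId;
  return exp.variations.filter((v) => v.id === chosenId)[0] || null;
}

function getBucketing(
  visitorId: string,
  exp: SketchExperienceDef,
  attrs: SketchAttributes
): SketchVariationDef | 'RuleError' | null {
  if (!rulesMatch(exp, attrs)) return RULE_ERROR; // propagate the rule failure
  return retrieveBucketing(visitorId, exp);
}

const headlineTest: SketchExperienceDef = {
  id: 'exp1_id',
  key: 'headline-test',
  requiredAttributes: { country: 'US' },
  variations: [
    { id: 'varA_id', key: 'variation-A', trafficAllocation: 50 },
    { id: 'varB_id', key: 'variation-B', trafficAllocation: 50 },
  ],
};

const firstDecision = getBucketing('user123', headlineTest, { country: 'US' });
const secondDecision = getBucketing('user123', headlineTest, { country: 'US' });
const rejected = getBucketing('user123', headlineTest, { country: 'FR' });
```

The second call returns the stored decision instead of hashing again, which is what keeps a visitor on the same variation; the third call shows a rule failure short-circuiting before any decision is read or written.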

Audience Evaluation Cadence

So far we have treated "the visitor's audience rules pass" as a single yes/no decision. In reality there are two independent axes that together determine whether a visitor is in an audience at any given moment:

Evaluation cadence — when the rules are re-checked. Controlled by the audience's type field.
Input source — where the data behind each rule comes from (live page state vs cached visitor state). Controlled per rule type.

Conflating the two is the single most common source of audience-behavior surprises. The rest of this section unpacks each axis and how the DataManager uses them.

The Three Audience Types (Authoring Model)

The backend audience schema (the AudienceTypeEnum, also described in the Convert Management API docs) defines three audience types:

| Type | URL rules allowed | When rules are evaluated | What happens once matched |
| --- | --- | --- | --- |
| `permanent` | No | Only at the first bucketing check for this experience | Visitor stays matched for the lifetime of the experience, even if underlying conditions change |
| `transient` | No | On every bucketing check (every pageview, every re-check) | Visitor can un-match on the next check; membership is recomputed each time |
| `segmentation` | Yes | On every check, until the first match | Visitor is tagged into the corresponding segment; segment membership then persists across sessions (subject to storage; see below) |

segmentation is the only type that allows URL-based rules in its condition set, because the first URL match is what locks the visitor into the segment. The V2 API enforces this via two discriminated sub-schemas (AudienceWithUrlMatching for segmentation, AudienceWithoutUrlMatching for the other two).
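
As a rough illustration of how a discriminated pair of schemas can enforce this, consider the sketch below. Only the names AudienceWithUrlMatching and AudienceWithoutUrlMatching come from the API description; the field names (type, rules, ruleType, value), the reduced rule-type lists, and the isValidAudience function are assumptions, not the actual schema.

```typescript
// Reduced, hypothetical rule-type lists for the example.
type SketchUrlRuleType = 'url' | 'query_string';
type SketchOtherRuleType = 'visitor_type' | 'country';

interface SketchRule<T extends string> {
  ruleType: T;
  value: string;
}

// Discriminated on the `type` field: only segmentation admits URL rules.
interface AudienceWithUrlMatching {
  type: 'segmentation';
  rules: SketchRule<SketchUrlRuleType | SketchOtherRuleType>[];
}

interface AudienceWithoutUrlMatching {
  type: 'permanent' | 'transient';
  rules: SketchRule<SketchOtherRuleType>[];
}

type SketchAuthoredAudience = AudienceWithUrlMatching | AudienceWithoutUrlMatching;

// Runtime counterpart of the type-level constraint, for untyped input.
interface SketchUntypedAudience {
  type: string;
  rules: { ruleType: string; value: string }[];
}

function isValidAudience(audience: SketchUntypedAudience): boolean {
  const knownTypes = ['permanent', 'transient', 'segmentation'];
  if (knownTypes.indexOf(audience.type) === -1) return false;
  const urlRuleTypes = ['url', 'query_string'];
  const hasUrlRule = audience.rules.some(
    (rule) => urlRuleTypes.indexOf(rule.ruleType) !== -1
  );
  return audience.type === 'segmentation' || !hasUrlRule;
}

// Compiles: a segmentation audience may carry a URL rule.
const landingSegment: SketchAuthoredAudience = {
  type: 'segmentation',
  rules: [{ ruleType: 'url', value: '/pricing' }],
};

console.log(isValidAudience(landingSegment));
console.log(isValidAudience({ type: 'transient', rules: [{ ruleType: 'url', value: '/pricing' }] }));
```

With the union in place, assigning a URL rule to a permanent or transient audience fails at compile time, while isValidAudience covers payloads that arrive as plain JSON.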

How the Three Types Surface in the Serving Config

The SDK does not see all three types as audiences. The serving config's ConfigAudienceTypes enum declares only two values:

// @convertcom/js-sdk-types -> ConfigAudienceTypes
export type ConfigAudienceTypes = 'permanent' | 'transient';

That is because segmentation audiences are resolved server-side into a separate entity — ConfigSegment (see Data Model → "ConfigSegment"). A single authored segmentation audience becomes:

A ConfigSegment entry, whose rules are evaluated to decide segment membership.
An in_segment rule inside any audience or experience that previously referenced it, so the SDK still has a way to ask "is this visitor tagged into segment X?".

Segment membership persistence in Fullstack projects depends entirely on the DataStore — without one, segments are recomputed from scratch each request, same as any other visitor state.

The Cadence Filter in DataManager

When getBucketing invokes matchRulesByField, the DataManager filters the audience list before handing it to the RuleManager:

// @convertcom/js-sdk-data -> DataManager.matchRulesByField
audiencesToCheck = audiences.filter(
  (audience) => !(isBucketed && audience.type === ConfigAudienceTypes.PERMANENT)
);

The rule is simple:

Visitor already bucketed into this experience AND audience is permanent → skip that audience. Its first-match decision stands.
Everything else (transient audiences, or permanent audiences on a not-yet-bucketed visitor) → evaluate its rules again now.

Permanent audiences are therefore "decided once, frozen forever" from the SDK's point of view. Transient audiences are "decided every time." Nothing else in the engine distinguishes them.
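
A self-contained version of that filter, with sample data, might look like the sketch below. The filter expression mirrors the snippet above; the function name selectAudiencesToCheck, the constant object, and the sample audiences are assumptions added so the example runs on its own.

```typescript
// Local stand-in for the SDK's ConfigAudienceTypes values.
const SketchAudienceTypes = {
  PERMANENT: 'permanent',
  TRANSIENT: 'transient',
} as const;

interface SketchAudience {
  id: string;
  type: 'permanent' | 'transient';
}

// Drop permanent audiences once the visitor is bucketed; keep everything else.
function selectAudiencesToCheck(
  audiences: SketchAudience[],
  isBucketed: boolean
): SketchAudience[] {
  return audiences.filter(
    (audience) => !(isBucketed && audience.type === SketchAudienceTypes.PERMANENT)
  );
}

const sampleAudiences: SketchAudience[] = [
  { id: 'aud-permanent', type: 'permanent' },
  { id: 'aud-transient', type: 'transient' },
];

const beforeBucketing = selectAudiencesToCheck(sampleAudiences, false);
const afterBucketing = selectAudiencesToCheck(sampleAudiences, true);
console.log(beforeBucketing.map((a) => a.id)); // both audiences
console.log(afterBucketing.map((a) => a.id)); // only the transient audience
```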

The Input-Source Axis (Live vs Persisted)

Independently from cadence, each rule type reads its comparison value from one of two places:

Live — read fresh from the current page / request / DOM at the exact moment the rule is evaluated. Changes between pageviews.
Persisted — read from cached visitor state (cookies, in-memory visitor object, server-side store). Carries its own TTL and may span sessions.

The headline groupings:

| Input source | Example rule types |
| --- | --- |
| Live | `url`, `url_with_query`, `query_string`, `query_param`, `fragment`, `hostname`, `protocol`, `element_visible`, `element_contains_text`, `cookie` (reads `document.cookie`), `screen_size`, time rules (`day_of_week`, `hour_of_day`, …), `js_condition` |
| Persisted | `source_name`, `medium`, `keyword`, `campaign` (REFERRAL cookie, 6-month TTL), UTM vars, `country`/`region`/`city`/`continent`/`zip_code` (cached on visitor object), `visitor_type`, `visits_count`, `pages_count`, `sessions_count`, `visitor_goals_count`, `in_experience`, `in_variation`, `in_segment`, `browser`, `os`, `browser_language`, `custom_variable`, `page_tag` |

"Persisted" does not mean permanent — each persisted input has its own TTL and mutation rules. Geo can change if the cache expires or the visitor's IP changes; visitor_type flips from NEW to RETURNING at some point; bucketing membership can appear mid-session as other audiences match.

Why the Axes Matter: The Common Misconception Trap

A concrete support case that illustrates this clearly:

A customer sets up a Transient audience with a single rule: source_name contains "facebook". The customer expects the audience to drop a visitor the moment that visitor navigates to a page without utm_source=facebook in the URL.

What the customer sees: the visitor stays matched even after navigating away. It looks as if the Transient type was ignored, or as if the SDK were caching audience membership.

What is actually happening:

Transient cadence is working correctly — data-manager.ts does not skip this audience at the second pageview. The rule is re-evaluated.
But the rule's input is persisted. source_name reads visitor.source, which is backed by the _conv_r REFERRAL cookie. That cookie remembers "this visitor came from Facebook" for up to 6 months by design, for attribution purposes (analytics, cross-session campaigns, integrations).
So the rule evaluates against "facebook" on every pageview — and matches every time — regardless of the current URL.

This is not a bug; it is the two axes meeting at a point where the customer's mental model expected them to be the same. The correct primitive for "re-check the live URL on every pageview" is one of the URL-based rule types: query_string or query_param. Those are Live inputs and will drop the visitor immediately once utm_source=facebook disappears from the URL.
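
The difference can be reproduced with a small simulation of two pageviews. The sketch assumes a persisted rule reads a stored visitor.source value and a live rule parses the current query string; the type names and the functions sourceNameContains and queryParamEquals are hypothetical stand-ins for the real rule evaluators.

```typescript
// Hypothetical visitor and page shapes for illustration.
interface SketchPersistedVisitor {
  source: string; // e.g. backed by a long-lived referral cookie
}

interface SketchPageview {
  queryString: string;
}

// Persisted input: ignores the current URL entirely.
function sourceNameContains(visitor: SketchPersistedVisitor, needle: string): boolean {
  return visitor.source.indexOf(needle) !== -1;
}

// Live input: reads only the current pageview's query string.
function queryParamEquals(page: SketchPageview, paramName: string, expected: string): boolean {
  const pairs = page.queryString.replace(/^\?/, '').split('&');
  for (const pair of pairs) {
    const parts = pair.split('=');
    const currentName = decodeURIComponent(parts[0] || '');
    const currentValue = decodeURIComponent(parts[1] || '');
    if (currentName === paramName && currentValue === expected) return true;
  }
  return false;
}

const fbVisitor: SketchPersistedVisitor = { source: 'facebook' };
const pageviews: SketchPageview[] = [
  { queryString: '?utm_source=facebook' }, // landing page
  { queryString: '' }, // next page, no UTM parameters
];

// A Transient audience re-runs its rule on every pageview.
const ruleResults = pageviews.map((page) => ({
  persistedRule: sourceNameContains(fbVisitor, 'facebook'),
  liveRule: queryParamEquals(page, 'utm_source', 'facebook'),
}));
console.log(ruleResults);
```

Both rules are re-evaluated on the second pageview, as Transient cadence requires. Only the live rule changes its answer, because only its input changed.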

Why Transient + Persisted Is a Legitimate, Load-Bearing Combination

It is tempting to "fix" this by making source_name (and other persisted inputs) read live URL data when evaluated inside a Transient audience. That change would silently strip capability from a broad class of existing use cases, because Transient + Persisted is how customers tap into the update cadence of persisted values:

Bucketing transitions — in_experience / in_variation become true mid-session when another audience matches. Transient re-evaluation picks them up on the next pageview; Permanent would lock them out of this experience forever.
New → Returning — visitor_type flips at some point in time. Transient catches the transition; Permanent freezes the NEW label.
Custom variables / integration variables — set by external scripts (CRM sync, login state, loyalty tier, cart-value thresholds). Transient re-reads the latest value.
Geo drift — cache expiry, travel, VPN changes. Transient tracks the current value.
Attributed-visitor campaigns — source_name / campaign across sessions is exactly how customers target "Facebook-attributed visitors" for the full attribution window, not just the landing session. Making this read live URL would break those campaigns.

Design Guarantee (Non-Goal)

The separation of cadence from input source is intentional and stable. In particular:

The engine will not silently change the input source of a rule based on the audience type that contains it. A rule that reads persisted state inside a Permanent audience reads the same persisted state inside a Transient audience.
Any change here is a data-integrity change, not a UX change. Every currently-running Transient audience using source_name, medium, campaign, keyword, in_experience, visitor_type, geo rules, etc. would silently re-bucket its visitors, retroactively polluting in-flight experiment reports. This platform has served this behavior consistently for ~10 years; changes of this kind require product-level review, not patching.

Customer-education and authoring-UX problems (e.g. a transient audience combined with a persisted-input rule expecting live behavior) are resolved at the authoring layer — clearer tooltips in the audience builder, rule-picker grouping by input source, or suggesting the Live primitive (query_string/query_param) when a Transient audience is built around a Persisted traffic-source rule. None of those require changing the engine.

Conclusion

The DataManager is the unsung hero working tirelessly behind the scenes. It acts as the central library and filing system for the Convert SDK, holding all project configuration (experiences, features, etc.) and managing visitor-specific state like bucketing decisions and segments.

You've learned:

Why a central data management component is necessary.
That DataManager stores both project-wide configuration and visitor-specific state.
How it uses an internal cache and optionally a persistent data store (via DataStoreManager) to remember visitor bucketing decisions.
That it orchestrates the bucketing process by coordinating with the RuleManager and BucketingManager.
That you typically interact with it indirectly through other managers like ExperienceManager and FeatureManager.

We saw that when rules pass, the DataManager relies on the BucketingManager to perform the crucial step of actually assigning a visitor to a specific variation based on traffic percentages. How does that mathematical assignment work?

Let's dive into the hashing and allocation logic next: BucketingManager!