
Machine Learning Platform Guide

Overview

The CoCore machine learning platform enables operations teams to ask plain-language questions about their data and receive predictive insights powered by gradient-boosted decision trees (XGBoost). You can configure ML questions through the UI or GraphQL API without writing code, allowing you to leverage sophisticated machine learning to optimize operations, predict outcomes, and improve decision-making.

This guide covers:

  • Core concepts and data model
  • Getting started with your first ML question
  • Feature engineering and data preparation
  • Training and evaluating models
  • Deploying models for predictions
  • API reference for GraphQL operations
  • Advanced workflows and best practices

Table of Contents

  1. Core Concepts
  2. Getting Started
  3. Features: Building Blocks of ML
  4. Question Templates
  5. Creating ML Questions
  6. Dataset Assembly
  7. Training Models
  8. Evaluating Model Performance
  9. Deploying Models
  10. Monitoring and Runs
  11. GraphQL API Reference
  12. Advanced Topics
  13. Troubleshooting

Core Concepts

The Platform Architecture

The ML platform consists of several interconnected components:

Question Templates → ML Questions → Features → Datasets → Training → Model Versions → Predictions

  • Question Templates: Pre-built configurations for common business questions (e.g., "What will tomorrow's throughput be?")
  • ML Questions: Your specific instance of a template with chosen features and filters
  • Features: Reusable data points derived from your operations (e.g., yesterday's production count, current staffing levels)
  • Datasets: Structured collections of features and target values ready for training
  • Model Versions: Trained models with evaluation metrics and approval status
  • Runs: Execution records tracking dataset assembly, training, and evaluation steps
  • Artifacts: Persisted files (datasets, model boosters, reports) generated during ML workflows

Multi-Tenant and Secure

All ML resources are multi-tenant by default. Your questions, features, and models are isolated to your account and respect the platform's IAM policies. This ensures data privacy and allows different teams to build models independently.

Plain-Language Approach

You interact with the ML platform using business terminology:

  • Questions instead of "prediction tasks"
  • Features instead of "input variables"
  • Templates for common use cases instead of custom configuration

Getting Started

Prerequisites

Before creating your first ML question, ensure:

  1. You have access to the platform (UI or API credentials)
  2. Your account has operational data (operations, products, resources, etc.)
  3. You understand the business outcome you want to predict

Quick Start: Your First ML Question

Here's a six-step workflow to build and train your first model:

Step 1: Browse Question Templates

Explore available templates to find one matching your use case:

GraphQL Query:

graphql
query ListTemplates {
  listMlQuestionTemplates {
    id
    name
    label
    category
    questionCopy
    description
    recommendedFeatures
    requiredFeatures
    optionalFeatures
    status
  }
}

Response Example:

json
{
  "data": {
    "listMlQuestionTemplates": [
      {
        "id": "01JCXXX...",
        "name": "daily_throughput_forecast",
        "label": "Daily Throughput Forecast",
        "category": "operations",
        "questionCopy": "What will our throughput be tomorrow?",
        "description": "Predict next-day production throughput based on historical patterns and current resource availability.",
        "recommendedFeatures": ["yesterday_production", "staffing_level", "equipment_availability"],
        "requiredFeatures": ["target_throughput"],
        "optionalFeatures": ["weather_conditions", "day_of_week"],
        "status": "active"
      }
    ]
  }
}

Step 2: Review Available Features

Check which features are available for your template:

GraphQL Query:

graphql
query ListFeatures {
  listMlFeatures {
    id
    name
    label
    description
    category
    dataType
    units
    tags
    allowedTemplates
    recommendedTemplates
    active
    qualityState
  }
}

Response Example:

json
{
  "data": {
    "listMlFeatures": [
      {
        "id": "01JCYYY...",
        "name": "yesterday_production",
        "label": "Yesterday's Production Count",
        "description": "Total units produced in the previous 24-hour period",
        "category": "operations",
        "dataType": "integer",
        "units": "units",
        "tags": ["production", "throughput", "historical"],
        "allowedTemplates": ["daily_throughput_forecast"],
        "recommendedTemplates": ["daily_throughput_forecast"],
        "active": true,
        "qualityState": "good"
      }
    ]
  }
}

Feature Quality States:

  • good: Feature has recent, complete data
  • warning: Some data quality issues detected
  • error: Significant data problems (high missing rate, stale data)
  • unknown: Quality not yet assessed

Step 3: Create an ML Question

Create your question by specifying the template, features, and any filters:

GraphQL Mutation:

graphql
mutation CreateQuestion {
  createMlQuestion(
    input: {
      name: "my_daily_forecast"
      label: "My Daily Throughput Forecast"
      questionTemplate: "daily_throughput_forecast"
      description: "Predict tomorrow's production to optimize staffing"
      status: draft
      configuration: {
        features: ["yesterday_production", "staffing_level", "equipment_availability"]
        filters: {
          siteId: "01JCZZZ..."
          productFamily: "widgets"
        }
        target: "next_day_throughput"
        splitRatio: {
          train: 0.7
          validation: 0.15
          test: 0.15
        }
      }
      metadata: {
        owner: "operations_team"
        tags: ["production", "forecasting"]
      }
    }
  ) {
    id
    name
    label
    status
    configuration
    insertedAt
  }
}

Configuration Fields:

  • features: Array of feature names to include
  • filters: Criteria to filter source data (e.g., specific site, product type)
  • target: The variable you're trying to predict
  • splitRatio: How to divide data into training/validation/test sets (optional, defaults to 70/15/15)

Step 4: Build the Dataset

Assemble the dataset from your operational data:

GraphQL Mutation:

graphql
mutation BuildDataset {
  buildMlQuestionDataset(
    id: "01JDAAA..."  # Your question ID
    input: {
      forceRebuild: false
    }
  ) {
    id
    status
    lastServedAt
  }
}

What Happens:

  1. The platform loads your question's configuration
  2. Features are fetched and validated
  3. Operations matching your filters are retrieved
  4. Feature values are computed and transformed
  5. Data is split into train/validation/test sets
  6. Dataset is cached with a fingerprint for reuse
  7. An MlRun record tracks the assembly process

Parameters:

  • forceRebuild: Set to true to bypass cache and rebuild from scratch (useful when underlying data has changed)

Check Dataset Status:

graphql
query GetQuestionRuns {
  mlRunsByQuestion(questionId: "01JDAAA...") {
    id
    kind
    status
    datasetReport
    metrics
    startedAt
    completedAt
    errorDetails
  }
}

Dataset Report Structure:

json
{
  "recordCount": 1250,
  "fingerprint": "a3f7b9e2c1d4...",
  "featureStats": {
    "yesterday_production": {
      "mean": 450.2,
      "stdDev": 75.3,
      "min": 200,
      "max": 650,
      "missingRate": 0.02
    }
  },
  "targetStats": {
    "name": "next_day_throughput",
    "mean": 455.8,
    "stdDev": 72.1,
    "min": 210,
    "max": 670
  },
  "cached": false
}

Step 5: Train the Model

Once the dataset is ready, trigger training:

GraphQL Mutation:

graphql
mutation TrainModel {
  trainMlQuestion(
    id: "01JDAAA..."
    input: {
      forceDatasetRebuild: false
      hyperparameters: {
        maxDepth: 8
        eta: 0.1
        subsample: 0.8
        colsampleBytree: 0.8
        objective: "reg:squarederror"
      }
      trainOptions: {
        numBoostRounds: 200
        earlyStoppingRounds: 25
        verboseEval: false
      }
    }
  ) {
    id
    status
  }
}

Hyperparameters (Optional): If not specified, sensible defaults are used:

  • maxDepth: Maximum tree depth (default: 8)
  • eta: Learning rate (default: 0.1)
  • subsample: Fraction of samples used per tree (default: 0.8)
  • colsampleBytree: Fraction of features used per tree (default: 0.8)
  • objective: Loss function (default: reg:squarederror for regression)

Train Options (Optional):

  • numBoostRounds: Number of boosting iterations (default: 200)
  • earlyStoppingRounds: Stop if validation metric doesn't improve (default: 25)
  • verboseEval: Log training progress (default: false)

Training Process:

  1. Dataset is ensured (built or fetched from cache)
  2. A new MlModelVersion is created
  3. XGBoost trains on the training set
  4. Validation set is used for early stopping
  5. Test set provides final evaluation metrics
  6. Artifacts (booster model, config, metrics) are persisted
  7. Model version status updates to awaitingApproval
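
Under the hood, steps 3–6 amount to a standard XGBoost training loop with early stopping. The Python sketch below reproduces the idea on synthetic data; it is illustrative only (the platform runs training server-side against the assembled dataset tensors), and the parameter names are XGBoost's snake_case spellings of the camelCase fields shown in the mutation.

python
import numpy as np
import xgboost as xgb

# Synthetic stand-in for the assembled dataset: 1250 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1250, 3))
y = X @ np.array([3.0, 1.5, -2.0]) + rng.normal(scale=0.5, size=1250)

# 70% / 15% portions of the data for training and early stopping.
dtrain = xgb.DMatrix(X[:875], label=y[:875])
dval = xgb.DMatrix(X[875:1062], label=y[875:1062])

params = {
    "max_depth": 8,               # maxDepth
    "eta": 0.1,                   # eta
    "subsample": 0.8,             # subsample
    "colsample_bytree": 0.8,      # colsampleBytree
    "objective": "reg:squarederror",
}
booster = xgb.train(
    params,
    dtrain,
    num_boost_round=200,                # numBoostRounds
    evals=[(dval, "validation")],       # validation split drives early stopping
    early_stopping_rounds=25,           # earlyStoppingRounds
    verbose_eval=False,                 # verboseEval
)
booster.save_model("booster.json")      # persisted as the `booster` artifact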

Step 6: Review Results

Check the model version and its performance:

GraphQL Query:

graphql
query GetModelVersions {
  mlModelVersionsByQuestion(questionId: "01JDAAA...") {
    id
    version
    status
    trainedAt
    approvedAt
    metricsSnapshot
    hyperparameters
    datasetFingerprint
  }
}

Metrics Snapshot Example:

json
{
  "train": {
    "rmse": 15.2,
    "mae": 11.3,
    "r2": 0.92
  },
  "validation": {
    "rmse": 18.4,
    "mae": 13.7,
    "r2": 0.88
  },
  "test": {
    "rmse": 19.1,
    "mae": 14.2,
    "r2": 0.87
  }
}

Metric Definitions:

  • RMSE (Root Mean Squared Error): Square root of the average squared prediction error; penalizes large errors more heavily
  • MAE (Mean Absolute Error): Average absolute prediction error
  • R² (R-squared): Proportion of variance explained (1.0 = perfect, 0.0 = no better than predicting the mean)
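
If you want to sanity-check reported numbers, all three metrics can be recomputed from raw predictions. The short Python sketch below is not part of the platform API; it simply shows how RMSE, MAE, and R² are derived for one split.

python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, and R² for one split."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    mae = float(np.mean(np.abs(errors)))
    ss_res = float(np.sum(errors ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {"rmse": rmse, "mae": mae, "r2": 1.0 - ss_res / ss_tot}

# Actual vs. predicted next-day throughput for three days
print(regression_metrics([450, 480, 500], [460, 470, 520]))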

Features: Building Blocks of ML

What is a Feature?

A feature is a reusable data point that describes some aspect of your operations. Features transform raw operational data into model-ready inputs.

Feature Definition:

json
{
  "name": "yesterday_production",
  "label": "Yesterday's Production Count",
  "description": "Total units produced in the previous 24-hour period",
  "category": "operations",
  "sourceResource": "Operation",
  "sourceAction": "aggregate_production",
  "transformation": {
    "type": "sum",
    "timeWindow": "1_day",
    "offset": "-1_day"
  },
  "dataType": "integer",
  "units": "units",
  "tags": ["production", "throughput", "historical"],
  "allowedTemplates": ["daily_throughput_forecast"],
  "defaultFilters": {},
  "validationRules": {
    "min": 0,
    "max": 10000
  },
  "qualityChecks": {
    "maxMissingRate": 0.1,
    "freshnessHours": 48
  },
  "active": true
}

Feature Attributes

  • name: Unique identifier used in configurations (snake_case)
  • label: Human-friendly display name
  • description: What the feature represents and how it's computed
  • category: Grouping (e.g., "operations", "staffing", "quality")
  • sourceResource: Ash resource providing the data (e.g., "Operation", "Resource")
  • sourceAction: Action or calculation to fetch data
  • transformation: How raw data is transformed (aggregations, time windows, etc.)
  • dataType: integer, float, string, boolean, or datetime
  • units: Measurement units for clarity
  • tags: Searchable keywords
  • allowedTemplates: Which templates can use this feature (empty = all)
  • recommendedTemplates: Templates where this feature is suggested by default
  • defaultFilters: Automatic filters applied when sourcing data
  • validationRules: Business constraints (min/max values, etc.)
  • qualityChecks: Data quality expectations
  • active: Whether the feature can be selected in new questions

Creating Custom Features

GraphQL Mutation:

graphql
mutation CreateFeature {
  createMlFeature(
    input: {
      name: "average_cycle_time"
      label: "Average Cycle Time (Last 7 Days)"
      description: "Rolling 7-day average of operation cycle times"
      category: "operations"
      sourceResource: "Operation"
      sourceAction: "calculate_cycle_time"
      transformation: {
        type: "average"
        timeWindow: "7_days"
      }
      dataType: float
      units: "minutes"
      tags: ["efficiency", "operations", "cycle_time"]
      allowedTemplates: ["efficiency_prediction"]
      defaultFilters: {}
      validationRules: {
        min: 0
        max: 1440
      }
      qualityChecks: {
        maxMissingRate: 0.15
      }
      active: true
    }
  ) {
    id
    name
    label
    qualityState
  }
}

Auto-Generated Features

Features can be automatically generated from existing metrics in your system. The platform monitors metrics and creates corresponding ML features when appropriate.

Query Auto-Generated Features:

graphql
query AutoGeneratedFeatures {
  listMlFeatures(filter: { autoGenerated: { eq: true } }) {
    id
    name
    label
    sourceMetricId
    qualityState
  }
}

Feature Quality Monitoring

The platform continuously monitors feature quality based on:

  • Missing Rate: Percentage of null/missing values
  • Freshness: How recently data was updated
  • Distribution Shifts: Significant changes in value patterns

Features are assigned quality states:

  • good: Ready for production use
  • warning: Minor issues detected; review recommended
  • error: Significant problems; avoid using in new questions
  • unknown: Not yet evaluated
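
As a rough mental model, the quality state is a function of the feature's measured quality against its configured checks. The Python sketch below is a simplified illustration with assumed thresholds; the platform's actual rules and cutoffs may differ.

python
from datetime import datetime, timedelta, timezone

def quality_state(missing_rate, last_updated, max_missing_rate=0.1, freshness_hours=48):
    """Simplified quality-state rule with assumed thresholds (illustration only)."""
    age = datetime.now(timezone.utc) - last_updated
    if missing_rate > max_missing_rate or age > timedelta(hours=freshness_hours):
        return "error"
    if missing_rate > max_missing_rate / 2 or age > timedelta(hours=freshness_hours / 2):
        return "warning"
    return "good"

fresh = datetime.now(timezone.utc) - timedelta(hours=6)
print(quality_state(0.02, fresh))   # "good": low missing rate, recently updated
print(quality_state(0.20, fresh))   # "error": missing rate exceeds the threshold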

Check Feature Quality:

graphql
query FeatureQuality {
  getMlFeature(id: "01JCYYY...") {
    name
    qualityState
    qualityChecks
    metadata
  }
}

Question Templates

What are Question Templates?

Question Templates provide pre-configured starting points for common ML use cases. They bundle:

  • Recommended features
  • Default configuration
  • Eligibility rules (minimum sample size, required filters)
  • Interpretation guidance for operators

Template Structure

json
{
  "name": "daily_throughput_forecast",
  "label": "Daily Throughput Forecast",
  "category": "operations",
  "questionCopy": "What will our throughput be tomorrow?",
  "description": "Predict next-day production throughput based on historical patterns and current resource availability.",
  "recommendedFeatures": ["yesterday_production", "staffing_level", "equipment_availability"],
  "requiredFeatures": ["target_throughput"],
  "optionalFeatures": ["weather_conditions", "day_of_week", "planned_maintenance"],
  "defaultConfiguration": {
    "target": "next_day_throughput",
    "splitRatio": {
      "train": 0.7,
      "validation": 0.15,
      "test": 0.15
    },
    "horizon": "1_day"
  },
  "eligibilityRules": {
    "minimumSampleSize": 100,
    "requiredFilters": ["siteId"],
    "allowedFilters": ["siteId", "productFamily", "shiftType"]
  },
  "interpretationGuidance": "Predictions represent expected unit count. Consider confidence intervals and historical accuracy when planning staffing.",
  "status": "active"
}

Using Templates

Templates guide the question creation process:

  1. Browse templates to find your use case
  2. Review recommended features (auto-selected when creating questions)
  3. Add optional features for additional predictive power
  4. Provide required filters as defined in eligibilityRules
  5. Customize configuration if needed

Creating Custom Templates

Platform administrators can create templates for organization-specific use cases:

GraphQL Mutation:

graphql
mutation CreateTemplate {
  createMlQuestionTemplate(
    input: {
      name: "quality_defect_prediction"
      label: "Quality Defect Prediction"
      category: "quality"
      questionCopy: "What's the expected defect rate for this batch?"
      description: "Predict quality outcomes based on input materials, process parameters, and environmental conditions"
      recommendedFeatures: ["material_quality_score", "process_temperature", "operator_experience"]
      requiredFeatures: ["target_defect_rate"]
      optionalFeatures: ["humidity", "equipment_age", "shift_type"]
      defaultConfiguration: {
        target: "defect_rate"
        splitRatio: { train: 0.7, validation: 0.15, test: 0.15 }
      }
      eligibilityRules: {
        minimumSampleSize: 200
        requiredFilters: ["productType"]
      }
      interpretationGuidance: "Defect rate predictions are probabilistic. Use in conjunction with process controls and quality checks."
      status: active
    }
  ) {
    id
    name
    status
  }
}

Creating ML Questions

Question Lifecycle

Questions progress through these states:

  • draft: Initial configuration, dataset not yet built
  • collecting: Dataset assembly in progress
  • training: Model training in progress
  • serving: Has an approved model version ready for predictions
  • archived: No longer active

Question Configuration

The configuration field on an ML question contains:

json
{
  "features": ["yesterday_production", "staffing_level", "equipment_availability"],
  "filters": {
    "siteId": "01JCZZZ...",
    "productFamily": "widgets"
  },
  "target": "next_day_throughput",
  "splitRatio": {
    "train": 0.7,
    "validation": 0.15,
    "test": 0.15
  }
}

Configuration Fields:

  • features: Array of feature names (required)
  • filters: Criteria to filter operational data (template-specific)
  • target: What you're predicting (required)
  • splitRatio: Data split percentages (optional)

Validation Rules

When creating a question, the platform validates:

  1. Template exists and is active
  2. Required features are included
  3. All features are allowed by the template
  4. Required filters are provided
  5. Feature quality meets minimum thresholds
  6. Sufficient sample size is available

Validation errors prevent question creation and provide actionable feedback.

Updating Questions

Update question metadata or configuration:

GraphQL Mutation:

graphql
mutation UpdateQuestion {
  updateMlQuestion(
    id: "01JDAAA..."
    input: {
      label: "Updated Label"
      description: "Updated description"
      configuration: {
        features: ["yesterday_production", "staffing_level", "equipment_availability", "day_of_week"]
        filters: {
          siteId: "01JCZZZ..."
          productFamily: "widgets"
        }
        target: "next_day_throughput"
      }
      notes: "Added day_of_week feature for seasonality"
      status: draft
    }
  ) {
    id
    configuration
    updatedAt
  }
}

Note: Changing configuration (features, filters, target) invalidates existing datasets and model versions. You'll need to rebuild and retrain.

Dataset Assembly

What is a Dataset?

A dataset is a structured collection of:

  • Feature values (input matrix X)
  • Target values (output vector y)
  • Sample identifiers (traceable back to operations)
  • Train/validation/test splits
  • Statistics and metadata

Dataset Fingerprinting

Each unique configuration produces a deterministic fingerprint based on:

  • Question ID
  • Account ID
  • Feature names (ordered)
  • Filters
  • Target variable

Fingerprints enable:

  • Caching: Reuse datasets when configuration hasn't changed
  • Reproducibility: Same configuration always produces same dataset
  • Traceability: Link model versions back to exact training data

Dataset Assembly Process

Step-by-Step:

  1. Load question configuration (features, filters, target)
  2. Fetch feature definitions from catalog
  3. Compute fingerprint from configuration
  4. Check cache for existing dataset
  5. Load operations matching filters
  6. Assemble feature columns by computing values for each operation
  7. Extract target values from operations
  8. Build feature tensor (2D matrix: samples × features)
  9. Build target tensor (1D vector: samples)
  10. Split data deterministically into train/validation/test
  11. Generate statistics (means, standard deviations, missing rates)
  12. Cache dataset for future reuse
  13. Create artifact record pointing to cached data
  14. Record run with metadata and report

Inspecting Datasets

Query Dataset Runs:

graphql
query DatasetRuns {
  mlRunsByQuestion(questionId: "01JDAAA...") {
    id
    kind
    status
    datasetReport
    metrics
    context
    startedAt
    completedAt
  }
}

Dataset Report Fields:

  • recordCount: Total samples in dataset
  • fingerprint: Unique identifier for this dataset
  • featureStats: Per-feature statistics (mean, stdDev, min, max, missingRate)
  • targetStats: Target variable statistics
  • cached: Whether dataset was loaded from cache

Retrieve Dataset Artifacts:

graphql
query DatasetArtifacts {
  mlArtifactsByRun(runId: "01JDBBB...") {
    id
    artifactType
    name
    uri
    format
    byteSize
    metadata
    insertedAt
  }
}

Force Rebuild

When underlying operational data changes, force a dataset rebuild:

graphql
mutation RebuildDataset {
  buildMlQuestionDataset(
    id: "01JDAAA..."
    input: { forceRebuild: true }
  ) {
    id
    status
  }
}

Training Models

Model Training Workflow

  1. Ensure dataset exists (build if necessary)
  2. Create model version record
  3. Prepare data (convert tensors to Nx arrays)
  4. Train XGBoost with specified hyperparameters
  5. Evaluate on train/validation/test splits
  6. Persist artifacts (booster model, config, metrics)
  7. Update model version status to awaitingApproval
  8. Record training run with metrics and duration

Hyperparameter Tuning

Hyperparameters control model behavior. Defaults are sensible but can be customized:

Common Hyperparameters:

  • maxDepth: Maximum tree depth (prevents overfitting). Default: 8. Typical range: 3-15
  • eta: Learning rate (smaller = slower, more accurate). Default: 0.1. Typical range: 0.01-0.3
  • subsample: Fraction of samples used per tree. Default: 0.8. Typical range: 0.5-1.0
  • colsampleBytree: Fraction of features used per tree. Default: 0.8. Typical range: 0.5-1.0
  • objective: Loss function. Default: reg:squarederror. See XGBoost docs for alternatives

Training Options:

  • numBoostRounds: Number of boosting iterations. Default: 200
  • earlyStoppingRounds: Stop if no improvement. Default: 25
  • verboseEval: Log training progress. Default: false

Example: Custom Hyperparameters

graphql
mutation TrainWithCustomParams {
  trainMlQuestion(
    id: "01JDAAA..."
    input: {
      hyperparameters: {
        maxDepth: 6
        eta: 0.05
        subsample: 0.75
        colsampleBytree: 0.9
        gamma: 0.1
        minChildWeight: 5
        objective: "reg:squarederror"
        evalMetric: ["rmse", "mae"]
      }
      trainOptions: {
        numBoostRounds: 300
        earlyStoppingRounds: 30
      }
    }
  ) {
    id
    status
  }
}

Monitoring Training

Training progress is tracked via MlRun records:

Query Training Run:

graphql
query TrainingRun {
  mlRunsByModelVersion(modelVersionId: "01JDCCC...") {
    id
    kind
    status
    initiatedBy
    context
    metrics
    validationReport
    startedAt
    completedAt
    errorDetails
  }
}

Training Run Fields:

  • kind: Always training for model training runs
  • status: pending, inProgress, succeeded, failed, or canceled
  • metrics: Summary metrics (RMSE, MAE, R²) for each split
  • validationReport: Detailed evaluation with sample counts
  • errorDetails: If failed, diagnostic information

Training Artifacts

After successful training, artifacts are persisted:

Query Model Artifacts:

graphql
query ModelArtifacts {
  mlArtifactsByModelVersion(modelVersionId: "01JDCCC...") {
    id
    artifactType
    name
    uri
    format
    metadata
  }
}

Artifact Types:

  • booster: Trained XGBoost model (JSON format)
  • trainingConfig: Hyperparameters and training options
  • report: Evaluation metrics with sample counts
  • explainability: Feature importance and SHAP values (future)

Evaluating Model Performance

Evaluation Metrics

Models are evaluated on three data splits:

  1. Training Set (70%): Data used to train the model
  2. Validation Set (15%): Data used during training for early stopping
  3. Test Set (15%): Held-out data for final evaluation

Metrics Provided:

  • RMSE (Root Mean Squared Error): Penalizes large errors more heavily
  • MAE (Mean Absolute Error): Average absolute error magnitude
  • R² (R-squared): Variance explained (1.0 = perfect)

Interpreting Metrics

Good Model Characteristics:

  • Test RMSE close to validation RMSE (not overfitting)
  • R² > 0.7 on test set (explains most variance)
  • MAE acceptable for business use case

Warning Signs:

  • Train RMSE much lower than test RMSE (overfitting)
  • R² < 0.5 on test set (poor predictive power)
  • MAE too large for practical decisions

Example Metrics:

json
{
  "train": { "rmse": 15.2, "mae": 11.3, "r2": 0.92 },
  "validation": { "rmse": 18.4, "mae": 13.7, "r2": 0.88 },
  "test": { "rmse": 19.1, "mae": 14.2, "r2": 0.87 }
}

Interpretation: Model performs well with 87% variance explained. Slight degradation from train to test is normal and acceptable.

Comparing Model Versions

Questions can have multiple model versions (e.g., after retraining with new data or different hyperparameters):

Query All Versions:

graphql
query CompareVersions {
  mlModelVersionsByQuestion(questionId: "01JDAAA...") {
    id
    version
    status
    trainedAt
    metricsSnapshot
    hyperparameters
    datasetFingerprint
  }
}

Compare by:

  • Test set RMSE/MAE (lower is better)
  • R² (higher is better)
  • Training date (newer may use more recent data)
  • Dataset fingerprint (same data or new data?)
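
A minimal client-side comparison might look like the Python sketch below, which assumes the response shape from the metricsSnapshot examples in this guide and simply picks the candidate with the lowest test RMSE.

python
# Assumed response shape, matching the metricsSnapshot examples above.
versions = [
    {"version": 1, "status": "approved",
     "metricsSnapshot": {"test": {"rmse": 21.4, "mae": 16.0, "r2": 0.83}}},
    {"version": 2, "status": "awaitingApproval",
     "metricsSnapshot": {"test": {"rmse": 19.1, "mae": 14.2, "r2": 0.87}}},
]

best = min(versions, key=lambda v: v["metricsSnapshot"]["test"]["rmse"])
print(best["version"])  # 2: lowest test RMSE among the candidates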

Detailed Evaluation Report

Query Validation Report:

graphql
query EvaluationDetails {
  getMlRun(id: "01JDBBB...") {
    validationReport
  }
}

Sample Report:

json
{
  "train": {
    "sampleCount": 875,
    "metrics": { "rmse": 15.2, "mae": 11.3, "r2": 0.92 }
  },
  "validation": {
    "sampleCount": 187,
    "metrics": { "rmse": 18.4, "mae": 13.7, "r2": 0.88 }
  },
  "test": {
    "sampleCount": 188,
    "metrics": { "rmse": 19.1, "mae": 14.2, "r2": 0.87 }
  }
}

Deploying Models

Model Version Lifecycle

Model versions progress through these states:

  • draft: Initial state before training
  • training: Model is currently being trained
  • evaluating: Metrics are being computed (brief)
  • awaitingApproval: Training complete, awaiting human review
  • approved: Approved for serving predictions
  • retired: No longer recommended for use

Approving a Model Version

After reviewing metrics, approve a version for production:

GraphQL Mutation:

graphql
mutation ApproveModel {
  updateMlModelVersion(
    id: "01JDCCC..."
    input: {
      status: approved
      approvedAt: "2025-10-23T14:30:00Z"
      approvedBy: "01JDUSER..."  # Your user ID
      notes: "Approved based on test R² of 0.87 and acceptable MAE"
    }
  ) {
    id
    version
    status
    approvedAt
    approvedBy
  }
}

Setting the Serving Model

Update the question to use a specific approved model:

GraphQL Mutation:

graphql
mutation SetServingModel {
  updateMlQuestion(
    id: "01JDAAA..."
    input: {
      servingModelVersionId: "01JDCCC..."
      status: serving
    }
  ) {
    id
    status
    servingModelVersionId
    servingModelVersion {
      id
      version
      status
      metricsSnapshot
    }
  }
}

Generating Predictions

Once a serving model is set, predictions can be generated (implementation depends on your serving infrastructure):

Conceptual Mutation:

graphql
mutation Predict {
  predictMlQuestion(
    id: "01JDAAA..."
    input: {
      features: {
        yesterdayProduction: 450
        staffingLevel: 12
        equipmentAvailability: 0.95
      }
    }
  ) {
    prediction
    confidence
    featureImportance
  }
}

Note: Prediction serving is typically handled by separate inference infrastructure that loads the booster artifact and serves requests at low latency.
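
As one possible shape for that infrastructure, the Python sketch below loads a local copy of the booster artifact with the standard xgboost package and scores a single row. The file path and the feature ordering are assumptions for illustration; in practice the artifact URI comes from mlArtifactsByModelVersion and the feature order from the question's configuration.

python
import numpy as np
import xgboost as xgb

# Hypothetical local path; download the `booster` artifact (JSON format) first.
booster = xgb.Booster()
booster.load_model("booster.json")

# Feature values must be supplied in the same column order used during training.
feature_order = ["yesterday_production", "staffing_level", "equipment_availability"]
row = {"yesterday_production": 450, "staffing_level": 12, "equipment_availability": 0.95}

matrix = xgb.DMatrix(np.array([[row[name] for name in feature_order]], dtype=float))
prediction = booster.predict(matrix)
print(float(prediction[0]))  # expected next-day throughput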

Retiring Models

When a model is outdated or superseded:

GraphQL Mutation:

graphql
mutation RetireModel {
  updateMlModelVersion(
    id: "01JDCCC..."
    input: {
      status: retired
      notes: "Superseded by version 3 with improved accuracy"
    }
  ) {
    id
    status
  }
}

Monitoring and Runs

What are Runs?

MlRun records track every significant workflow step:

  • Dataset assembly (kind: dataset)
  • Model training (kind: training)
  • Model evaluation (kind: evaluation)
  • Prediction serving (kind: serving)

Runs provide:

  • Execution status and timestamps
  • Metrics and reports
  • Error details for debugging
  • Traceability (who initiated, why)

Querying Runs

By Question:

graphql
query QuestionRuns {
  mlRunsByQuestion(questionId: "01JDAAA...") {
    id
    kind
    status
    initiatedBy
    context
    datasetReport
    validationReport
    metrics
    startedAt
    completedAt
    errorDetails
  }
}

By Model Version:

graphql
query ModelRuns {
  mlRunsByModelVersion(modelVersionId: "01JDCCC...") {
    id
    kind
    status
    metrics
    startedAt
    completedAt
  }
}

Run Statuses

  • pending: Queued but not started
  • inProgress: Currently executing
  • succeeded: Completed successfully
  • failed: Encountered an error
  • canceled: Manually stopped

Error Handling

When runs fail, inspect errorDetails:

Example Error:

json
{
  "stage": "dataset_assembly",
  "reason": "insufficient_sample_size",
  "details": {
    "minimumRequired": 100,
    "actualCount": 47,
    "filters": { "siteId": "01JCZZZ...", "productFamily": "widgets" }
  }
}

Common Errors:

  • insufficient_sample_size: Not enough data matching filters
  • missing_features: Required features unavailable
  • feature_quality_issues: Data quality below threshold
  • training_failed: XGBoost training error (check hyperparameters)

Run Context

The context field provides execution metadata:

Dataset Run:

json
{
  "stage": "training",
  "triggeredBy": "user",
  "datasetRun": "01JDBBB..."
}

Training Run:

json
{
  "datasetRunId": "01JDBBB...",
  "modelVersion": 2
}

GraphQL API Reference

Queries

Questions

List all questions:

graphql
query {
  listMlQuestions {
    id
    name
    label
    questionTemplate
    status
    configuration
    servingModelVersionId
    insertedAt
    updatedAt
  }
}

Get single question:

graphql
query {
  getMlQuestion(id: "01JDAAA...") {
    id
    name
    label
    description
    questionTemplate
    status
    configuration
    metadata
    notes
    servingModelVersion {
      id
      version
      metricsSnapshot
    }
    modelVersions {
      id
      version
      status
      trainedAt
    }
    runs {
      id
      kind
      status
      completedAt
    }
  }
}

Questions by template:

graphql
query {
  mlQuestionsByTemplate(questionTemplate: "daily_throughput_forecast") {
    id
    name
    label
    status
  }
}

Question Templates

List templates:

graphql
query {
  listMlQuestionTemplates {
    id
    name
    label
    category
    questionCopy
    recommendedFeatures
    requiredFeatures
    optionalFeatures
    status
  }
}

Get template by name:

graphql
query {
  mlQuestionTemplateByName(name: "daily_throughput_forecast") {
    id
    name
    label
    description
    recommendedFeatures
    defaultConfiguration
    eligibilityRules
    interpretationGuidance
  }
}

Features

List features:

graphql
query {
  listMlFeatures {
    id
    name
    label
    description
    category
    dataType
    units
    tags
    allowedTemplates
    recommendedTemplates
    active
    qualityState
  }
}

Get feature by name:

graphql
query {
  mlFeatureByName(name: "yesterday_production") {
    id
    name
    label
    description
    sourceResource
    sourceAction
    transformation
    dataType
    validationRules
    qualityChecks
    qualityState
  }
}

Model Versions

List all versions:

graphql
query {
  listMlModelVersions {
    id
    questionId
    version
    status
    trainedAt
    approvedAt
    metricsSnapshot
  }
}

Versions by question:

graphql
query {
  mlModelVersionsByQuestion(questionId: "01JDAAA...") {
    id
    version
    status
    trainedAt
    metricsSnapshot
    hyperparameters
    datasetFingerprint
  }
}

Runs

List runs:

graphql
query {
  listMlRuns {
    id
    questionId
    modelVersionId
    kind
    status
    initiatedBy
    startedAt
    completedAt
  }
}

Runs by question:

graphql
query {
  mlRunsByQuestion(questionId: "01JDAAA...") {
    id
    kind
    status
    datasetReport
    validationReport
    metrics
    errorDetails
  }
}

Artifacts

List artifacts:

graphql
query {
  listMlArtifacts {
    id
    runId
    modelVersionId
    artifactType
    name
    uri
    format
  }
}

Artifacts by model version:

graphql
query {
  mlArtifactsByModelVersion(modelVersionId: "01JDCCC...") {
    id
    artifactType
    name
    uri
    format
    byteSize
    metadata
  }
}

Mutations

Questions

Create question:

graphql
mutation {
  createMlQuestion(input: {
    name: "my_question"
    label: "My Question"
    questionTemplate: "template_name"
    description: "Description"
    status: draft
    configuration: {
      features: ["feature1", "feature2"]
      filters: { key: "value" }
      target: "target_var"
    }
  }) {
    id
    name
    status
  }
}

Update question:

graphql
mutation {
  updateMlQuestion(id: "01JDAAA...", input: {
    label: "Updated Label"
    configuration: {
      # updated features, filters, and target go here
    }
    status: serving
    servingModelVersionId: "01JDCCC..."
  }) {
    id
    updatedAt
  }
}

Delete question:

graphql
mutation {
  deleteMlQuestion(id: "01JDAAA...") {
    id
  }
}

Build dataset:

graphql
mutation {
  buildMlQuestionDataset(id: "01JDAAA...", input: {
    forceRebuild: false
  }) {
    id
    status
  }
}

Train model:

graphql
mutation {
  trainMlQuestion(id: "01JDAAA...", input: {
    forceDatasetRebuild: false
    hyperparameters: {
      maxDepth: 8
      eta: 0.1
    }
    trainOptions: {
      numBoostRounds: 200
    }
  }) {
    id
    status
  }
}

Features

Create feature:

graphql
mutation {
  createMlFeature(input: {
    name: "feature_name"
    label: "Feature Label"
    description: "Description"
    category: "operations"
    sourceResource: "Operation"
    sourceAction: "action_name"
    transformation: {}
    dataType: float
    units: "units"
    tags: ["tag1", "tag2"]
    active: true
  }) {
    id
    name
  }
}

Update feature:

graphql
mutation {
  updateMlFeature(id: "01JCYYY...", input: {
    label: "Updated Label"
    description: "Updated description"
    active: false
  }) {
    id
    updatedAt
  }
}

Model Versions

Update version (approve):

graphql
mutation {
  updateMlModelVersion(id: "01JDCCC...", input: {
    status: approved
    approvedAt: "2025-10-23T14:30:00Z"
    approvedBy: "01JDUSER..."
    notes: "Approved for production"
  }) {
    id
    status
    approvedAt
  }
}

Advanced Topics

Multi-Step Workflows with QuestionBuilder

For advanced use cases, the platform includes a QuestionBuilder pipeline that validates and assembles questions from templates:

Conceptual Workflow:

  1. Select a template
  2. Choose features (required + optional)
  3. Provide filters
  4. QuestionBuilder validates:
    • Template exists and is active
    • All features are allowed by template
    • Required features are included
    • Feature quality meets thresholds
    • Filters satisfy eligibility rules
  5. Configuration is assembled and merged with template defaults
  6. Question is created with validated configuration

This workflow is typically handled by the UI but can be replicated via API calls.
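
If you replicate the workflow via the API, a lightweight client-side pre-check can catch obvious problems before calling createMlQuestion. The Python sketch below is a simplified, hypothetical version of those checks; the platform's server-side validation remains authoritative.

python
def validate_question(template, feature_catalog, chosen_features, filters):
    """Hypothetical pre-check mirroring the QuestionBuilder rules described above."""
    errors = []
    missing = set(template["requiredFeatures"]) - set(chosen_features)
    if missing:
        errors.append(f"missing required features: {sorted(missing)}")
    for name in chosen_features:
        allowed = feature_catalog.get(name, {}).get("allowedTemplates") or []
        if allowed and template["name"] not in allowed:
            errors.append(f"feature not allowed by template: {name}")
    missing_filters = set(template["eligibilityRules"].get("requiredFilters", [])) - set(filters)
    if missing_filters:
        errors.append(f"missing required filters: {sorted(missing_filters)}")
    return errors

template = {
    "name": "daily_throughput_forecast",
    "requiredFeatures": ["target_throughput"],
    "eligibilityRules": {"requiredFilters": ["siteId"]},
}
catalog = {"yesterday_production": {"allowedTemplates": ["daily_throughput_forecast"]}}

# Reports the missing required feature and the missing siteId filter.
print(validate_question(template, catalog, ["yesterday_production"], {"productFamily": "widgets"}))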

Dataset Caching and Fingerprinting

How Caching Works:

  1. Configuration is serialized (question ID, features, filters, target)
  2. SHA-256 hash produces fingerprint
  3. Cache lookup by fingerprint
  4. If hit: return cached dataset
  5. If miss: assemble fresh dataset and cache
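
Conceptually, the fingerprint is just a stable hash over a canonical serialization of the configuration. The Python sketch below illustrates the idea; the platform's exact serialization format is an internal detail and may differ.

python
import hashlib
import json

def dataset_fingerprint(question_id, account_id, features, filters, target):
    """Illustrative fingerprint: SHA-256 over a canonical serialization of the config."""
    payload = {
        "questionId": question_id,
        "accountId": account_id,
        "features": list(features),              # feature order is preserved
        "filters": dict(sorted(filters.items())),
        "target": target,
    }
    serialized = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()

fp = dataset_fingerprint(
    "01JDAAA...", "01JACCT...",  # question and account IDs (account ID is illustrative)
    ["yesterday_production", "staffing_level", "equipment_availability"],
    {"siteId": "01JCZZZ...", "productFamily": "widgets"},
    "next_day_throughput",
)
print(fp)  # identical configuration always yields the identical fingerprint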

Benefits:

  • Faster iteration when experimenting with hyperparameters
  • Reproducibility (same config = same dataset)
  • Resource efficiency (avoid redundant computation)

Cache Invalidation:

  • Manually force rebuild with forceRebuild: true
  • Cache entries may be purged based on age or size limits

Feature Transformations

Features support various transformation types:

  • Aggregations: sum, average, count, min, max
  • Time Windows: 1_day, 7_days, 30_days, 90_days
  • Offsets: -1_day, -7_days (for lagged features)
  • Custom: Platform-specific transformations

Example:

json
{
  "transformation": {
    "type": "average",
    "timeWindow": "7_days",
    "offset": "-1_day",
    "aggregateBy": "siteId"
  }
}
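
To make the transformation above concrete, the Python sketch below computes a 7-day rolling average lagged by one day over a small synthetic daily series. It is illustrative only; in the platform, transformations are executed by the feature pipeline against your operational data.

python
import pandas as pd

# Ten days of daily production counts standing in for real operational data.
production = pd.Series(
    [430, 445, 460, 455, 470, 448, 462, 475, 468, 480],
    index=pd.date_range("2025-10-01", periods=10, freq="D"),
)

# type: average, timeWindow: 7_days, offset: -1_day
feature = production.rolling(window=7).mean().shift(1)
print(feature.dropna())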

Train/Validation/Test Splits

Data is split deterministically using a seed derived from the dataset fingerprint:

Default Ratios:

  • Train: 70%
  • Validation: 15%
  • Test: 15%

Custom Ratios:

json
{
  "configuration": {
    "splitRatio": {
      "train": 0.8,
      "validation": 0.1,
      "test": 0.1
    }
  }
}

Splitting Algorithm:

  • Samples are shuffled using a seed derived from the dataset fingerprint
  • Deterministic: the same fingerprint always produces the same split
  • Reproducible: split membership is tied to sample IDs, so samples do not move between splits for the same configuration
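
A minimal sketch of such a deterministic split, assuming the fingerprint is a hex string used to seed the shuffle, could look like the Python below (the sample counts match the example evaluation report earlier in this guide).

python
import numpy as np

def deterministic_split(sample_ids, fingerprint, ratios=(0.7, 0.15, 0.15)):
    """Shuffle with a fingerprint-derived seed so the split is reproducible."""
    seed = int(fingerprint[:8], 16)          # derive an integer seed from the hex fingerprint
    order = np.random.default_rng(seed).permutation(len(sample_ids))
    n_train = int(len(sample_ids) * ratios[0])
    n_val = int(len(sample_ids) * ratios[1])
    train = [sample_ids[i] for i in order[:n_train]]
    validation = [sample_ids[i] for i in order[n_train:n_train + n_val]]
    test = [sample_ids[i] for i in order[n_train + n_val:]]
    return train, validation, test

train, validation, test = deterministic_split(list(range(1250)), "a3f7b9e2c1d4")
print(len(train), len(validation), len(test))  # 875 187 188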

Handling Missing Data

The platform handles missing feature values:

Strategies:

  • Numeric features: Impute with mean or median
  • Categorical features: Encode as special "missing" category
  • High missing rates: Flag in quality checks

Best Practices:

  • Monitor feature missingRate in dataset reports
  • Set maxMissingRate in feature quality checks
  • Investigate and address root causes of missingness
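
For numeric features, mean imputation is the simplest of these strategies. The Python sketch below shows the idea on a single column; the platform applies its own imputation internally during dataset assembly.

python
import numpy as np

def impute_mean(column):
    """Replace missing values in a numeric feature column with the column mean."""
    column = np.asarray(column, dtype=float)
    return np.where(np.isnan(column), np.nanmean(column), column)

print(impute_mean([450, np.nan, 470, 455]))  # the missing value becomes ~458.3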

Multi-Tenant Isolation

All ML resources are tenant-scoped:

  • Questions, features, and models are isolated per account
  • Dataset assembly only accesses tenant's operational data
  • IAM policies enforce permissions

GraphQL automatically scopes queries/mutations to the authenticated tenant. No manual tenant filtering required.

Troubleshooting

Common Issues

Issue: "Insufficient sample size"

Symptom: Dataset assembly fails with insufficient_sample_size error.

Causes:

  • Filters too restrictive (e.g., filtering to a site with little data)
  • Template requires minimum sample size not met

Solutions:

  • Broaden filters (remove or relax constraints)
  • Accumulate more operational data before training
  • Check template's eligibilityRules.minimumSampleSize

Issue: "Feature quality issues"

Symptom: Dataset assembly fails with feature_quality_issues error.

Causes:

  • Feature has high missing rate exceeding maxMissingRate
  • Feature data is stale or unavailable

Solutions:

  • Check feature quality state: getMlFeature(id: "...") { qualityState, qualityChecks }
  • Review operational data pipeline for the feature's source
  • Exclude problematic feature and use alternatives
  • Adjust maxMissingRate in feature definition if acceptable

Issue: Model overfitting

Symptom: Train RMSE much lower than test RMSE; poor generalization.

Causes:

  • Model too complex (maxDepth too high)
  • Too many boosting rounds
  • Small dataset with many features

Solutions:

  • Reduce maxDepth (try 4-6)
  • Reduce numBoostRounds or increase earlyStoppingRounds
  • Reduce feature count (focus on most important)
  • Increase regularization (gamma, lambda, alpha)

Issue: Model underfitting

Symptom: Low R² on all splits; predictions not much better than mean.

Causes:

  • Model too simple
  • Features lack predictive power
  • Insufficient boosting rounds

Solutions:

  • Increase maxDepth (try 10-12)
  • Increase numBoostRounds
  • Add more relevant features
  • Check target variable variance (is it predictable?)

Issue: "Training failed"

Symptom: Training run status is failed with XGBoost error.

Causes:

  • Invalid hyperparameters
  • Data format issues (NaN, Inf)
  • Insufficient resources (memory, time)

Solutions:

  • Review errorDetails in run record
  • Validate hyperparameters against XGBoost documentation
  • Check dataset report for data anomalies
  • Contact platform support if persistent

Debugging Workflows

Step 1: Verify Question Configuration

graphql
query {
  getMlQuestion(id: "01JDAAA...") {
    configuration
    status
  }
}

Step 2: Check Dataset Run

graphql
query {
  mlRunsByQuestion(questionId: "01JDAAA...") {
    id
    kind
    status
    datasetReport
    errorDetails
  }
}

Step 3: Review Training Run

graphql
query {
  mlRunsByModelVersion(modelVersionId: "01JDCCC...") {
    id
    kind
    status
    metrics
    validationReport
    errorDetails
  }
}

Step 4: Inspect Artifacts

graphql
query {
  mlArtifactsByRun(runId: "01JDBBB...") {
    artifactType
    name
    uri
    metadata
  }
}

Performance Optimization

Dataset Assembly:

  • Use filters to reduce data volume
  • Ensure feature source actions are optimized
  • Leverage caching (avoid forceRebuild unless necessary)

Model Training:

  • Start with default hyperparameters
  • Use early stopping to prevent overtraining
  • Monitor training duration in run records

Predictions:

  • Load booster artifacts once and reuse
  • Batch predictions when possible
  • Cache predictions for static inputs

Best Practices

1. Start Simple

  • Begin with recommended features from templates
  • Use default hyperparameters for initial training
  • Iterate based on evaluation metrics

2. Monitor Feature Quality

  • Regularly review feature qualityState
  • Investigate and address quality warnings
  • Deactivate features with persistent issues

3. Version Control Configurations

  • Use descriptive question names and labels
  • Document configuration changes in notes field
  • Track model versions with approval notes

4. Evaluate Rigorously

  • Always review test set metrics (not just training)
  • Compare new versions against existing serving models
  • Consider business impact (e.g., cost of prediction errors)

5. Retrain Regularly

  • Retrain models when new operational data accumulates
  • Monitor prediction accuracy over time (concept drift)
  • Automate retraining schedules for production questions

6. Collaborate

  • Share question templates across teams
  • Document interpretation guidance for operators
  • Review model approvals with domain experts

7. Secure Artifacts

  • Artifacts contain sensitive operational insights
  • Ensure IAM policies restrict access appropriately
  • Audit model deployments and usage

Additional Resources

XGBoost Documentation

GraphQL Resources

Platform Support

  • Contact your platform administrator for assistance
  • Review platform changelog for new features
  • Join user community forums (coming soon)

Appendix: Field Reference

MlQuestion Fields

  • id (UUID): Unique identifier
  • accountId (UUID): Tenant identifier (auto-set)
  • name (String): Stable identifier (slug)
  • label (String): Display name
  • questionTemplate (String): Template name reference
  • description (String): Long description
  • status (Enum): draft, collecting, training, serving, archived
  • configuration (Map): Features, filters, target, splitRatio
  • metadata (Map): Tags, owner, custom fields
  • notes (String): Operational notes
  • servingModelVersionId (UUID): Currently approved model
  • lastServedAt (DateTime): Last prediction timestamp
  • insertedAt (DateTime): Creation timestamp
  • updatedAt (DateTime): Last update timestamp

MlModelVersion Fields

  • id (UUID): Unique identifier
  • questionId (UUID): Owning question
  • version (Integer): Monotonic version number
  • status (Enum): draft, training, evaluating, awaitingApproval, approved, retired
  • trainedAt (DateTime): Training completion timestamp
  • approvedAt (DateTime): Approval timestamp
  • approvedBy (UUID): Approver user ID
  • notes (String): Approval/retirement notes
  • metricsSnapshot (Map): Train/validation/test metrics
  • hyperparameters (Map): XGBoost hyperparameters used
  • datasetFingerprint (String): Dataset identifier

MlRun Fields

  • id (UUID): Unique identifier
  • questionId (UUID): Associated question
  • modelVersionId (UUID): Associated model version (if applicable)
  • kind (Enum): dataset, training, evaluation, serving
  • status (Enum): pending, inProgress, succeeded, failed, canceled
  • initiatedBy (String): User or system identifier
  • context (Map): Execution context metadata
  • metrics (Map): Performance metrics
  • datasetReport (Map): Dataset statistics (for dataset runs)
  • validationReport (Map): Evaluation details (for training runs)
  • errorDetails (Map): Error diagnostics (if failed)
  • startedAt (DateTime): Execution start timestamp
  • completedAt (DateTime): Execution completion timestamp

Feature Fields

  • id (UUID): Unique identifier
  • name (String): Stable identifier (snake_case)
  • label (String): Display name
  • description (String): What the feature represents
  • category (String): Grouping (operations, staffing, etc.)
  • sourceResource (String): Ash resource providing data
  • sourceAction (String): Action/calculation name
  • transformation (Map): Transformation metadata
  • dataType (Enum): integer, float, string, boolean, datetime
  • units (String): Measurement units
  • tags (Array): Searchable keywords
  • allowedTemplates (Array): Compatible template names
  • recommendedTemplates (Array): Templates where recommended
  • defaultFilters (Map): Auto-applied filters
  • validationRules (Map): Business constraints
  • qualityChecks (Map): Data quality expectations
  • active (Boolean): Available for selection
  • qualityState (Enum): good, warning, error, unknown
  • autoGenerated (Boolean): Created from a metric
  • sourceMetricId (UUID): Source metric (if auto-generated)

Glossary

  • Artifact: Persisted file (dataset, model booster, report) generated during ML workflows
  • Booster: Trained XGBoost model
  • Configuration: Question-specific settings (features, filters, target)
  • Dataset: Structured collection of features and target values for training
  • Feature: Reusable data point derived from operational data
  • Fingerprint: Hash uniquely identifying a dataset configuration
  • Hyperparameter: Tunable parameter controlling model training (e.g., maxDepth, eta)
  • Model Version: Specific trained instance of a question with metrics and approval state
  • Question: Plain-language definition of a prediction task
  • Run: Execution record tracking dataset assembly, training, or evaluation
  • Split: Subset of dataset (train, validation, test)
  • Template: Pre-configured starting point for common ML use cases
  • Tenant: Account/organization in the multi-tenant system
  • XGBoost: Gradient boosting library used for model training

Conclusion

You now have a comprehensive understanding of the CoCore machine learning platform. Start by exploring question templates, creating your first ML question, and iterating based on evaluation metrics. The platform handles the complexity of feature engineering, dataset assembly, and model training, allowing you to focus on business outcomes.

For additional support, consult your platform administrator or refer to the GraphQL API documentation embedded in the GraphQL schema explorer.

Happy predicting!
