Machine Learning Platform Guide
Overview
The CoCore machine learning platform enables operations teams to ask plain-language questions about their data and receive predictive insights powered by gradient-boosted decision trees (XGBoost). You can configure ML questions through the UI or GraphQL API without writing code, allowing you to leverage sophisticated machine learning to optimize operations, predict outcomes, and improve decision-making.
This guide covers:
- Core concepts and data model
- Getting started with your first ML question
- Feature engineering and data preparation
- Training and evaluating models
- Deploying models for predictions
- API reference for GraphQL operations
- Advanced workflows and best practices
Table of Contents
- Core Concepts
- Getting Started
- Features: Building Blocks of ML
- Question Templates
- Creating ML Questions
- Dataset Assembly
- Training Models
- Evaluating Model Performance
- Deploying Models
- Monitoring and Runs
- GraphQL API Reference
- Advanced Topics
- Troubleshooting
Core Concepts
The Platform Architecture
The ML platform consists of several interconnected components:
Question Templates → ML Questions → Features → Datasets → Training → Model Versions → Predictions
- Question Templates: Pre-built configurations for common business questions (e.g., "What will tomorrow's throughput be?")
- ML Questions: Your specific instance of a template with chosen features and filters
- Features: Reusable data points derived from your operations (e.g., yesterday's production count, current staffing levels)
- Datasets: Structured collections of features and target values ready for training
- Model Versions: Trained models with evaluation metrics and approval status
- Runs: Execution records tracking dataset assembly, training, and evaluation steps
- Artifacts: Persisted files (datasets, model boosters, reports) generated during ML workflows
Multi-Tenant and Secure
All ML resources are multi-tenant by default. Your questions, features, and models are isolated to your account and respect the platform's IAM policies. This ensures data privacy and allows different teams to build models independently.
Plain-Language Approach
You interact with the ML platform using business terminology:
- Questions instead of "prediction tasks"
- Features instead of "input variables"
- Templates for common use cases instead of custom configuration
Getting Started
Prerequisites
Before creating your first ML question, ensure:
- You have access to the platform (UI or API credentials)
- Your account has operational data (operations, products, resources, etc.)
- You understand the business outcome you want to predict
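The rest of this guide shows GraphQL queries and mutations directly. If you prefer to script against the API, a minimal client might look like the sketch below; the endpoint URL, bearer-token header, and environment variable names are assumptions, so substitute the values for your deployment:

```python
# Minimal GraphQL client sketch. The endpoint URL, auth header, and environment
# variable names are assumptions; use the values provided for your account.
import os

import requests

GRAPHQL_URL = os.environ.get("COCORE_GRAPHQL_URL", "https://example.invalid/graphql")
API_TOKEN = os.environ.get("COCORE_API_TOKEN", "")


def graphql(query: str, variables: dict | None = None) -> dict:
    """POST a GraphQL document and return the decoded JSON response."""
    response = requests.post(
        GRAPHQL_URL,
        json={"query": query, "variables": variables or {}},
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


# Example: list question templates (see Step 1 below)
result = graphql("query { listMlQuestionTemplates { id name label status } }")
print(result["data"]["listMlQuestionTemplates"])
```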
Quick Start: Your First ML Question
Here's a six-step workflow to build and train your first model:
Step 1: Browse Question Templates
Explore available templates to find one matching your use case:
GraphQL Query:
query ListTemplates {
listMlQuestionTemplates {
id
name
label
category
questionCopy
description
recommendedFeatures
requiredFeatures
optionalFeatures
status
}
}
Response Example:
{
"data": {
"listMlQuestionTemplates": [
{
"id": "01JCXXX...",
"name": "daily_throughput_forecast",
"label": "Daily Throughput Forecast",
"category": "operations",
"questionCopy": "What will our throughput be tomorrow?",
"description": "Predict next-day production throughput based on historical patterns and current resource availability.",
"recommendedFeatures": ["yesterday_production", "staffing_level", "equipment_availability"],
"requiredFeatures": ["target_throughput"],
"optionalFeatures": ["weather_conditions", "day_of_week"],
"status": "active"
}
]
}
}
Step 2: Review Available Features
Check which features are available for your template:
GraphQL Query:
query ListFeatures {
listMlFeatures {
id
name
label
description
category
dataType
units
tags
allowedTemplates
recommendedTemplates
active
qualityState
}
}
Response Example:
{
"data": {
"listMlFeatures": [
{
"id": "01JCYYY...",
"name": "yesterday_production",
"label": "Yesterday's Production Count",
"description": "Total units produced in the previous 24-hour period",
"category": "operations",
"dataType": "integer",
"units": "units",
"tags": ["production", "throughput", "historical"],
"allowedTemplates": ["daily_throughput_forecast"],
"recommendedTemplates": ["daily_throughput_forecast"],
"active": true,
"qualityState": "good"
}
]
}
}
Feature Quality States:
- good: Feature has recent, complete data
- warning: Some data quality issues detected
- error: Significant data problems (high missing rate, stale data)
- unknown: Quality not yet assessed
Step 3: Create an ML Question
Create your question by specifying the template, features, and any filters:
GraphQL Mutation:
mutation CreateQuestion {
createMlQuestion(
input: {
name: "my_daily_forecast"
label: "My Daily Throughput Forecast"
questionTemplate: "daily_throughput_forecast"
description: "Predict tomorrow's production to optimize staffing"
status: draft
configuration: {
features: ["yesterday_production", "staffing_level", "equipment_availability"]
filters: {
siteId: "01JCZZZ..."
productFamily: "widgets"
}
target: "next_day_throughput"
splitRatio: {
train: 0.7
validation: 0.15
test: 0.15
}
}
metadata: {
owner: "operations_team"
tags: ["production", "forecasting"]
}
}
) {
id
name
label
status
configuration
insertedAt
}
}
Configuration Fields:
- features: Array of feature names to include
- filters: Criteria to filter source data (e.g., specific site, product type)
- target: The variable you're trying to predict
- splitRatio: How to divide data into training/validation/test sets (optional, defaults to 70/15/15)
Step 4: Build the Dataset
Assemble the dataset from your operational data:
GraphQL Mutation:
mutation BuildDataset {
buildMlQuestionDataset(
id: "01JDAAA..." # Your question ID
input: {
forceRebuild: false
}
) {
id
status
lastServedAt
}
}
What Happens:
- The platform loads your question's configuration
- Features are fetched and validated
- Operations matching your filters are retrieved
- Feature values are computed and transformed
- Data is split into train/validation/test sets
- Dataset is cached with a fingerprint for reuse
- An MlRun record tracks the assembly process
Parameters:
- forceRebuild: Set to true to bypass cache and rebuild from scratch (useful when underlying data has changed)
Check Dataset Status:
query GetQuestionRuns {
mlRunsByQuestion(questionId: "01JDAAA...") {
id
kind
status
datasetReport
metrics
startedAt
completedAt
errorDetails
}
}
Dataset Report Structure:
{
"recordCount": 1250,
"fingerprint": "a3f7b9e2c1d4...",
"featureStats": {
"yesterday_production": {
"mean": 450.2,
"stdDev": 75.3,
"min": 200,
"max": 650,
"missingRate": 0.02
}
},
"targetStats": {
"name": "next_day_throughput",
"mean": 455.8,
"stdDev": 72.1,
"min": 210,
"max": 670
},
"cached": false
}
Step 5: Train the Model
Once the dataset is ready, trigger training:
GraphQL Mutation:
mutation TrainModel {
trainMlQuestion(
id: "01JDAAA..."
input: {
forceDatasetRebuild: false
hyperparameters: {
maxDepth: 8
eta: 0.1
subsample: 0.8
colsampleBytree: 0.8
objective: "reg:squarederror"
}
trainOptions: {
numBoostRounds: 200
earlyStoppingRounds: 25
verboseEval: false
}
}
) {
id
status
}
}
Hyperparameters (Optional): If not specified, sensible defaults are used:
- maxDepth: Maximum tree depth (default: 8)
- eta: Learning rate (default: 0.1)
- subsample: Fraction of samples used per tree (default: 0.8)
- colsampleBytree: Fraction of features used per tree (default: 0.8)
- objective: Loss function (default: reg:squarederror for regression)
Train Options (Optional):
- numBoostRounds: Number of boosting iterations (default: 200)
- earlyStoppingRounds: Stop if validation metric doesn't improve (default: 25)
- verboseEval: Log training progress (default: false)
Training Process:
- Dataset is ensured (built or fetched from cache)
- A new MlModelVersion is created
- XGBoost trains on the training set
- Validation set is used for early stopping
- Test set provides final evaluation metrics
- Artifacts (booster model, config, metrics) are persisted
- Model version status updates to awaitingApproval
Step 6: Review Results
Check the model version and its performance:
GraphQL Query:
query GetModelVersions {
mlModelVersionsByQuestion(questionId: "01JDAAA...") {
id
version
status
trainedAt
approvedAt
metricsSnapshot
hyperparameters
datasetFingerprint
}
}
Metrics Snapshot Example:
{
"train": {
"rmse": 15.2,
"mae": 11.3,
"r2": 0.92
},
"validation": {
"rmse": 18.4,
"mae": 13.7,
"r2": 0.88
},
"test": {
"rmse": 19.1,
"mae": 14.2,
"r2": 0.87
}
}
Metric Definitions:
- RMSE (Root Mean Squared Error): Average prediction error magnitude
- MAE (Mean Absolute Error): Average absolute prediction error
- R² (R-squared): Proportion of variance explained (1.0 = perfect, 0.0 = no better than mean)
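For intuition, these metrics can be reproduced from raw predictions with a few lines of NumPy; this is an illustrative sketch, not the platform's evaluation code:

```python
# Illustrative computation of RMSE, MAE, and R² from predictions (not the
# platform's internal evaluation code).
import numpy as np


def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    errors = y_pred - y_true
    rmse = float(np.sqrt(np.mean(errors ** 2)))             # penalizes large errors
    mae = float(np.mean(np.abs(errors)))                    # average error magnitude
    ss_res = float(np.sum(errors ** 2))                     # residual sum of squares
    ss_tot = float(np.sum((y_true - np.mean(y_true)) ** 2))
    r2 = 1.0 - ss_res / ss_tot                              # share of variance explained
    return {"rmse": rmse, "mae": mae, "r2": r2}


print(regression_metrics(np.array([450.0, 470.0, 430.0]),
                         np.array([455.0, 460.0, 440.0])))
```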
Features: Building Blocks of ML
What is a Feature?
A feature is a reusable data point that describes some aspect of your operations. Features transform raw operational data into model-ready inputs.
Feature Definition:
{
"name": "yesterday_production",
"label": "Yesterday's Production Count",
"description": "Total units produced in the previous 24-hour period",
"category": "operations",
"sourceResource": "Operation",
"sourceAction": "aggregate_production",
"transformation": {
"type": "sum",
"timeWindow": "1_day",
"offset": "-1_day"
},
"dataType": "integer",
"units": "units",
"tags": ["production", "throughput", "historical"],
"allowedTemplates": ["daily_throughput_forecast"],
"defaultFilters": {},
"validationRules": {
"min": 0,
"max": 10000
},
"qualityChecks": {
"maxMissingRate": 0.1,
"freshnessHours": 48
},
"active": true
}
Feature Attributes
- name: Unique identifier used in configurations (snake_case)
- label: Human-friendly display name
- description: What the feature represents and how it's computed
- category: Grouping (e.g., "operations", "staffing", "quality")
- sourceResource: Ash resource providing the data (e.g., "Operation", "Resource")
- sourceAction: Action or calculation to fetch data
- transformation: How raw data is transformed (aggregations, time windows, etc.)
- dataType: integer, float, string, boolean, or datetime
- units: Measurement units for clarity
- tags: Searchable keywords
- allowedTemplates: Which templates can use this feature (empty = all)
- recommendedTemplates: Templates where this feature is suggested by default
- defaultFilters: Automatic filters applied when sourcing data
- validationRules: Business constraints (min/max values, etc.)
- qualityChecks: Data quality expectations
- active: Whether the feature can be selected in new questions
Creating Custom Features
GraphQL Mutation:
mutation CreateFeature {
createMlFeature(
input: {
name: "average_cycle_time"
label: "Average Cycle Time (Last 7 Days)"
description: "Rolling 7-day average of operation cycle times"
category: "operations"
sourceResource: "Operation"
sourceAction: "calculate_cycle_time"
transformation: {
type: "average"
timeWindow: "7_days"
}
dataType: float
units: "minutes"
tags: ["efficiency", "operations", "cycle_time"]
allowedTemplates: ["efficiency_prediction"]
defaultFilters: {}
validationRules: {
min: 0
max: 1440
}
qualityChecks: {
maxMissingRate: 0.15
}
active: true
}
) {
id
name
label
qualityState
}
}
Auto-Generated Features
Features can be automatically generated from existing metrics in your system. The platform monitors metrics and creates corresponding ML features when appropriate.
Query Auto-Generated Features:
query AutoGeneratedFeatures {
listMlFeatures(filter: { autoGenerated: { eq: true } }) {
id
name
label
sourceMetricId
qualityState
}
}
Feature Quality Monitoring
The platform continuously monitors feature quality based on:
- Missing Rate: Percentage of null/missing values
- Freshness: How recently data was updated
- Distribution Shifts: Significant changes in value patterns
Features are assigned quality states:
- good: Ready for production use
- warning: Minor issues detected; review recommended
- error: Significant problems; avoid using in new questions
- unknown: Not yet evaluated
Check Feature Quality:
query FeatureQuality {
getMlFeature(id: "01JCYYY...") {
name
qualityState
qualityChecks
metadata
}
}
Question Templates
What are Question Templates?
Question Templates provide pre-configured starting points for common ML use cases. They bundle:
- Recommended features
- Default configuration
- Eligibility rules (minimum sample size, required filters)
- Interpretation guidance for operators
Template Structure
{
"name": "daily_throughput_forecast",
"label": "Daily Throughput Forecast",
"category": "operations",
"questionCopy": "What will our throughput be tomorrow?",
"description": "Predict next-day production throughput based on historical patterns and current resource availability.",
"recommendedFeatures": ["yesterday_production", "staffing_level", "equipment_availability"],
"requiredFeatures": ["target_throughput"],
"optionalFeatures": ["weather_conditions", "day_of_week", "planned_maintenance"],
"defaultConfiguration": {
"target": "next_day_throughput",
"splitRatio": {
"train": 0.7,
"validation": 0.15,
"test": 0.15
},
"horizon": "1_day"
},
"eligibilityRules": {
"minimumSampleSize": 100,
"requiredFilters": ["siteId"],
"allowedFilters": ["siteId", "productFamily", "shiftType"]
},
"interpretationGuidance": "Predictions represent expected unit count. Consider confidence intervals and historical accuracy when planning staffing.",
"status": "active"
}
Using Templates
Templates guide the question creation process:
- Browse templates to find your use case
- Review recommended features (auto-selected when creating questions)
- Add optional features for additional predictive power
- Provide required filters as defined in eligibilityRules
- Customize configuration if needed
Creating Custom Templates
Platform administrators can create templates for organization-specific use cases:
GraphQL Mutation:
mutation CreateTemplate {
createMlQuestionTemplate(
input: {
name: "quality_defect_prediction"
label: "Quality Defect Prediction"
category: "quality"
questionCopy: "What's the expected defect rate for this batch?"
description: "Predict quality outcomes based on input materials, process parameters, and environmental conditions"
recommendedFeatures: ["material_quality_score", "process_temperature", "operator_experience"]
requiredFeatures: ["target_defect_rate"]
optionalFeatures: ["humidity", "equipment_age", "shift_type"]
defaultConfiguration: {
target: "defect_rate"
splitRatio: { train: 0.7, validation: 0.15, test: 0.15 }
}
eligibilityRules: {
minimumSampleSize: 200
requiredFilters: ["productType"]
}
interpretationGuidance: "Defect rate predictions are probabilistic. Use in conjunction with process controls and quality checks."
status: active
}
) {
id
name
status
}
}
Creating ML Questions
Question Lifecycle
Questions progress through these states:
- draft: Initial configuration, dataset not yet built
- collecting: Dataset assembly in progress
- training: Model training in progress
- serving: Has an approved model version ready for predictions
- archived: No longer active
Question Configuration
The configuration field on an ML question contains:
{
"features": ["yesterday_production", "staffing_level", "equipment_availability"],
"filters": {
"siteId": "01JCZZZ...",
"productFamily": "widgets"
},
"target": "next_day_throughput",
"splitRatio": {
"train": 0.7,
"validation": 0.15,
"test": 0.15
}
}
Configuration Fields:
- features: Array of feature names (required)
- filters: Criteria to filter operational data (template-specific)
- target: What you're predicting (required)
- splitRatio: Data split percentages (optional)
Validation Rules
When creating a question, the platform validates:
- Template exists and is active
- Required features are included
- All features are allowed by the template
- Required filters are provided
- Feature quality meets minimum thresholds
- Sufficient sample size is available
Validation errors prevent question creation and provide actionable feedback.
Updating Questions
Update question metadata or configuration:
GraphQL Mutation:
mutation UpdateQuestion {
updateMlQuestion(
id: "01JDAAA..."
input: {
label: "Updated Label"
description: "Updated description"
configuration: {
features: ["yesterday_production", "staffing_level", "equipment_availability", "day_of_week"]
filters: {
siteId: "01JCZZZ..."
productFamily: "widgets"
}
target: "next_day_throughput"
}
notes: "Added day_of_week feature for seasonality"
status: draft
}
) {
id
configuration
updatedAt
}
}
Note: Changing configuration (features, filters, target) invalidates existing datasets and model versions. You'll need to rebuild and retrain.
Dataset Assembly
What is a Dataset?
A dataset is a structured collection of:
- Feature values (input matrix X)
- Target values (output vector y)
- Sample identifiers (traceable back to operations)
- Train/validation/test splits
- Statistics and metadata
Dataset Fingerprinting
Each unique configuration produces a deterministic fingerprint based on:
- Question ID
- Account ID
- Feature names (ordered)
- Filters
- Target variable
Fingerprints enable:
- Caching: Reuse datasets when configuration hasn't changed
- Reproducibility: Same configuration always produces same dataset
- Traceability: Link model versions back to exact training data
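For illustration, a deterministic fingerprint over those inputs could be computed as in the sketch below. The SHA-256 hash matches the description under "Dataset Caching and Fingerprinting"; the exact serialization and field names are assumptions:

```python
# Sketch of a configuration fingerprint (serialization details are assumptions).
import hashlib
import json


def dataset_fingerprint(question_id: str, account_id: str,
                        features: list, filters: dict, target: str) -> str:
    payload = {
        "question_id": question_id,
        "account_id": account_id,
        "features": sorted(features),   # a stable ordering keeps the hash deterministic
        "filters": filters,
        "target": target,
    }
    canonical = json.dumps(payload, sort_keys=True)   # canonical JSON serialization
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


print(dataset_fingerprint("01JDAAA...", "01JACCT...",
                          ["yesterday_production", "staffing_level"],
                          {"siteId": "01JCZZZ..."},
                          "next_day_throughput"))
```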
Dataset Assembly Process
Step-by-Step:
- Load question configuration (features, filters, target)
- Fetch feature definitions from catalog
- Compute fingerprint from configuration
- Check cache for existing dataset
- Load operations matching filters
- Assemble feature columns by computing values for each operation
- Extract target values from operations
- Build feature tensor (2D matrix: samples × features)
- Build target tensor (1D vector: samples)
- Split data deterministically into train/validation/test
- Generate statistics (means, standard deviations, missing rates)
- Cache dataset for future reuse
- Create artifact record pointing to cached data
- Record run with metadata and report
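Conceptually, the "assemble feature columns" through "build target tensor" steps reduce to building a samples × features matrix and a target vector. A toy pandas sketch, with hypothetical record and column names:

```python
# Toy sketch of turning operation records into a feature matrix X and target
# vector y (record and column names are hypothetical).
import pandas as pd

operations = pd.DataFrame([
    {"id": "op-1", "yesterday_production": 440, "staffing_level": 11, "next_day_throughput": 455},
    {"id": "op-2", "yesterday_production": 470, "staffing_level": 13, "next_day_throughput": 480},
    {"id": "op-3", "yesterday_production": 410, "staffing_level": 10, "next_day_throughput": 430},
])

feature_names = ["yesterday_production", "staffing_level"]
target_name = "next_day_throughput"

X = operations[feature_names].to_numpy()   # 2D matrix: samples x features
y = operations[target_name].to_numpy()     # 1D vector: samples
sample_ids = operations["id"].tolist()     # keeps samples traceable to operations

# Per-feature statistics like those reported in the dataset report
print(operations[feature_names].agg(["mean", "std", "min", "max"]))
print(operations[feature_names].isna().mean())   # missing rate per feature
```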
Inspecting Datasets
Query Dataset Runs:
query DatasetRuns {
mlRunsByQuestion(questionId: "01JDAAA...") {
id
kind
status
datasetReport
metrics
context
startedAt
completedAt
}
}
Dataset Report Fields:
- recordCount: Total samples in dataset
- fingerprint: Unique identifier for this dataset
- featureStats: Per-feature statistics (mean, stdDev, min, max, missingRate)
- targetStats: Target variable statistics
- cached: Whether dataset was loaded from cache
Retrieve Dataset Artifacts:
query DatasetArtifacts {
mlArtifactsByRun(runId: "01JDBBB...") {
id
artifactType
name
uri
format
byteSize
metadata
insertedAt
}
}
Force Rebuild
When underlying operational data changes, force a dataset rebuild:
mutation RebuildDataset {
buildMlQuestionDataset(
id: "01JDAAA..."
input: { forceRebuild: true }
) {
id
status
}
}
Training Models
Model Training Workflow
- Ensure dataset exists (build if necessary)
- Create model version record
- Prepare data (convert tensors to Nx arrays)
- Train XGBoost with specified hyperparameters
- Evaluate on train/validation/test splits
- Persist artifacts (booster model, config, metrics)
- Update model version status to awaitingApproval
- Record training run with metrics and duration
Hyperparameter Tuning
Hyperparameters control model behavior. Defaults are sensible but can be customized:
Common Hyperparameters:
| Parameter | Description | Default | Range |
|---|---|---|---|
maxDepth | Maximum tree depth (prevents overfitting) | 8 | 3-15 |
eta | Learning rate (smaller = slower, more accurate) | 0.1 | 0.01-0.3 |
subsample | Fraction of samples per tree | 0.8 | 0.5-1.0 |
colsampleBytree | Fraction of features per tree | 0.8 | 0.5-1.0 |
objective | Loss function | reg:squarederror | See XGBoost docs |
Training Options:
| Option | Description | Default |
|---|---|---|
numBoostRounds | Number of boosting iterations | 200 |
earlyStoppingRounds | Stop if no improvement | 25 |
verboseEval | Log training progress | false |
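For readers familiar with XGBoost's Python API, the documented hyperparameters and training options map onto standard xgb.train arguments roughly as sketched below. This is an illustrative equivalent on synthetic data, not the platform's internal training code:

```python
# Conceptual XGBoost training with the documented defaults (illustrative only).
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=1000)

# 70/15/15 split
X_train, y_train = X[:700], y[:700]
X_val, y_val = X[700:850], y[700:850]
X_test, y_test = X[850:], y[850:]

params = {
    "max_depth": 8,                 # maxDepth
    "eta": 0.1,                     # learning rate
    "subsample": 0.8,
    "colsample_bytree": 0.8,        # colsampleBytree
    "objective": "reg:squarederror",
    "eval_metric": "rmse",
}
dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)

booster = xgb.train(
    params,
    dtrain,
    num_boost_round=200,            # numBoostRounds
    evals=[(dval, "validation")],
    early_stopping_rounds=25,       # earlyStoppingRounds
    verbose_eval=False,
)

pred = booster.predict(xgb.DMatrix(X_test))
print("test RMSE:", float(np.sqrt(np.mean((pred - y_test) ** 2))))
```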
Example: Custom Hyperparameters
mutation TrainWithCustomParams {
trainMlQuestion(
id: "01JDAAA..."
input: {
hyperparameters: {
maxDepth: 6
eta: 0.05
subsample: 0.75
colsampleBytree: 0.9
gamma: 0.1
minChildWeight: 5
objective: "reg:squarederror"
evalMetric: ["rmse", "mae"]
}
trainOptions: {
numBoostRounds: 300
earlyStoppingRounds: 30
}
}
) {
id
status
}
}
Monitoring Training
Training progress is tracked via MlRun records:
Query Training Run:
query TrainingRun {
mlRunsByModelVersion(modelVersionId: "01JDCCC...") {
id
kind
status
initiatedBy
context
metrics
validationReport
startedAt
completedAt
errorDetails
}
}
Training Run Fields:
- kind: Always training for model training runs
- status: pending, inProgress, succeeded, failed, or canceled
- metrics: Summary metrics (RMSE, MAE, R²) for each split
- validationReport: Detailed evaluation with sample counts
- errorDetails: If failed, diagnostic information
Training Artifacts
After successful training, artifacts are persisted:
Query Model Artifacts:
query ModelArtifacts {
mlArtifactsByModelVersion(modelVersionId: "01JDCCC...") {
id
artifactType
name
uri
format
metadata
}
}
Artifact Types:
- booster: Trained XGBoost model (JSON format)
- trainingConfig: Hyperparameters and training options
- report: Evaluation metrics with sample counts
- explainability: Feature importance and SHAP values (future)
Evaluating Model Performance
Evaluation Metrics
Models are evaluated on three data splits:
- Training Set (70%): Data used to train the model
- Validation Set (15%): Data used during training for early stopping
- Test Set (15%): Held-out data for final evaluation
Metrics Provided:
- RMSE (Root Mean Squared Error): Penalizes large errors more heavily
- MAE (Mean Absolute Error): Average absolute error magnitude
- R² (R-squared): Variance explained (1.0 = perfect)
Interpreting Metrics
Good Model Characteristics:
- Test RMSE close to validation RMSE (not overfitting)
- R² > 0.7 on test set (explains most variance)
- MAE acceptable for business use case
Warning Signs:
- Train RMSE much lower than test RMSE (overfitting)
- R² < 0.5 on test set (poor predictive power)
- MAE too large for practical decisions
Example Metrics:
{
"train": { "rmse": 15.2, "mae": 11.3, "r2": 0.92 },
"validation": { "rmse": 18.4, "mae": 13.7, "r2": 0.88 },
"test": { "rmse": 19.1, "mae": 14.2, "r2": 0.87 }
}
Interpretation: The model performs well, explaining 87% of variance on the test set. Slight degradation from train to test is normal and acceptable.
Comparing Model Versions
Questions can have multiple model versions (e.g., after retraining with new data or different hyperparameters):
Query All Versions:
query CompareVersions {
mlModelVersionsByQuestion(questionId: "01JDAAA...") {
id
version
status
trainedAt
metricsSnapshot
hyperparameters
datasetFingerprint
}
}
Compare by:
- Test set RMSE/MAE (lower is better)
- R² (higher is better)
- Training date (newer may use more recent data)
- Dataset fingerprint (same data or new data?)
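A small sketch of ranking the versions returned above by test-set RMSE, using the metricsSnapshot shape shown earlier:

```python
# Rank model versions by test-set RMSE, lowest (best) first.
versions = [
    {"version": 1, "metricsSnapshot": {"test": {"rmse": 21.4, "mae": 16.0, "r2": 0.83}}},
    {"version": 2, "metricsSnapshot": {"test": {"rmse": 19.1, "mae": 14.2, "r2": 0.87}}},
]

ranked = sorted(versions, key=lambda v: v["metricsSnapshot"]["test"]["rmse"])
best = ranked[0]
print(f"Best candidate: version {best['version']} "
      f"(test RMSE {best['metricsSnapshot']['test']['rmse']})")
```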
Detailed Evaluation Report
Query Validation Report:
query EvaluationDetails {
getMlRun(id: "01JDBBB...") {
validationReport
}
}
Sample Report:
{
"train": {
"sampleCount": 875,
"metrics": { "rmse": 15.2, "mae": 11.3, "r2": 0.92 }
},
"validation": {
"sampleCount": 187,
"metrics": { "rmse": 18.4, "mae": 13.7, "r2": 0.88 }
},
"test": {
"sampleCount": 188,
"metrics": { "rmse": 19.1, "mae": 14.2, "r2": 0.87 }
}
}
Deploying Models
Model Version Lifecycle
Model versions progress through these states:
- draft: Initial state before training
- training: Model is currently being trained
- evaluating: Metrics are being computed (brief)
- awaitingApproval: Training complete, awaiting human review
- approved: Approved for serving predictions
- retired: No longer recommended for use
Approving a Model Version
After reviewing metrics, approve a version for production:
GraphQL Mutation:
mutation ApproveModel {
updateMlModelVersion(
id: "01JDCCC..."
input: {
status: approved
approvedAt: "2025-10-23T14:30:00Z"
approvedBy: "01JDUSER..." # Your user ID
notes: "Approved based on test R² of 0.87 and acceptable MAE"
}
) {
id
version
status
approvedAt
approvedBy
}
}
Setting the Serving Model
Update the question to use a specific approved model:
GraphQL Mutation:
mutation SetServingModel {
updateMlQuestion(
id: "01JDAAA..."
input: {
servingModelVersionId: "01JDCCC..."
status: serving
}
) {
id
status
servingModelVersionId
servingModelVersion {
id
version
status
metricsSnapshot
}
}
}
Generating Predictions
Once a serving model is set, predictions can be generated (implementation depends on your serving infrastructure):
Conceptual Mutation:
mutation Predict {
predictMlQuestion(
id: "01JDAAA..."
input: {
features: {
yesterdayProduction: 450
staffingLevel: 12
equipmentAvailability: 0.95
}
}
) {
prediction
confidence
featureImportance
}
}
Note: Prediction serving is typically handled by separate inference infrastructure that loads the booster artifact and serves requests at low latency.
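As a rough illustration of that pattern, an inference service could load the persisted booster (a JSON artifact, per the artifact types above) and score a feature vector; the artifact path and feature ordering here are assumptions:

```python
# Conceptual inference sketch: load a persisted booster artifact and score one
# sample. The artifact path and feature ordering are assumptions.
import numpy as np
import xgboost as xgb

booster = xgb.Booster()
booster.load_model("booster.json")        # downloaded from the artifact's uri

# Feature order must match the order used during dataset assembly
features = np.array([[450, 12, 0.95]])    # yesterday_production, staffing_level, equipment_availability
prediction = booster.predict(xgb.DMatrix(features))
print("predicted next_day_throughput:", float(prediction[0]))
```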
Retiring Models
When a model is outdated or superseded:
GraphQL Mutation:
mutation RetireModel {
updateMlModelVersion(
id: "01JDCCC..."
input: {
status: retired
notes: "Superseded by version 3 with improved accuracy"
}
) {
id
status
}
}
Monitoring and Runs
What are Runs?
MlRun records track every significant workflow step:
- Dataset assembly (kind: dataset)
- Model training (kind: training)
- Model evaluation (kind: evaluation)
- Prediction serving (kind: serving)
Runs provide:
- Execution status and timestamps
- Metrics and reports
- Error details for debugging
- Traceability (who initiated, why)
Querying Runs
By Question:
query QuestionRuns {
mlRunsByQuestion(questionId: "01JDAAA...") {
id
kind
status
initiatedBy
context
datasetReport
validationReport
metrics
startedAt
completedAt
errorDetails
}
}
By Model Version:
query ModelRuns {
mlRunsByModelVersion(modelVersionId: "01JDCCC...") {
id
kind
status
metrics
startedAt
completedAt
}
}
Run Statuses
- pending: Queued but not started
- inProgress: Currently executing
- succeeded: Completed successfully
- failed: Encountered an error
- canceled: Manually stopped
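Because dataset builds and training execute asynchronously, clients usually poll run status until a terminal state is reached. A minimal sketch, reusing the hypothetical graphql() helper from Getting Started (the variable type and the ordering of returned runs are assumptions):

```python
# Poll the latest run for a question until it reaches a terminal status.
# Reuses the hypothetical graphql() helper; run ordering is an assumption.
import time

POLL_QUERY = """
query ($questionId: ID!) {
  mlRunsByQuestion(questionId: $questionId) {
    id kind status completedAt errorDetails
  }
}
"""


def wait_for_latest_run(question_id: str, timeout_s: int = 600, poll_s: int = 10) -> dict:
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        runs = graphql(POLL_QUERY, {"questionId": question_id})["data"]["mlRunsByQuestion"]
        latest = runs[-1]   # assumes runs are returned oldest-first
        if latest["status"] in ("succeeded", "failed", "canceled"):
            return latest
        time.sleep(poll_s)
    raise TimeoutError(f"run for question {question_id} did not finish in {timeout_s}s")
```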
Error Handling
When runs fail, inspect errorDetails:
Example Error:
{
"stage": "dataset_assembly",
"reason": "insufficient_sample_size",
"details": {
"minimumRequired": 100,
"actualCount": 47,
"filters": { "siteId": "01JCZZZ...", "productFamily": "widgets" }
}
}
Common Errors:
- insufficient_sample_size: Not enough data matching filters
- missing_features: Required features unavailable
- feature_quality_issues: Data quality below threshold
- training_failed: XGBoost training error (check hyperparameters)
Run Context
The context field provides execution metadata:
Dataset Run:
{
"stage": "training",
"triggeredBy": "user",
"datasetRun": "01JDBBB..."
}
Training Run:
{
"datasetRunId": "01JDBBB...",
"modelVersion": 2
}
GraphQL API Reference
Queries
Questions
List all questions:
query {
listMlQuestions {
id
name
label
questionTemplate
status
configuration
servingModelVersionId
insertedAt
updatedAt
}
}
Get single question:
query {
getMlQuestion(id: "01JDAAA...") {
id
name
label
description
questionTemplate
status
configuration
metadata
notes
servingModelVersion {
id
version
metricsSnapshot
}
modelVersions {
id
version
status
trainedAt
}
runs {
id
kind
status
completedAt
}
}
}
Questions by template:
query {
mlQuestionsByTemplate(questionTemplate: "daily_throughput_forecast") {
id
name
label
status
}
}
Question Templates
List templates:
query {
listMlQuestionTemplates {
id
name
label
category
questionCopy
recommendedFeatures
requiredFeatures
optionalFeatures
status
}
}
Get template by name:
query {
mlQuestionTemplateByName(name: "daily_throughput_forecast") {
id
name
label
description
recommendedFeatures
defaultConfiguration
eligibilityRules
interpretationGuidance
}
}
Features
List features:
query {
listMlFeatures {
id
name
label
description
category
dataType
units
tags
allowedTemplates
recommendedTemplates
active
qualityState
}
}
Get feature by name:
query {
mlFeatureByName(name: "yesterday_production") {
id
name
label
description
sourceResource
sourceAction
transformation
dataType
validationRules
qualityChecks
qualityState
}
}
Model Versions
List all versions:
query {
listMlModelVersions {
id
questionId
version
status
trainedAt
approvedAt
metricsSnapshot
}
}
Versions by question:
query {
mlModelVersionsByQuestion(questionId: "01JDAAA...") {
id
version
status
trainedAt
metricsSnapshot
hyperparameters
datasetFingerprint
}
}
Runs
List runs:
query {
listMlRuns {
id
questionId
modelVersionId
kind
status
initiatedBy
startedAt
completedAt
}
}
Runs by question:
query {
mlRunsByQuestion(questionId: "01JDAAA...") {
id
kind
status
datasetReport
validationReport
metrics
errorDetails
}
}
Artifacts
List artifacts:
query {
listMlArtifacts {
id
runId
modelVersionId
artifactType
name
uri
format
}
}
Artifacts by model version:
query {
mlArtifactsByModelVersion(modelVersionId: "01JDCCC...") {
id
artifactType
name
uri
format
byteSize
metadata
}
}
Mutations
Questions
Create question:
mutation {
createMlQuestion(input: {
name: "my_question"
label: "My Question"
questionTemplate: "template_name"
description: "Description"
status: draft
configuration: {
features: ["feature1", "feature2"]
filters: { key: "value" }
target: "target_var"
}
}) {
id
name
status
}
}
Update question:
mutation {
updateMlQuestion(id: "01JDAAA...", input: {
label: "Updated Label"
configuration: { /* new config */ }
status: serving
servingModelVersionId: "01JDCCC..."
}) {
id
updatedAt
}
}
Delete question:
mutation {
deleteMlQuestion(id: "01JDAAA...") {
id
}
}
Build dataset:
mutation {
buildMlQuestionDataset(id: "01JDAAA...", input: {
forceRebuild: false
}) {
id
status
}
}
Train model:
mutation {
trainMlQuestion(id: "01JDAAA...", input: {
forceDatasetRebuild: false
hyperparameters: {
maxDepth: 8
eta: 0.1
}
trainOptions: {
numBoostRounds: 200
}
}) {
id
status
}
}
Features
Create feature:
mutation {
createMlFeature(input: {
name: "feature_name"
label: "Feature Label"
description: "Description"
category: "operations"
sourceResource: "Operation"
sourceAction: "action_name"
transformation: {}
dataType: float
units: "units"
tags: ["tag1", "tag2"]
active: true
}) {
id
name
}
}
Update feature:
mutation {
updateMlFeature(id: "01JCYYY...", input: {
label: "Updated Label"
description: "Updated description"
active: false
}) {
id
updatedAt
}
}
Model Versions
Update version (approve):
mutation {
updateMlModelVersion(id: "01JDCCC...", input: {
status: approved
approvedAt: "2025-10-23T14:30:00Z"
approvedBy: "01JDUSER..."
notes: "Approved for production"
}) {
id
status
approvedAt
}
}
Advanced Topics
Multi-Step Workflows with QuestionBuilder
For advanced use cases, the platform includes a QuestionBuilder pipeline that validates and assembles questions from templates:
Conceptual Workflow:
- Select a template
- Choose features (required + optional)
- Provide filters
- QuestionBuilder validates:
- Template exists and is active
- All features are allowed by template
- Required features are included
- Feature quality meets thresholds
- Filters satisfy eligibility rules
- Configuration is assembled and merged with template defaults
- Question is created with validated configuration
This workflow is typically handled by the UI but can be replicated via API calls.
Dataset Caching and Fingerprinting
How Caching Works:
- Configuration is serialized (question ID, features, filters, target)
- SHA-256 hash produces fingerprint
- Cache lookup by fingerprint
- If hit: return cached dataset
- If miss: assemble fresh dataset and cache
Benefits:
- Faster iteration when experimenting with hyperparameters
- Reproducibility (same config = same dataset)
- Resource efficiency (avoid redundant computation)
Cache Invalidation:
- Manually force rebuild with forceRebuild: true
- Cache entries may be purged based on age or size limits
Feature Transformations
Features support various transformation types:
- Aggregations: sum, average, count, min, max
- Time Windows: 1_day, 7_days, 30_days, 90_days
- Offsets: -1_day, -7_days (for lagged features)
- Custom: Platform-specific transformations
Example:
{
"transformation": {
"type": "average",
"timeWindow": "7_days",
"offset": "-1_day",
"aggregateBy": "siteId"
}
}
Train/Validation/Test Splits
Data is split deterministically using a seed derived from the dataset fingerprint:
Default Ratios:
- Train: 70%
- Validation: 15%
- Test: 15%
Custom Ratios:
{
"configuration": {
"splitRatio": {
"train": 0.8,
"validation": 0.1,
"test": 0.1
}
}
}
Splitting Algorithm:
- Samples are shuffled using fingerprint-based seed
- Deterministic (same fingerprint = same split)
- Stratified by sample ID (reproducible)
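One way such a fingerprint-seeded split can be realized is sketched below; the platform's exact shuffling and stratification may differ:

```python
# Illustrative fingerprint-seeded split (the platform's exact algorithm may differ).
import numpy as np


def split_indices(n_samples: int, fingerprint: str,
                  train: float = 0.7, validation: float = 0.15):
    seed = int(fingerprint[:8], 16)           # reproducible seed from the fingerprint
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)        # same fingerprint -> same shuffle
    n_train = int(n_samples * train)
    n_val = int(n_samples * validation)
    return (order[:n_train],                  # train
            order[n_train:n_train + n_val],   # validation
            order[n_train + n_val:])          # test


train_idx, val_idx, test_idx = split_indices(1250, "a3f7b9e2c1d4")
print(len(train_idx), len(val_idx), len(test_idx))   # 875, 187, 188
```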
Handling Missing Data
The platform handles missing feature values:
Strategies:
- Numeric features: Impute with mean or median
- Categorical features: Encode as special "missing" category
- High missing rates: Flag in quality checks
Best Practices:
- Monitor feature missingRate in dataset reports
- Set maxMissingRate in feature quality checks
- Investigate and address root causes of missingness
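A small illustration of the mean-imputation and missing-rate checks described above (pandas; the threshold value is only an example):

```python
# Mean imputation for a numeric feature column (illustrative).
import pandas as pd

col = pd.Series([450, None, 430, 470, None],
                name="yesterday_production", dtype="float64")

missing_rate = col.isna().mean()
print(f"missingRate: {missing_rate:.2f}")   # 0.40 in this toy example

if missing_rate > 0.1:                      # compare against the feature's maxMissingRate
    print("warning: missing rate exceeds the quality threshold")

imputed = col.fillna(col.mean())            # replace missing values with the column mean
print(imputed.tolist())
```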
Multi-Tenant Isolation
All ML resources are tenant-scoped:
- Questions, features, and models are isolated per account
- Dataset assembly only accesses tenant's operational data
- IAM policies enforce permissions
GraphQL automatically scopes queries/mutations to the authenticated tenant. No manual tenant filtering required.
Troubleshooting
Common Issues
Issue: "Insufficient sample size"
Symptom: Dataset assembly fails with insufficient_sample_size error.
Causes:
- Filters too restrictive (e.g., filtering to a site with little data)
- Template requires minimum sample size not met
Solutions:
- Broaden filters (remove or relax constraints)
- Accumulate more operational data before training
- Check the template's eligibilityRules.minimumSampleSize
Issue: "Feature quality issues"
Symptom: Dataset assembly fails with feature_quality_issues error.
Causes:
- Feature has high missing rate exceeding maxMissingRate
- Feature data is stale or unavailable
Solutions:
- Check feature quality state: getMlFeature(id: "...") { qualityState, qualityChecks }
- Review the operational data pipeline for the feature's source
- Exclude the problematic feature and use alternatives
- Adjust maxMissingRate in the feature definition if acceptable
Issue: Model overfitting
Symptom: Train RMSE much lower than test RMSE; poor generalization.
Causes:
- Model too complex (maxDepth too high)
- Too many boosting rounds
- Small dataset with many features
Solutions:
- Reduce maxDepth (try 4-6)
- Reduce numBoostRounds or increase earlyStoppingRounds
- Reduce feature count (focus on the most important)
- Increase regularization (gamma, lambda, alpha)
Issue: Model underfitting
Symptom: Low R² on all splits; predictions not much better than mean.
Causes:
- Model too simple
- Features lack predictive power
- Insufficient boosting rounds
Solutions:
- Increase maxDepth (try 10-12)
- Increase numBoostRounds
- Add more relevant features
- Check target variable variance (is it predictable?)
Issue: "Training failed"
Symptom: Training run status is failed with XGBoost error.
Causes:
- Invalid hyperparameters
- Data format issues (NaN, Inf)
- Insufficient resources (memory, time)
Solutions:
- Review errorDetails in the run record
- Validate hyperparameters against XGBoost documentation
- Check dataset report for data anomalies
- Contact platform support if persistent
Debugging Workflows
Step 1: Verify Question Configuration
query {
getMlQuestion(id: "01JDAAA...") {
configuration
status
}
}
Step 2: Check Dataset Run
query {
mlRunsByQuestion(questionId: "01JDAAA...") {
id
kind
status
datasetReport
errorDetails
}
}
Step 3: Review Training Run
query {
mlRunsByModelVersion(modelVersionId: "01JDCCC...") {
id
kind
status
metrics
validationReport
errorDetails
}
}
Step 4: Inspect Artifacts
query {
mlArtifactsByRun(runId: "01JDBBB...") {
artifactType
name
uri
metadata
}
}
Performance Optimization
Dataset Assembly:
- Use filters to reduce data volume
- Ensure feature source actions are optimized
- Leverage caching (avoid forceRebuild unless necessary)
Model Training:
- Start with default hyperparameters
- Use early stopping to prevent overtraining
- Monitor training duration in run records
Predictions:
- Load booster artifacts once and reuse
- Batch predictions when possible
- Cache predictions for static inputs
Best Practices
1. Start Simple
- Begin with recommended features from templates
- Use default hyperparameters for initial training
- Iterate based on evaluation metrics
2. Monitor Feature Quality
- Regularly review feature qualityState
- Investigate and address quality warnings
- Deactivate features with persistent issues
3. Version Control Configurations
- Use descriptive question names and labels
- Document configuration changes in the notes field
- Track model versions with approval notes
4. Evaluate Rigorously
- Always review test set metrics (not just training)
- Compare new versions against existing serving models
- Consider business impact (e.g., cost of prediction errors)
5. Retrain Regularly
- Retrain models when new operational data accumulates
- Monitor prediction accuracy over time (concept drift)
- Automate retraining schedules for production questions
6. Collaborate
- Share question templates across teams
- Document interpretation guidance for operators
- Review model approvals with domain experts
7. Secure Artifacts
- Artifacts contain sensitive operational insights
- Ensure IAM policies restrict access appropriately
- Audit model deployments and usage
Additional Resources
XGBoost Documentation
GraphQL Resources
Platform Support
- Contact your platform administrator for assistance
- Review platform changelog for new features
- Join user community forums (coming soon)
Appendix: Field Reference
MlQuestion Fields
| Field | Type | Description |
|---|---|---|
id | UUID | Unique identifier |
accountId | UUID | Tenant identifier (auto-set) |
name | String | Stable identifier (slug) |
label | String | Display name |
questionTemplate | String | Template name reference |
description | String | Long description |
status | Enum | draft, collecting, training, serving, archived |
configuration | Map | Features, filters, target, splitRatio |
metadata | Map | Tags, owner, custom fields |
notes | String | Operational notes |
servingModelVersionId | UUID | Currently approved model |
lastServedAt | DateTime | Last prediction timestamp |
insertedAt | DateTime | Creation timestamp |
updatedAt | DateTime | Last update timestamp |
MlModelVersion Fields
| Field | Type | Description |
|---|---|---|
id | UUID | Unique identifier |
questionId | UUID | Owning question |
version | Integer | Monotonic version number |
status | Enum | draft, training, evaluating, awaitingApproval, approved, retired |
trainedAt | DateTime | Training completion timestamp |
approvedAt | DateTime | Approval timestamp |
approvedBy | UUID | Approver user ID |
notes | String | Approval/retirement notes |
metricsSnapshot | Map | Train/validation/test metrics |
hyperparameters | Map | XGBoost hyperparameters used |
datasetFingerprint | String | Dataset identifier |
MlRun Fields
| Field | Type | Description |
|---|---|---|
id | UUID | Unique identifier |
questionId | UUID | Associated question |
modelVersionId | UUID | Associated model version (if applicable) |
kind | Enum | dataset, training, evaluation, serving |
status | Enum | pending, inProgress, succeeded, failed, canceled |
initiatedBy | String | User or system identifier |
context | Map | Execution context metadata |
metrics | Map | Performance metrics |
datasetReport | Map | Dataset statistics (for dataset runs) |
validationReport | Map | Evaluation details (for training runs) |
errorDetails | Map | Error diagnostics (if failed) |
startedAt | DateTime | Execution start timestamp |
completedAt | DateTime | Execution completion timestamp |
Feature Fields
| Field | Type | Description |
|---|---|---|
id | UUID | Unique identifier |
name | String | Stable identifier (snake_case) |
label | String | Display name |
description | String | What the feature represents |
category | String | Grouping (operations, staffing, etc.) |
sourceResource | String | Ash resource providing data |
sourceAction | String | Action/calculation name |
transformation | Map | Transformation metadata |
dataType | Enum | integer, float, string, boolean, datetime |
units | String | Measurement units |
tags | Array | Searchable keywords |
allowedTemplates | Array | Compatible template names |
recommendedTemplates | Array | Templates where recommended |
defaultFilters | Map | Auto-applied filters |
validationRules | Map | Business constraints |
qualityChecks | Map | Data quality expectations |
active | Boolean | Available for selection |
qualityState | Enum | good, warning, error, unknown |
autoGenerated | Boolean | Created from metric |
sourceMetricId | UUID | Source metric (if auto-generated) |
Glossary
- Artifact: Persisted file (dataset, model booster, report) generated during ML workflows
- Booster: Trained XGBoost model
- Configuration: Question-specific settings (features, filters, target)
- Dataset: Structured collection of features and target values for training
- Feature: Reusable data point derived from operational data
- Fingerprint: Hash uniquely identifying a dataset configuration
- Hyperparameter: Tunable parameter controlling model training (e.g., maxDepth, eta)
- Model Version: Specific trained instance of a question with metrics and approval state
- Question: Plain-language definition of a prediction task
- Run: Execution record tracking dataset assembly, training, or evaluation
- Split: Subset of dataset (train, validation, test)
- Template: Pre-configured starting point for common ML use cases
- Tenant: Account/organization in the multi-tenant system
- XGBoost: Gradient boosting library used for model training
Conclusion
You now have a comprehensive understanding of the CoCore machine learning platform. Start by exploring question templates, creating your first ML question, and iterating based on evaluation metrics. The platform handles the complexity of feature engineering, dataset assembly, and model training, allowing you to focus on business outcomes.
For additional support, consult your platform administrator or refer to the GraphQL API documentation embedded in the GraphQL schema explorer.
Happy predicting!