Release Notes
TOC
- AI 2.2.0
  - New and Optimized Features
    - Connection Hub
    - NeMo Guardrails Integration
    - RAG Evaluation with RAGAS
    - LlamaFactory Fine-tuning
    - Hardware Profile Definitions & Templates
    - ServingRuntime Management
    - Notebook Base Image Library for ARM
    - Drift Detection with TrustyAI
    - MCP Integration with Llama Stack
  - Deprecated Features
  - Fixed Issues
  - Known Issues

AI 2.2.0
New and Optimized Features
Connection Hub
Connections enable users to securely configure access to external data sources and model storage locations by encapsulating credentials and configuration parameters as reusable project resources. Connection Types provide templated forms with customizable fields and default values, streamlining connection creation for common storage protocols. Version 2.2 includes built-in connection types for OCI-compliant registries and URI-based repositories, enabling model deployment from container images and remote endpoints. An S3-compatible object storage connection type is under development.
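As an illustration, a URI-based connection is typically materialized as a namespaced Secret tagged with its connection type. The annotation and label keys below follow a common Open Data Hub convention but are assumptions; consult the platform documentation for the exact keys.

```yaml
# Illustrative sketch only: a URI-based connection stored as a Secret.
# The annotation/label keys and the URI are assumptions for illustration.
apiVersion: v1
kind: Secret
metadata:
  name: granite-model-uri
  namespace: my-ai-project
  annotations:
    opendatahub.io/connection-type: uri-v1   # assumed connection-type key
  labels:
    opendatahub.io/managed: "true"           # assumed management label
stringData:
  URI: https://models.example.com/granite-7b/model.safetensors
```

Because the connection is an ordinary Secret, it can be referenced from model deployments in the same project without re-entering credentials.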
NeMo Guardrails Integration
NVIDIA NeMo Guardrails provides programmable safety controls for LLM applications, running as a separate service in front of the model. It enforces sensitive data detection (PII), content policies, and custom validation flows written in Colang and Python, exposed through the TrustyAI Operator's NemoGuardrails custom resource.
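A hypothetical sketch of the NemoGuardrails custom resource is shown below. The API group, version, and field names are assumptions, not the TrustyAI Operator's confirmed schema; they only illustrate the shape of a guardrails deployment sitting in front of a model endpoint.

```yaml
# Hypothetical sketch of a NemoGuardrails custom resource.
# API group, version, and all spec fields are assumptions.
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: NemoGuardrails
metadata:
  name: pii-guardrails
  namespace: my-ai-project
spec:
  # Model endpoint the guardrails service proxies (assumed field)
  modelRef:
    name: granite-7b
  # ConfigMap holding Colang flows and rails config (assumed field)
  configMapRef:
    name: guardrails-config
```

The guardrails service runs as a separate proxy, so client applications point at the guardrails endpoint rather than the model directly.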
RAG Evaluation with RAGAS
RAGAS (Retrieval-Augmented Generation Assessment) integration provides objective metrics for evaluating RAG applications, including retrieval quality, answer relevance, and factual consistency. Developers can automate quality gates and optimize RAG configurations using evaluation pipelines.
LlamaFactory Fine-tuning
LlamaFactory integration through Kubeflow Trainer v2 provides a streamlined solution for model fine-tuning, supporting SFT, LoRA, and QLoRA training algorithms. Users can customize foundation models with their own datasets through single-node and multi-node distributed training.
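A fine-tuning run can be sketched as a Kubeflow Trainer v2 TrainJob. The runtime name and node count below are illustrative assumptions for a multi-node LoRA job.

```yaml
# Illustrative sketch of a Kubeflow Trainer v2 TrainJob for LoRA fine-tuning.
# The runtime name "llamafactory" and all values are assumptions.
apiVersion: trainer.kubeflow.org/v1alpha1
kind: TrainJob
metadata:
  name: llama-lora-sft
  namespace: my-ai-project
spec:
  runtimeRef:
    name: llamafactory   # assumed ClusterTrainingRuntime provided by the platform
  trainer:
    numNodes: 2          # multi-node distributed training
```

Single-node SFT uses the same resource with `numNodes: 1`; the training algorithm (SFT, LoRA, QLoRA) is selected through the runtime's configuration.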
Hardware Profile Definitions & Templates
Hardware Profiles enable centralized management of hardware resource allocation for AI/ML workloads. Administrators can define custom hardware configurations with specific accelerator types, memory limits, and node placement rules, enabling GPU-as-a-Service capabilities with self-service provisioning.
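A hardware profile for a GPU node pool might look like the following sketch. The API group and field names are assumptions modeled on Open Data Hub's HardwareProfile resource and may differ from the platform's actual CRD.

```yaml
# Illustrative HardwareProfile sketch; API group, field names, and the
# node label are assumptions.
apiVersion: infrastructure.opendatahub.io/v1alpha1
kind: HardwareProfile
metadata:
  name: nvidia-a100-large
spec:
  identifiers:
    - displayName: NVIDIA GPU
      identifier: nvidia.com/gpu
      defaultCount: 1
      maxCount: 4          # upper bound users can self-service request
  scheduling:
    node:
      nodeSelector:
        nvidia.com/gpu.product: A100   # assumed node label for placement
```

Users then select the profile by name when launching workloads, instead of hand-writing resource requests and node selectors.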
ServingRuntime Management
Extend the AI Platform with custom inference runtimes to serve LLMs or other model types (image classification, object detection, etc.). Administrators can add custom runtimes such as MLServer, Triton, or Xinference through ClusterServingRuntime resources to support additional model frameworks, GPU types, and specialized inference scenarios beyond the default vLLM runtime.
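A custom runtime registration can be sketched as a KServe ClusterServingRuntime. The example below registers NVIDIA Triton for ONNX models; the image tag, arguments, and port are illustrative.

```yaml
# Hedged sketch: registering NVIDIA Triton as a cluster-wide serving runtime.
# Image tag and args are illustrative, not a tested configuration.
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: triton-runtime
spec:
  supportedModelFormats:
    - name: onnx
      version: "1"
      autoSelect: true     # auto-match InferenceServices declaring ONNX models
  containers:
    - name: kserve-container
      image: nvcr.io/nvidia/tritonserver:24.05-py3   # illustrative tag
      args:
        - tritonserver
        - --model-store=/mnt/models
        - --http-port=8080
      ports:
        - containerPort: 8080
          protocol: TCP
```

Once registered, InferenceServices that declare a supported model format can schedule onto this runtime instead of the default vLLM runtime.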
Notebook Base Image Library for ARM
The Notebook Base Image Library now includes minimal and datascience notebook images for ARM architecture, expanding hardware compatibility for notebook-based development on ARM platforms.
Drift Detection with TrustyAI
Monitor deployed models for data drift by detecting changes in input data distributions over time. TrustyAI Drift Detection compares real-world inference data against original training data using statistical metrics to identify shifts that could impact model performance, ensuring models remain accurate and reliable in production.
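As a sketch, scheduling a drift metric against the TrustyAI service typically involves a small request body naming the model and the tagged reference (training) data. The field names below reflect common TrustyAI usage but are assumptions and may differ per release. (JSON is shown, which is also valid YAML.)

```yaml
# Illustrative request body for scheduling a MeanShift drift metric via the
# TrustyAI service REST API; field names are assumptions.
{
  "modelId": "credit-risk-model",
  "referenceTag": "training-data"
}
```

The service then periodically compares live inference inputs against the tagged reference distribution and exposes the resulting drift metric for alerting.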
MCP Integration with Llama Stack
Llama Stack Connectors provide a high-level abstraction for AI registries such as Model Context Protocol (MCP). Platform Engineers can register connectors, and AI Engineers can consume pre-registered connectors without managing complex configurations, enabling AI agents to connect to external tools and data sources through standardized interfaces.
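Registering an MCP server as a Llama Stack tool group can be sketched as a run-configuration fragment like the following; the tool group ID and endpoint URI are placeholders.

```yaml
# Illustrative Llama Stack configuration fragment registering an MCP tool
# group; the toolgroup_id and endpoint URI are placeholders.
tool_groups:
  - toolgroup_id: mcp::internal-docs
    provider_id: model-context-protocol
    mcp_endpoint:
      uri: http://mcp-server.my-ai-project.svc.cluster.local:8000/sse
```

AI Engineers can then reference the pre-registered tool group by ID from their agents without handling the MCP endpoint details themselves.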
Deprecated Features
None in this release.
Fixed Issues
- When updating the InferenceService resource YAML through the page, the volumeMount field was dropped, which could cause the inference service to fail to start. This has been fixed.
- In previous versions, GraphQL queries (sent as POST by default) were incorrectly intercepted by the gateway layer and checked for create permission. Requests to the /api/graphql endpoint are now treated as read operations by the RBAC interceptor, so users with read-only roles can access pages that load GraphQL data.
Known Issues
- After deleting a model, the list page does not reflect the deletion immediately, and the deleted model may briefly remain in the list. Workaround: manually refresh the page.
- When viewing an AI page in a namespace that is not under management, you cannot switch to a page in a managed namespace.