Release Notes
TOC
- AI 2.2.0
  - New and Optimized Features
    - Connection Hub
    - NeMo Guardrails Integration
    - RAG Evaluation with RAGAS
    - LlamaFactory Fine-tuning
    - Hardware Profile Definitions & Templates
    - ServingRuntime Management
    - Notebook Base Image Library for ARM
    - Drift Detection with TrustyAI
    - MCP Integration with Llama Stack
  - Deprecated Features
  - Fixed Issues
  - Known Issues

AI 2.2.0
New and Optimized Features
Connection Hub
Connections enable users to securely configure access to external data sources and model storage locations by encapsulating credentials and configuration parameters as reusable project resources. Connection Types provide templated forms with customizable fields and default values, streamlining connection creation for common storage protocols. Version 2.2 includes built-in connection types for OCI-compliant registries and URI-based repositories, enabling model deployment from container images and remote endpoints. An S3-compatible object storage connection type is under development.
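As an illustration, a URI-based connection is typically materialized as a namespaced Secret tagged with its connection type. The annotation and label keys below follow a common Open Data Hub convention but are assumptions; consult the platform documentation for the exact keys.

```yaml
# Illustrative sketch only: a URI-based connection stored as a Secret.
# The annotation/label keys and the URI are assumptions for illustration.
apiVersion: v1
kind: Secret
metadata:
  name: granite-model-uri
  namespace: my-ai-project
  annotations:
    opendatahub.io/connection-type: uri-v1   # assumed connection-type key
  labels:
    opendatahub.io/managed: "true"           # assumed management label
stringData:
  URI: https://models.example.com/granite-7b/model.safetensors
```

Because the connection is an ordinary Secret, it can be referenced from model deployments in the same project without re-entering credentials.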
NeMo Guardrails Integration
NVIDIA NeMo Guardrails provides programmable safety controls for LLM applications, running as a separate service in front of the model. It enforces sensitive data detection (PII), content policies, and custom validation flows written in Colang and Python, exposed through the TrustyAI Operator's NemoGuardrails custom resource.
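A hypothetical sketch of the NemoGuardrails custom resource is shown below. The API group, version, and field names are assumptions, not the TrustyAI Operator's confirmed schema; they only illustrate the shape of a guardrails deployment sitting in front of a model endpoint.

```yaml
# Hypothetical sketch of a NemoGuardrails custom resource.
# API group, version, and all spec fields are assumptions.
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: NemoGuardrails
metadata:
  name: pii-guardrails
  namespace: my-ai-project
spec:
  # Model endpoint the guardrails service proxies (assumed field)
  modelRef:
    name: granite-7b
  # ConfigMap holding Colang flows and rails config (assumed field)
  configMapRef:
    name: guardrails-config
```

The guardrails service runs as a separate proxy, so client applications point at the guardrails endpoint rather than the model directly.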
RAG Evaluation with RAGAS
RAGAS (Retrieval-Augmented Generation Assessment) integration provides objective metrics for evaluating RAG applications, including retrieval quality, answer relevance, and factual consistency. Developers can automate quality gates and optimize RAG configurations using evaluation pipelines.
LlamaFactory Fine-tuning
LlamaFactory integration through Kubeflow Trainer v2 provides a streamlined solution for model fine-tuning, supporting SFT, LoRA, and QLoRA training algorithms. Users can customize foundation models with their own datasets through single-node and multi-node distributed training.
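A fine-tuning run can be sketched as a Kubeflow Trainer v2 TrainJob. The runtime name and node count below are illustrative assumptions for a multi-node LoRA job.

```yaml
# Illustrative sketch of a Kubeflow Trainer v2 TrainJob for LoRA fine-tuning.
# The runtime name "llamafactory" and all values are assumptions.
apiVersion: trainer.kubeflow.org/v1alpha1
kind: TrainJob
metadata:
  name: llama-lora-sft
  namespace: my-ai-project
spec:
  runtimeRef:
    name: llamafactory   # assumed ClusterTrainingRuntime provided by the platform
  trainer:
    numNodes: 2          # multi-node distributed training
```

Single-node SFT uses the same resource with `numNodes: 1`; the training algorithm (SFT, LoRA, QLoRA) is selected through the runtime's configuration.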
Hardware Profile Definitions & Templates
Hardware Profiles enable centralized management of hardware resource allocation for AI/ML workloads. Administrators can define custom hardware configurations with specific accelerator types, memory limits, and node placement rules, enabling GPU-as-a-Service capabilities with self-service provisioning.
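A hardware profile for a GPU node pool might look like the following sketch. The API group and field names are assumptions modeled on Open Data Hub's HardwareProfile resource and may differ from the platform's actual CRD.

```yaml
# Illustrative HardwareProfile sketch; API group, field names, and the
# node label are assumptions.
apiVersion: infrastructure.opendatahub.io/v1alpha1
kind: HardwareProfile
metadata:
  name: nvidia-a100-large
spec:
  identifiers:
    - displayName: NVIDIA GPU
      identifier: nvidia.com/gpu
      defaultCount: 1
      maxCount: 4          # upper bound users can self-service request
  scheduling:
    node:
      nodeSelector:
        nvidia.com/gpu.product: A100   # assumed node label for placement
```

Users then select the profile by name when launching workloads, instead of hand-writing resource requests and node selectors.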
ServingRuntime Management
Extend the AI Platform with custom inference runtimes to serve LLMs or other model types (image classification, object detection, etc.). Administrators can add custom runtimes such as MLServer, Triton, or Xinference through ClusterServingRuntime resources to support additional model frameworks, GPU types, and specialized inference scenarios beyond the default vLLM runtime.
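A custom runtime registration can be sketched as a KServe ClusterServingRuntime. The example below registers NVIDIA Triton for ONNX models; the image tag, arguments, and port are illustrative.

```yaml
# Hedged sketch: registering NVIDIA Triton as a cluster-wide serving runtime.
# Image tag and args are illustrative, not a tested configuration.
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: triton-runtime
spec:
  supportedModelFormats:
    - name: onnx
      version: "1"
      autoSelect: true     # auto-match InferenceServices declaring ONNX models
  containers:
    - name: kserve-container
      image: nvcr.io/nvidia/tritonserver:24.05-py3   # illustrative tag
      args:
        - tritonserver
        - --model-store=/mnt/models
        - --http-port=8080
      ports:
        - containerPort: 8080
          protocol: TCP
```

Once registered, InferenceServices that declare a supported model format can schedule onto this runtime instead of the default vLLM runtime.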
Notebook Base Image Library for ARM
The Notebook Base Image Library now includes minimal and datascience notebook images for ARM architecture, expanding hardware compatibility for notebook-based development on ARM platforms.
Drift Detection with TrustyAI
Monitor deployed models for data drift by detecting changes in input data distributions over time. TrustyAI Drift Detection compares real-world inference data against original training data using statistical metrics to identify shifts that could impact model performance, ensuring models remain accurate and reliable in production.
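As a sketch, scheduling a drift metric against the TrustyAI service typically involves a small request body naming the model and the tagged reference (training) data. The field names below reflect common TrustyAI usage but are assumptions and may differ per release. (JSON is shown, which is also valid YAML.)

```yaml
# Illustrative request body for scheduling a MeanShift drift metric via the
# TrustyAI service REST API; field names are assumptions.
{
  "modelId": "credit-risk-model",
  "referenceTag": "training-data"
}
```

The service then periodically compares live inference inputs against the tagged reference distribution and exposes the resulting drift metric for alerting.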
MCP Integration with Llama Stack
Llama Stack Connectors provide a high-level abstraction for AI registries such as Model Context Protocol (MCP). Platform Engineers can register connectors, and AI Engineers can consume pre-registered connectors without managing complex configurations, enabling AI agents to connect to external tools and data sources through standardized interfaces.
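Registering an MCP server as a Llama Stack tool group can be sketched as a run-configuration fragment like the following; the tool group ID and endpoint URI are placeholders.

```yaml
# Illustrative Llama Stack configuration fragment registering an MCP tool
# group; the toolgroup_id and endpoint URI are placeholders.
tool_groups:
  - toolgroup_id: mcp::internal-docs
    provider_id: model-context-protocol
    mcp_endpoint:
      uri: http://mcp-server.my-ai-project.svc.cluster.local:8000/sse
```

AI Engineers can then reference the pre-registered tool group by ID from their agents without handling the MCP endpoint details themselves.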
Deprecated Features
None in this release.
Fixed Issues
- When updating the InferenceService resource YAML through the page, the volumeMount field was dropped, which could cause the inference service to fail to start. This has been fixed.
- In previous versions, GraphQL queries (sent as POST by default) were incorrectly intercepted by the gateway layer and checked for create permission. Requests to the /api/graphql endpoint are now treated as read operations by the RBAC interceptor, so users with read-only roles can access pages that load GraphQL data.
Known Issues
- After deleting a model, the list page does not reflect the deletion immediately, and the deleted model may briefly remain in the list. Workaround: manually refresh the page.
- When viewing an AI page in a namespace that is not under management, you cannot switch to a page in a managed namespace.