Streamlining Data Onboarding: Announcing the Alpha Release of AI-Assisted Auto-Schematization
For many Splunk administrators and security analysts, the path from raw data ingestion to actionable insight is often hindered by the complexity of data normalization. Manually building Technology Add-ons (TAs), writing complex regular expressions, and ensuring alignment with the Common Information Model (CIM) can take days, or even weeks, of focused effort.
Effective threat detection and high‑performance analytics depend on consistently structured data. By reducing the manual toil required to achieve “CIM‑compliance,” we can unlock faster time‑to‑value for security use cases.
Key Takeaways
AI-Driven Normalization: Automatically maps sample data to appropriate CIM data models using Large Language Models.
Automated Extraction: Generates necessary extraction rules and regular expressions, significantly reducing manual regex authoring.
Flexible Deployment: Produces both Technology Add-on (TA) packages for search-time mapping and SPL2 pipelines for ingest-time processing.
Today, we are pleased to announce a significant step forward in simplifying this process: the Alpha release of Auto-Schematization in Data Management, an AI‑assisted workflow that accelerates CIM‑aligned onboarding while keeping humans in control.
The Challenge: Overcoming the Onboarding Bottleneck
In practice, the manual labor required to reach CIM compliance often delays time-to-value for new data sources. Our goal is to transform this experience from a manual, multi-tool mapping process into a streamlined, AI-assisted workflow.
The Solution: Automated Data Schematization
By leveraging Large Language Models, our new custom integration workflow automates the end-to-end schematization process. Whether you are dealing with structured logs or unique formats, the system provides:
Intelligent Data Model Mapping: The AI analyzes your sample data to automatically suggest the most appropriate CIM data models, ensuring your data is ready for CIM-normalized security detections.
Automated Field Extraction & Calculation: The system generates the necessary extraction rules and regular expressions, significantly reducing the need for manual regex authoring.
Flexible Deployment Artifacts: Depending on your data strategy, the tool generates either Technology Add-on (TA) packages for search-time mapping (schema-on-read) or SPL2 pipelines for ingest-time processing (schema-on-write).
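To make the extraction and mapping steps concrete, here is a minimal Python sketch of the kind of normalization involved. It is an illustration only, not the tool's actual output or implementation: the sample event format, the vendor field names (username, client_ip, server_ip, status), and the "unknown" fallback value are all assumptions invented for this example. The target names (user, src, dest, action) are fields of the CIM Authentication data model.

```python
import re

# Hypothetical raw event, invented for illustration only.
SAMPLE_EVENT = (
    "2024-05-01T12:00:00Z sshd login failed "
    "user=alice from=10.0.0.5 to=192.168.1.10"
)

# Extraction rule: named groups capture the vendor-specific fields.
EXTRACTION = re.compile(
    r"login (?P<status>\w+) user=(?P<username>\S+) "
    r"from=(?P<client_ip>\S+) to=(?P<server_ip>\S+)"
)

# Field aliases: vendor field name -> CIM Authentication field name.
FIELD_ALIASES = {"username": "user", "client_ip": "src", "server_ip": "dest"}

# Calculated field: translate vendor status values into CIM "action" values.
ACTION_VALUES = {"failed": "failure", "succeeded": "success"}


def normalize(event: str) -> dict:
    """Extract vendor fields from one event and rename them to CIM fields."""
    match = EXTRACTION.search(event)
    if match is None:
        return {}  # the event does not fit this extraction rule
    raw = match.groupdict()
    cim = {FIELD_ALIASES[k]: v for k, v in raw.items() if k in FIELD_ALIASES}
    # "unknown" is a placeholder fallback chosen for this sketch.
    cim["action"] = ACTION_VALUES.get(raw["status"], "unknown")
    return cim


if __name__ == "__main__":
    print(normalize(SAMPLE_EVENT))
```

In a search-time deployment, the equivalent logic would live in a TA's extraction, alias, and calculated-field configuration; in an ingest-time deployment, it would live in an SPL2 pipeline.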
How It Works: The Guided Workflow
The guided workflow is designed to be intuitive and transparent:
Provide Sample Data: Upload or paste representative events.
Intelligent Analysis: The system clusters similar events and recommends CIM mappings.
Preview: Review the AI‑generated extractions and mappings.
Download and Deploy: Download the generated artifacts and deploy them yourself to your Splunk environment.
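As a rough intuition for the analysis step, similar events can be grouped by masking their variable parts and comparing what remains. The Python sketch below shows one simple way to do that; it is an assumption made for illustration and does not describe how the product actually clusters events. The function names and sample events are hypothetical.

```python
import re
from collections import defaultdict


def template_of(event: str) -> str:
    """Mask variable tokens so events with the same shape share a template."""
    masked = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "<IP>", event)
    masked = re.sub(r"\b\d+\b", "<NUM>", masked)
    return masked


def cluster_events(events: list[str]) -> dict[str, list[str]]:
    """Group events by their masked template."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for event in events:
        clusters[template_of(event)].append(event)
    return dict(clusters)


if __name__ == "__main__":
    sample = [
        "login failed from 10.0.0.5 port 22",
        "login failed from 10.0.0.9 port 2222",
        "disk usage at 91 percent",
    ]
    for template, members in cluster_events(sample).items():
        print(f"{len(members)} event(s): {template}")
```

Each resulting group can then be treated as one candidate event type when reviewing the suggested extractions and CIM mappings.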
Why Auto-Schematization Matters
This initiative is a core component of Splunk’s broader Cisco Data Fabric strategy, an architecture built to unify your data. By reducing admin toil and cutting onboarding time from weeks to minutes, it enables your team to focus on high-value security operations rather than configuration maintenance.
Join the Alpha Cohort
Want to accelerate your onboarding? Join the broader Guided Onboarding experience and reach out directly to our Splunk experts. Customer feedback will be instrumental in shaping the future of data management at Splunk, as we continue to build a more intelligent and efficient data onboarding experience.
Oskar Patnaik
Product Management, Splunk Data Management