I have the opportunity to pull in some ticket system data and create some statistics / visualizations. The data consists of many “categories”. However, I can't group or count by the SUMMARY field, because each SUMMARY value ends with a unique string of characters. Here’s a sample of the SUMMARY field data:
Pastebin extraction fn:23l4dixr
Pastebin extraction fn:xx3l9dib
Pastebin extraction fn:dk244diL
I would like to group/count by "Pastebin extraction". My first attempt (successful) was to build regexes that I applied to the file BEFORE pulling it into Splunk, removing the unique fn:xxxxxxxx at the end of the SUMMARY field. I then created a separate index and pulled the data in using the CSV sourcetype. Thanks to the column headers, Splunk appeared to have no issues parsing the field data. This allowed me to group/count, which was a good learning experience in and of itself. But now the unique details are gone if I ever need them.
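For reference, the pre-processing was roughly equivalent to the Python sketch below. This is not my exact script: the pattern (a trailing "fn:" followed by 8 alphanumeric characters) is an assumption based on the samples above, and the names are made up for illustration.

```python
import re

# Assumed pattern: optional whitespace, "fn:", then 8 alphanumeric
# characters at the very end of the SUMMARY value (matches the samples).
SUFFIX = re.compile(r"\s*fn:[A-Za-z0-9]{8}\s*$")


def strip_suffix(summary: str) -> str:
    """Return the SUMMARY value with the unique fn:xxxxxxxx tail removed."""
    return SUFFIX.sub("", summary)


samples = [
    "Pastebin extraction fn:23l4dixr",
    "Pastebin extraction fn:xx3l9dib",
    "Pastebin extraction fn:dk244diL",
]

# Every sample collapses to the same value, which is what makes
# grouping/counting possible, but the fn: detail is thrown away.
cleaned = [strip_suffix(s) for s in samples]
```
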
It seems that most folks likely don’t massage data before a forwarder picks it up. Perhaps then, the normalization, if you will, occurs just prior to indexing? Or perhaps at query time? Maybe it’s possible either way?
At any rate, I’d appreciate a breadcrumb / link to some reading on how to drop the pre-processing step and perform this normalization a bit further down the line.
Is learning to properly use props.conf and transforms.conf my only (or best) approach?
What if I want to retain the unique details “just-in-case” and don’t want them removed prior to indexing?
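To make that last question concrete, here is a rough Python sketch of the behaviour I'm after: split SUMMARY into a groupable part and a detail part instead of deleting anything. It is not Splunk configuration, and the names `category` and `fn_id` are hypothetical ones I made up for illustration.

```python
import re
from collections import Counter

# Assumed layout: "<category> fn:<id>", as in the samples above.
PATTERN = re.compile(r"^(?P<category>.*?)\s+fn:(?P<fn_id>\S+)\s*$")


def split_summary(summary: str):
    """Return (category, fn_id); fn_id is None when there is no fn: tail."""
    match = PATTERN.match(summary)
    if match is None:
        return summary.strip(), None
    return match.group("category"), match.group("fn_id")


samples = [
    "Pastebin extraction fn:23l4dixr",
    "Pastebin extraction fn:xx3l9dib",
    "Pastebin extraction fn:dk244diL",
]

# Keep both pieces: count on the category, keep the id for drill-down.
rows = [split_summary(s) for s in samples]
counts = Counter(category for category, _ in rows)
```

In other words, the raw SUMMARY stays intact and the grouping value is derived from it, rather than the detail being stripped before the data is stored.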
Apologies if my terminology is not up to snuff. I'm just getting started with Splunk.
Thanks,
Sudsy