Hello
I have data that comes in as .txt format. It's dropped into a folder that's monitored by Splunk. There is a current extraction we are using to pull the headers out of the data, but a new field has been added to the .txt and I need to create a new extraction for the headers.
The data looks like this:
Type AppliesTo Path Snap Hard Soft Adv Used Efficiency
---------------------------------------------------------------------------------------------------
directory DEFAULT /ifs/common-place No 200.00G - 190.00G 0.00 0.00 : 1
directory DEFAULT /ifs/data/capacity/T1000-CPReports No 100.00M - 99.00M 53.00 0.00 : 1
directory DEFAULT /ifs/work/departments/T1000/Cognitus No 10.00G - 9.50G 348.27M 0.42 : 1
directory DEFAULT /ifs/work/Projects/ref/staging No 200.00G - 195.00G 3.72G 0.74 : 1
directory DEFAULT /ifs/work/Projects/S4/ref/sapmnt No 200.00G - 195.00G 1.69G 0.54 : 1
directory DEFAULT /ifs/data/capacity/T1000-CPReports No 100.00M - 99.00M 16.22k 0.13 : 1
---------------------------------------------------------------------------------------------------
Total: 6
The fields can either be populated with values, or, if there's no value for the field, it will contain a literal dash (-).
It is possible for the "Path" field to contain a space.
The last column/field is called "Efficiency". For example this record:
directory DEFAULT /ifs/data/capacity/T1000-CPReports No 100.00M - 99.00M 16.22k 0.13 : 1
The Efficiency is "0.13 : 1"
For this record it's "0.00 : 1":
directory DEFAULT /ifs/data/capacity/T1000-CPReports No 100.00M - 99.00M 53.00 0.00 : 1
Can someone help me fix the extraction, or let me know if there's a better one?
REGEX = ^ *(?<Type>directory|file) +(?<AppliesTo>[^ ]+) +(?<Path>.+) +(?<Snap>[^ ]+) +(?<Hard>[^ ]+) +(?<Soft>[^ ]+) +(?<Adv>[^ ]+) +(?<Used>[^ ]+) *$
Thank you for the help
I got an extraction that works:
^ *(?<Type>directory|file) +(?<AppliesTo>[^ ]+) +(?<Path>.+) +(?<Snap>[^ ]+) +(?<Hard>[^ ]+) +(?<Soft>[^ ]+) +(?<Adv>[^ ]+) +(?<Used>[^ ]+) +(?<Efficiency>\d+\.\d+\s\:\s\d+) *$
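In case it helps anyone else, this is roughly how a search-time extraction like this would be set up in props.conf. This is only a sketch: the sourcetype name "isi_quota" and the extraction class name "quota_fields" are placeholders, so use whatever sourcetype your monitored folder is actually assigned.

# props.conf -- "isi_quota" and "quota_fields" are placeholder names
[isi_quota]
EXTRACT-quota_fields = ^ *(?<Type>directory|file) +(?<AppliesTo>[^ ]+) +(?<Path>.+) +(?<Snap>[^ ]+) +(?<Hard>[^ ]+) +(?<Soft>[^ ]+) +(?<Adv>[^ ]+) +(?<Used>[^ ]+) +(?<Efficiency>\d+\.\d+\s\:\s\d+) *$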
Thanks
You could let Splunk sort out the header for you:
https://docs.splunk.com/Documentation/Splunk/8.0.6/Data/Extractfieldsfromfileswithstructureddata
INDEXED_EXTRACTIONS=csv may do this better and more easily.
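A minimal sketch, assuming the header really is the first line of each file and that your version supports a whitespace delimiter (check props.conf.spec for the exact FIELD_DELIMITER values it accepts; the sourcetype name is a placeholder):

# props.conf, on whatever instance reads the file (forwarder or indexer) --
# structured-data extraction happens at input time, not on the search head
[isi_quota]
INDEXED_EXTRACTIONS = csv
FIELD_DELIMITER = whitespace
HEADER_FIELD_LINE_NUMBER = 1

One caveat for your data: a whitespace delimiter would split a Path that contains a space into two fields, so that case still needs handling either way.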
I'd also send events that match '-------' or whatever to nullQueue:
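Something along these lines (the sourcetype and transform names are just examples):

# props.conf -- "isi_quota" is a placeholder for your actual sourcetype
[isi_quota]
TRANSFORMS-null = drop_separator_lines

# transforms.conf
[drop_separator_lines]
# drop the dashed separator lines; extend the regex if you also want to drop
# the header and "Total:" lines
REGEX = ^-{5,}
DEST_KEY = queue
FORMAT = nullQueue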
The advantage of letting Splunk do it is that it has a fully tested CSV parser. If you get different fields tomorrow, it "just works". If a field starts coming in wrapped in quotes with escaped quotes inside it, it still "just works".
Writing a CSV parser in regex is ... well, it can work if you can ensure the CSV doesn't change, doesn't ever do anything weird inside it, and ... is trivial. But even then it's more work than letting Splunk do it.
Let us know what you end up doing!
Happy Splunking,
Rich
Hey Rich
Thanks for responding. I am a dummy and typed .csv when I meant to type .txt. The data is plain .txt format. I should have realized my mistake when I pasted the file example in.
I will definitely send the "----------------" lines to the nullQueue.
I got the extraction to work in my regex tester, but it doesn't seem to be extracting.