 
					
				
		
Hello
I have data that comes in as .txt files. The files are dropped into a folder that's monitored by Splunk. We currently use an extraction to pull the header fields out of the data, but a new field has been added to the .txt file and I need a new extraction that includes it.
The data looks like this:
Type      AppliesTo  Path                             Snap  Hard    Soft  Adv     Used  Efficiency
---------------------------------------------------------------------------------------------------
directory DEFAULT    /ifs/common-place                 No    200.00G -     190.00G 0.00  0.00 : 1
directory DEFAULT    /ifs/data/capacity/T1000-CPReports No    100.00M -     99.00M  53.00 0.00 : 1
directory DEFAULT    /ifs/work/departments/T1000/Cognitus      No    10.00G  -     9.50G   348.27M 0.42 : 1
directory DEFAULT    /ifs/work/Projects/ref/staging          No    200.00G -     195.00G 3.72G   0.74 : 1
directory DEFAULT    /ifs/work/Projects/S4/ref/sapmnt        No    200.00G -     195.00G 1.69G   0.54 : 1
directory DEFAULT    /ifs/data/capacity/T1000-CPReports       No    100.00M -     99.00M  16.22k  0.13 : 1
---------------------------------------------------------------------------------------------------
Total: 6
Each field is either populated with a value or, if there is no value for the field, contains a literal dash (-).
It is possible for the "Path" field to contain a space.
The last column/field is called "Efficiency". For example this record:
directory DEFAULT    /ifs/data/capacity/T1000-CPReports       No    100.00M -     99.00M  16.22k  0.13 : 1
The Efficiency is "0.13 : 1"
For this record it's "0.00 : 1"
directory DEFAULT    /ifs/data/capacity/T1000-CPReports No    100.00M -     99.00M  53.00 0.00 : 1
Can someone help me fix the extraction, or let me know if there's a better one? This is the current extraction:
REGEX = ^ *(?<Type>directory|file) +(?<AppliesTo>[^ ]+) +(?<Path>.+) +(?<Snap>[^ ]+) +(?<Hard>[^ ]+) +(?<Soft>[^ ]+) +(?<Adv>[^ ]+) +(?<Used>[^ ]+) *$
Thank you for the help
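To see why the existing extraction no longer fits the new data, here is a minimal Python sketch that runs it against one of the sample records above. It assumes the pattern behaves the same under Python's re module as it does in Splunk, with the named groups respelled as (?P<name>...) because Python does not accept (?<name>...). The variable names are illustrative only.

```python
import re

# The existing extraction, with (?<name>...) respelled as (?P<name>...) for Python.
OLD_PATTERN = re.compile(
    r"^ *(?P<Type>directory|file) +(?P<AppliesTo>[^ ]+) +(?P<Path>.+)"
    r" +(?P<Snap>[^ ]+) +(?P<Hard>[^ ]+) +(?P<Soft>[^ ]+) +(?P<Adv>[^ ]+)"
    r" +(?P<Used>[^ ]+) *$"
)

# One record from the sample data, now ending in the new Efficiency column.
record = (
    "directory DEFAULT    /ifs/data/capacity/T1000-CPReports No    "
    "100.00M -     99.00M  53.00 0.00 : 1"
)

match = OLD_PATTERN.match(record)
old_fields = {name: value.rstrip() for name, value in match.groupdict().items()}

# "0.00 : 1" is three space-separated tokens, so the last five [^ ]+ groups
# slide to the right and the greedy Path group absorbs Snap, Hard and Soft.
for name, value in old_fields.items():
    print(f"{name:<10}{value!r}")
```

The pattern still matches, but the columns shift: Used captures "1", Adv captures ":", and Path swallows the "No", "100.00M" and "-" values. That is why the new column needs its own capture group at the end.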
 
					
				
		
I got an extraction that works:
^ *(?<Type>directory|file) +(?<AppliesTo>[^ ]+) +(?<Path>.+) +(?<Snap>[^ ]+) +(?<Hard>[^ ]+) +(?<Soft>[^ ]+) +(?<Adv>[^ ]+) +(?<Used>[^ ]+) +(?<Efficiency>\d+\.\d+\s\:\s\d+) *$
Thanks
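As a quick sanity check outside Splunk, the following Python sketch runs the extraction above against a few lines from the original post. It is only a sketch under some assumptions: the named groups are respelled as (?P<name>...) for Python's re module, the record with a space in its path is made up for illustration, and the helper name extract is hypothetical.

```python
import re

# The working extraction, with (?<name>...) respelled as (?P<name>...) for Python.
NEW_PATTERN = re.compile(
    r"^ *(?P<Type>directory|file) +(?P<AppliesTo>[^ ]+) +(?P<Path>.+)"
    r" +(?P<Snap>[^ ]+) +(?P<Hard>[^ ]+) +(?P<Soft>[^ ]+) +(?P<Adv>[^ ]+)"
    r" +(?P<Used>[^ ]+) +(?P<Efficiency>\d+\.\d+\s\:\s\d+) *$"
)


def extract(line):
    """Return the captured fields for one line, or None if it is not a record."""
    match = NEW_PATTERN.match(line)
    if match is None:
        return None
    # The greedy .+ can capture column padding after the path, so trim it here.
    return {name: value.rstrip() for name, value in match.groupdict().items()}


lines = [
    "Type      AppliesTo  Path                             Snap  Hard    Soft  Adv     Used  Efficiency",
    "-" * 99,
    "directory DEFAULT    /ifs/data/capacity/T1000-CPReports No    100.00M -     99.00M  53.00 0.00 : 1",
    "directory DEFAULT    /ifs/work/departments/T1000/Cognitus      No    10.00G  -     9.50G   348.27M 0.42 : 1",
    # Hypothetical record (not from the thread) with a space inside Path.
    "directory DEFAULT    /ifs/work/My Projects      No    10.00G  -     9.50G   1.00G   0.50 : 1",
    "Total: 6",
]

results = [extract(line) for line in lines]
for line, fields in zip(lines, results):
    print(fields if fields else f"no match: {line[:20]!r}")
```

The header, the dashed separator and the "Total:" line do not match, because none of them starts with "directory" or "file". One thing the sketch makes visible: when a path is followed by several padding spaces, the greedy Path group keeps all but the last of them, which is why the helper trims trailing whitespace before returning the fields.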
 
					
				
		
 
		
		
		
		
		
	
			
		
		
			
					
		You could let Splunk sort out the header for you:
https://docs.splunk.com/Documentation/Splunk/8.0.6/Data/Extractfieldsfromfileswithstructureddata
INDEXED_EXTRACTIONS=csv may do this better and easier.
I'd also send events that match '-------' (or whatever the separator lines look like) to nullQueue.
The advantage of letting Splunk do it is that it has a fully tested CSV parser. If you get different fields tomorrow, it "just works". If a field starts coming in wrapped in quotes with escaped quotes inside it, it still "just works".
Writing a CSV parser in REGEX is ... well, it can work if you can ensure the CSV doesn't change, never does anything weird inside it, and ... is trivial. But even then it's more work than letting Splunk do it.
Let us know what you end up doing!
Happy Splunking,
Rich
 
					
				
		
Hey Rich
Thanks for responding. I am a dummy and typed .csv when I meant to type .txt. The data is plain .txt. I should have realized my typing mistake when I pasted in the file example.
I will definitely send the "----------------" lines to the nullQueue.
I got the extraction to work in my regex tester, but it doesn't seem to be extracting.
