Hello
I have data that comes in as .txt format. It's dropped into a folder that's monitored by Splunk. There is a current extraction we are using to pull the headers out of the data, but a new field has been added to the .txt and I need to create a new extraction for the headers.
The data looks like this:
Type AppliesTo Path Snap Hard Soft Adv Used Efficiency
---------------------------------------------------------------------------------------------------
directory DEFAULT /ifs/common-place No 200.00G - 190.00G 0.00 0.00 : 1
directory DEFAULT /ifs/data/capacity/T1000-CPReports No 100.00M - 99.00M 53.00 0.00 : 1
directory DEFAULT /ifs/work/departments/T1000/Cognitus No 10.00G - 9.50G 348.27M 0.42 : 1
directory DEFAULT /ifs/work/Projects/ref/staging No 200.00G - 195.00G 3.72G 0.74 : 1
directory DEFAULT /ifs/work/Projects/S4/ref/sapmnt No 200.00G - 195.00G 1.69G 0.54 : 1
directory DEFAULT /ifs/data/capacity/T1000-CPReports No 100.00M - 99.00M 16.22k 0.13 : 1
---------------------------------------------------------------------------------------------------
Total: 6
The fields can either be populated with values, or, if there's no value for the field, it will contain a literal dash (-).
It is possible for the "Path" field to contain a space.
The last column/field is called "Efficiency". For example this record:
directory DEFAULT /ifs/data/capacity/T1000-CPReports No 100.00M - 99.00M 16.22k 0.13 : 1
The Efficiency is "0.13 : 1"
For this record it's "0.00 : 1":
directory DEFAULT /ifs/data/capacity/T1000-CPReports No 100.00M - 99.00M 53.00 0.00 : 1
Can someone help me fix the extraction, or let me know if there's a better one?
REGEX = ^ *(?<Type>directory|file) +(?<AppliesTo>[^ ]+) +(?<Path>.+) +(?<Snap>[^ ]+) +(?<Hard>[^ ]+) +(?<Soft>[^ ]+) +(?<Adv>[^ ]+) +(?<Used>[^ ]+) *$
Thank you for the help
I got an extraction that works:
^ *(?<Type>directory|file) +(?<AppliesTo>[^ ]+) +(?<Path>.+) +(?<Snap>[^ ]+) +(?<Hard>[^ ]+) +(?<Soft>[^ ]+) +(?<Adv>[^ ]+) +(?<Used>[^ ]+) +(?<Efficiency>\d+\.\d+\s\:\s\d+) *$
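In case it helps anyone else, this is roughly how a search-time extraction like this would be set up in props.conf. This is only a sketch: the sourcetype name "isi_quota" and the extraction class name "quota_fields" are placeholders, so use whatever sourcetype your monitored folder is actually assigned.

# props.conf -- "isi_quota" and "quota_fields" are placeholder names
[isi_quota]
EXTRACT-quota_fields = ^ *(?<Type>directory|file) +(?<AppliesTo>[^ ]+) +(?<Path>.+) +(?<Snap>[^ ]+) +(?<Hard>[^ ]+) +(?<Soft>[^ ]+) +(?<Adv>[^ ]+) +(?<Used>[^ ]+) +(?<Efficiency>\d+\.\d+\s\:\s\d+) *$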
Thanks
You could let Splunk sort out the header for you:
https://docs.splunk.com/Documentation/Splunk/8.0.6/Data/Extractfieldsfromfileswithstructureddata
INDEXED_EXTRACTIONS=csv may do this better and more easily.
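A minimal sketch, assuming the header really is the first line of each file and that your version supports a whitespace delimiter (check props.conf.spec for the exact FIELD_DELIMITER values it accepts; the sourcetype name is a placeholder):

# props.conf, on whatever instance reads the file (forwarder or indexer) --
# structured-data extraction happens at input time, not on the search head
[isi_quota]
INDEXED_EXTRACTIONS = csv
FIELD_DELIMITER = whitespace
HEADER_FIELD_LINE_NUMBER = 1

One caveat for your data: a whitespace delimiter would split a Path that contains a space into two fields, so that case still needs handling either way.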
I'd also send events that match '-------' or whatever to nullQueue:
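Something along these lines (the sourcetype and transform names are just examples):

# props.conf -- "isi_quota" is a placeholder for your actual sourcetype
[isi_quota]
TRANSFORMS-null = drop_separator_lines

# transforms.conf
[drop_separator_lines]
# drop the dashed separator lines; extend the regex if you also want to drop
# the header and "Total:" lines
REGEX = ^-{5,}
DEST_KEY = queue
FORMAT = nullQueue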
The advantage of letting Splunk do it is that it has a fully tested CSV parser. If you get different fields tomorrow, it "just works". If a field starts coming in wrapped in quotes with escaped quotes inside it, it still "just works".
Writing a CSV parser in regex is ... well, it can work if you can ensure the CSV doesn't change, doesn't ever do anything weird inside it, and ... is trivial. But even then it's more work than letting Splunk do it.
Let us know what you end up doing!
Happy Splunking,
Rich
Hey Rich
Thanks for responding. I am a dummy and typed .csv when I meant to type .txt. The data is plain .txt format. I should have realized my mistake when I pasted the file example in.
I will definitely send the "----------------" lines to the nullQueue.
I got the extraction to work in my regex tester, but it doesn't seem to be extracting.