
How can I fix my transforms extraction of the header fields in a .txt file?

tkw03
Communicator

Hello

I have data that comes in as .txt files. They're dropped into a folder that's monitored by Splunk. We currently use an extraction that pulls out the fields named in the header row, but a new field has been added to the .txt file and I need to update the extraction to include it.

The data looks like this:

Type      AppliesTo  Path                             Snap  Hard    Soft  Adv     Used  Efficiency
---------------------------------------------------------------------------------------------------
directory DEFAULT    /ifs/common-place                 No    200.00G -     190.00G 0.00  0.00 : 1
directory DEFAULT    /ifs/data/capacity/T1000-CPReports No    100.00M -     99.00M  53.00 0.00 : 1
directory DEFAULT    /ifs/work/departments/T1000/Cognitus      No    10.00G  -     9.50G   348.27M 0.42 : 1
directory DEFAULT    /ifs/work/Projects/ref/staging          No    200.00G -     195.00G 3.72G   0.74 : 1
directory DEFAULT    /ifs/work/Projects/S4/ref/sapmnt        No    200.00G -     195.00G 1.69G   0.54 : 1
directory DEFAULT    /ifs/data/capacity/T1000-CPReports       No    100.00M -     99.00M  16.22k  0.13 : 1
---------------------------------------------------------------------------------------------------
Total: 6

The fields are either populated with values or, if a field has no value, contain a literal dash (-).

It is possible for the "Path" field to contain a space.

The last column is called "Efficiency". For example, in this record:

directory DEFAULT    /ifs/data/capacity/T1000-CPReports       No    100.00M -     99.00M  16.22k  0.13 : 1

The Efficiency is "0.13 : 1" (note that the value itself contains spaces). For the following record, it's "0.00 : 1":

directory DEFAULT    /ifs/data/capacity/T1000-CPReports No    100.00M -     99.00M  53.00 0.00 : 1
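
Because every column other than Path is a single space-free token, except Efficiency, which is three tokens in every sample row (for example "0.13", ":", "1"), a row can in principle be split by peeling a fixed number of tokens off each end and keeping whatever is left in the middle as Path, spaces included. The Python sketch below only illustrates that reasoning outside Splunk; split_row and COLUMNS are hypothetical names, and it assumes the layout shown above holds for every row.

```python
# Column order taken from the header row of the sample data.
COLUMNS = ["Type", "AppliesTo", "Path", "Snap", "Hard", "Soft", "Adv", "Used", "Efficiency"]


def split_row(line: str) -> dict:
    """Split one data row into named fields (hypothetical helper).

    Assumes every column except Path is a single space-free token, apart from
    Efficiency, which is always three tokens such as "0.13", ":", "1".
    """
    # Peel Type and AppliesTo off the left; the remainder starts at Path.
    type_, applies_to, rest = line.split(None, 2)
    # Peel 5 single-token columns plus the 3 Efficiency tokens off the right.
    # Whatever remains on the left (internal spaces included) is the Path.
    path, snap, hard, soft, adv, used, ratio, colon, base = rest.rsplit(None, 8)
    efficiency = f"{ratio} {colon} {base}"
    values = [type_, applies_to, path, snap, hard, soft, adv, used, efficiency]
    return dict(zip(COLUMNS, values))


row = "directory DEFAULT    /ifs/work/departments/T1000/Cognitus      No    10.00G  -     9.50G   348.27M 0.42 : 1"
print(split_row(row))
```

The same split keeps a Path such as "/ifs/work/my folder" (a made-up example) intact, because only the outer tokens are ever split off.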

Can someone help me fix the extraction, or let me know if there's a better one? This is the extraction we currently use:

REGEX = ^ *(?<Type>directory|file) +(?<AppliesTo>[^ ]+) +(?<Path>.+) +(?<Snap>[^ ]+) +(?<Hard>[^ ]+) +(?<Soft>[^ ]+) +(?<Adv>[^ ]+) +(?<Used>[^ ]+) *$

Thank you for the help
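
To see why adding one column breaks the old extraction instead of just leaving Efficiency out: the REGEX is anchored with $ and expects exactly five space-free columns after Path, so a new-format row still matches, but every column is shifted. The check below uses Python's re module rather than Splunk, and rewrites the named groups as (?P<name>...) because that is the form Python's re expects; otherwise the pattern is the one from the post.

```python
import re

# The REGEX from the question, with (?<name>...) rewritten as Python's (?P<name>...).
OLD = re.compile(
    r"^ *(?P<Type>directory|file) +(?P<AppliesTo>[^ ]+) +(?P<Path>.+) +(?P<Snap>[^ ]+)"
    r" +(?P<Hard>[^ ]+) +(?P<Soft>[^ ]+) +(?P<Adv>[^ ]+) +(?P<Used>[^ ]+) *$"
)

row = "directory DEFAULT    /ifs/common-place                 No    200.00G -     190.00G 0.00  0.00 : 1"
m = OLD.match(row)

# The row still matches, but the fields are misaligned: the last five tokens
# ("190.00G", "0.00", "0.00", ":", "1") land in Snap..Used, and the greedy Path
# swallows the real Snap, Hard and Soft values.
for name, value in m.groupdict().items():
    print(f"{name:10} {value!r}")
```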


tkw03
Communicator

I got an extraction that works:

^ *(?<Type>directory|file) +(?<AppliesTo>[^ ]+) +(?<Path>.+) +(?<Snap>[^ ]+) +(?<Hard>[^ ]+) +(?<Soft>[^ ]+) +(?<Adv>[^ ]+) +(?<Used>[^ ]+) +(?<Efficiency>\d+\.\d+\s\:\s\d+) *$

Thanks
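
A quick way to double-check this extraction outside Splunk is to run it through Python's re module (named groups rewritten as (?P<name>...), everything else unchanged). One thing this shows: because Path is greedy (.+), it keeps the padding spaces in front of the Snap column. The lazy variant (.+?) below is a hypothetical tweak, not something from the thread; it stops Path at its last non-space character and still copes with a Path that contains a space.

```python
import re

# The working REGEX from the thread, with (?<name>...) rewritten as (?P<name>...).
NEW = (
    r"^ *(?P<Type>directory|file) +(?P<AppliesTo>[^ ]+) +(?P<Path>.+) +(?P<Snap>[^ ]+)"
    r" +(?P<Hard>[^ ]+) +(?P<Soft>[^ ]+) +(?P<Adv>[^ ]+) +(?P<Used>[^ ]+)"
    r" +(?P<Efficiency>\d+\.\d+\s\:\s\d+) *$"
)
# Hypothetical variant: a lazy Path (.+? instead of .+) so the padding spaces
# before the Snap column are not captured as part of Path.
NEW_LAZY = NEW.replace("(?P<Path>.+)", "(?P<Path>.+?)")

row = "directory DEFAULT    /ifs/data/capacity/T1000-CPReports       No    100.00M -     99.00M  16.22k  0.13 : 1"
greedy = re.match(NEW, row)
lazy = re.match(NEW_LAZY, row)

print(greedy.group("Efficiency"))   # the new column is captured
print(repr(greedy.group("Path")))   # greedy Path keeps trailing padding
print(repr(lazy.group("Path")))     # lazy Path stops at the last non-space character
```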

Richfez
SplunkTrust

You could let Splunk sort out the header for you:

https://docs.splunk.com/Documentation/Splunk/8.0.6/Data/Extractfieldsfromfileswithstructureddata

Setting INDEXED_EXTRACTIONS=csv may do this better and more easily.

I'd also send events that match '-------' (or whatever the separator is) to the nullQueue:

https://docs.splunk.com/Documentation/Splunk/8.0.6/Forwarding/Routeandfilterdatad#Discard_specific_e...
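
Before wiring up a nullQueue transform, the filter regex can be tried against the sample. The pattern below is hypothetical: it matches the dashed separator lines as suggested, plus (as an additional assumption) the column-header row and the "Total: N" trailer. It also assumes each line of the file becomes its own event. The check uses Python's re module, not Splunk.

```python
import re

# Hypothetical filter pattern: separator rows of dashes, the column-header row,
# and the "Total: N" trailer. In Splunk this would be the REGEX of a nullQueue
# transform; here it is only tested against lines from the sample.
DROP = re.compile(r"^(?:-{3,}|Type\s+AppliesTo\b.*|Total: \d+)\s*$")

sample = """Type      AppliesTo  Path                             Snap  Hard    Soft  Adv     Used  Efficiency
---------------------------------------------------------------------------------------------------
directory DEFAULT    /ifs/common-place                 No    200.00G -     190.00G 0.00  0.00 : 1
directory DEFAULT    /ifs/data/capacity/T1000-CPReports No    100.00M -     99.00M  53.00 0.00 : 1
---------------------------------------------------------------------------------------------------
Total: 6"""

# Assumes one event per line, as the field extraction regexes in the thread do.
lines = sample.splitlines()
kept = [line for line in lines if not DROP.match(line)]
dropped = [line for line in lines if DROP.match(line)]
print(f"kept {len(kept)} rows, dropped {len(dropped)} lines")
```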

The advantage of letting Splunk do it is that it has a fully tested CSV parser. If you get different fields tomorrow, it "just works". If a field starts coming in wrapped in quotes with escaped quotes inside it, it still "just works".

Writing a CSV parser in REGEX is ... well, it can work if you can ensure the CSV doesn't change, never does anything weird inside it, and ... is trivial. But even then, it's more work than letting Splunk do it.

Let us know what you end up doing!

Happy Splunking,

Rich

tkw03
Communicator

Hey Rich

Thanks for responding. I'm a dummy and typed .csv when I meant to type .txt. The data is plain text. I should have caught the typo when I pasted the file example in.

I will definitely send the "----------------" lines to the nullQueue.

I got the extraction to work in my regex tester, BUT it doesn't seem to be extracting in Splunk.
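
The thread doesn't say why the extraction fails in Splunk, so this is only one possible cause, stated as an assumption: if the file is indexed as a single multi-line event instead of one event per row, ^ and $ anchor to the start and end of the whole event, and a regex that works row by row in a tester matches nothing. The sketch below (Python's re module, not Splunk) shows that anchor behaviour and the difference the inline (?m) multiline flag makes; PCRE, which Splunk uses, accepts the same inline flag. Whether this applies depends on how the sourcetype breaks events.

```python
import re

# The working REGEX from the thread, in Python's (?P<name>...) syntax.
ROW = (
    r"^ *(?P<Type>directory|file) +(?P<AppliesTo>[^ ]+) +(?P<Path>.+) +(?P<Snap>[^ ]+)"
    r" +(?P<Hard>[^ ]+) +(?P<Soft>[^ ]+) +(?P<Adv>[^ ]+) +(?P<Used>[^ ]+)"
    r" +(?P<Efficiency>\d+\.\d+\s\:\s\d+) *$"
)

# Hypothetical multi-line event: the header plus two rows arriving as ONE event.
event = "\n".join([
    "Type      AppliesTo  Path                             Snap  Hard    Soft  Adv     Used  Efficiency",
    "directory DEFAULT    /ifs/common-place                 No    200.00G -     190.00G 0.00  0.00 : 1",
    "directory DEFAULT    /ifs/work/Projects/ref/staging          No    200.00G -     195.00G 3.72G   0.74 : 1",
])

# Without multiline mode, ^ only matches at the start of the event (the header),
# so the row pattern finds nothing, even though each row matches on its own.
print(re.search(ROW, event))
print(all(re.match(ROW, line) for line in event.splitlines()[1:]))

# With the inline (?m) flag, ^ and $ anchor at every line, so each row matches.
rows = [m.groupdict() for m in re.finditer("(?m)" + ROW, event)]
print([r["Efficiency"] for r in rows])
```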