Re: Field Extraction on Text File

garima_chauhan · ‎02-05-2014

Hi,

I have host firewall logs coming in as a text file. The data in the text file is separated by spaces and is inconsistent: some rows have, say, 8 columns, some have fewer, and some have more than 8. I want to perform field extraction on this data. How can this be achieved? I am familiar with CSV field extraction, but there the data is consistent, unlike this text file. I am using Splunk v5.0.5.

Please help. It's quite urgent. Any help would be really appreciated.

gfuente · ‎02-06-2014

There are missing characters, please see this update:

... | rex "^(?<field1>[^\s]+)?(\s)*(?<field2>[^\s]+)?(\s)*(?<field3>[^\s]+)?(\s)*(?<field4>[^\s]+)?(\s)*(?<field5>[^\s]+)?(\s)*(?<field6>[^\s]+)?(\s)*(?<field7>[^\s]+)?(\s)*(?<field8>[^\s]+)?(\s)*(?<field9>[^\s]+)?(\s)*(?<field10>[^\s]+)?(\s)*(?<field11>[^\s]+)?(\s)*(?<field12>[^\s]+)?(\s)*(?<field13>[^\s]+)?(\s)*(?<field14>[^\s]+)?(\s)*(?<field15>[^\s]+)?(\s)*" | ...

garima_chauhan · ‎02-06-2014

Hi,

It still didn't work. :(
I copied this exact regex.

gfuente · ‎02-05-2014

Hello

You could use a rex like this one:

^(?<field1>[^\s]+)?(\s)?(?<field2>[^\s]+)?(\s)?(?<field3>[^\s]+)?(\s)?(?<field4>[^\s]+)?(\s)?(?<field5>[^\s]+)?(\s)?(?<field6>[^\s]+)?(\s)?(?<field7>[^\s]+)?(\s)?(?<field8>[^\s]+)?(\s)?(?<field9>[^\s]+)?(\s)?(?<field10>[^\s]+)?(\s)?

Add as many fields as the maximum number of fields you could have in the log file.

Regards
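
A minimal sketch of the same idea outside Splunk, in Python, for anyone who wants to check the pattern before pasting it into a search. It builds one optional named group per column, up to an assumed maximum of 15, and runs it against one of the sample rows from this thread. The helper name `build_pattern` is made up for this illustration. Note that Python's `re` module writes named groups as `(?P<name>...)`, whereas Splunk's `rex` uses `(?<name>...)`.

```python
import re

def build_pattern(max_fields: int) -> str:
    """Build a regex with one optional named group per column.

    Each group is followed by optional whitespace, so rows with fewer
    columns than max_fields still match; the unused groups stay empty.
    """
    groups = [rf"(?P<field{i}>\S+)?\s*" for i in range(1, max_fields + 1)]
    return "^" + "".join(groups)

# 15 is an assumed upper bound on the number of columns.
pattern = re.compile(build_pattern(15))

# Sample row taken from this thread (14 space-separated values).
line = "7 123456 1.1.1.1 sfgdfghdghgdh 25 6 2.2.2.2 5255225 3.3.3.3 80 1 1 0 sdgzdfsg"

match = pattern.match(line)
# Keep only the groups that actually captured something.
fields = {name: value for name, value in match.groupdict().items() if value is not None}

for name, value in fields.items():
    print(name, value)
```

For this row the first 14 groups are filled and `field15` is left out, which is how the optional groups cope with a varying column count.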

garima_chauhan · ‎02-05-2014

Hi, I tried the following search:
source=FirewallLogs | rex "^(?[^s]+)?(s)(?[^s]+)?(s)(?[^s]+)?(s)(?[^s]+)?(s)(?[^s]+)?(s)(?[^s]+)?(s)(?[^s]+)?(s)(?[^s]+)?(s)(?[^s]+)?(s)(?[^s]+)?(s)(?[^s]+)?(s)(?[^s]+)?(s)(?[^s]+)?(s)(?[^s]+)?(s)(?[^s]+)?(s)*" | table field1 field2

but nothing gets displayed.

gfuente · ‎02-05-2014

Your sample lines have more than one space between some fields. That's different from what you explained in your original question. Try this:

| rex "^(?[^\s]+)?(\s)(?[^\s]+)?(\s)(?[^\s]+)?(\s)(?[^\s]+)?(\s)(?[^\s]+)?(\s)(?[^\s]+)?(\s)(?[^\s]+)?(\s)(?[^\s]+)?(\s)(?[^\s]+)?(\s)(?[^\s]+)?(\s)(?[^\s]+)?(\s)(?[^\s]+)?(\s)(?[^\s]+)?(\s)(?[^\s]+)?(\s)(?[^\s]+)?(\s)*"

This works with the sample data you provided.
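
To see why the number of spaces matters, here is a small Python sketch, not Splunk itself, comparing a separator that allows at most one whitespace character with one that allows any run of whitespace. The input row with a double space is hypothetical, and the names `extract`, `ONE_SPACE` and `ANY_SPACE` exist only in this example. Python's `(?P<name>...)` stands in for the `(?<name>...)` syntax used by `rex`.

```python
import re

def extract(pattern: str, line: str) -> dict:
    """Return the named groups that captured a value, dropping empty ones."""
    match = re.match(pattern, line)
    return {k: v for k, v in match.groupdict().items() if v is not None}

# At most one whitespace character between fields.
ONE_SPACE = r"^(?P<field1>\S+)?\s?(?P<field2>\S+)?\s?(?P<field3>\S+)?\s?"
# Any run of whitespace between fields.
ANY_SPACE = r"^(?P<field1>\S+)?\s*(?P<field2>\S+)?\s*(?P<field3>\S+)?\s*"

# Hypothetical row with two spaces after the first value.
line = "7  123456 1.1.1.1"

print(extract(ONE_SPACE, line))  # field2 is skipped, later values shift by one
print(extract(ANY_SPACE, line))  # every value lands in its own field
```

With the single-character separator, the extra space uses up the slot for `field2`, so the second value ends up in `field3` and the third value is lost. Allowing a run of whitespace keeps the values aligned.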

garima_chauhan · ‎02-05-2014

Hi gfuente,

My log file looks like:

7 123456 1.1.1.1 sfgdfghdghgdh 25 6 2.2.2.2 5255225 3.3.3.3 80 1 1 0 sdgzdfsg
7 456789 1.1.1.1 fsdfgsfgsfgfv 52 6 3.3.3.3 4654646 5.5.5.5 4564 2 2 2 pathoffile ssdfgsfg
7 123456 1.1.1.1 sfgdfghdghgdh 25 6 2.2.2.2 5255225 3.3.3.3 80 1 1 0 pathoffilevzfsgfgjsdlgjlsggflkgj sdgzdfsg

I am guessing that the column number discrepancy is due to the fact that if one column value is blank, it is not left blank and is instead populated with the next column value.

In any case, I do not know how to tackle this. Please help.
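
A rough way to test that guess outside Splunk is to count the values per row and see where the extra one appears. The Python sketch below does this for the three sample rows above. It assumes, purely from these three rows, that the first 13 values are always present, the last value is always present, and anything in between is the optional column. That layout is an assumption, not something confirmed by the log format, and `split_row` is a made-up helper name.

```python
SAMPLE = [
    "7 123456 1.1.1.1 sfgdfghdghgdh 25 6 2.2.2.2 5255225 3.3.3.3 80 1 1 0 sdgzdfsg",
    "7 456789 1.1.1.1 fsdfgsfgsfgfv 52 6 3.3.3.3 4654646 5.5.5.5 4564 2 2 2 pathoffile ssdfgsfg",
    "7 123456 1.1.1.1 sfgdfghdghgdh 25 6 2.2.2.2 5255225 3.3.3.3 80 1 1 0 pathoffilevzfsgfgjsdlgjlsggflkgj sdgzdfsg",
]

def split_row(line: str, fixed: int = 13):
    """Split a row into (leading fixed values, optional middle values, last value).

    Assumes the first `fixed` values and the final value are always present;
    this is inferred from the three sample rows only.
    """
    tokens = line.split()  # split() with no argument collapses runs of whitespace
    if len(tokens) <= fixed:
        raise ValueError("row has fewer values than the assumed layout")
    return tokens[:fixed], tokens[fixed:-1], tokens[-1]

# Number of values in each row.
counts = [len(line.split()) for line in SAMPLE]
print(counts)

rows = [split_row(line) for line in SAMPLE]
for head, optional, last in rows:
    print(len(head), optional, last)
```

If the counts differ only by that one middle value, the optional column can be extracted separately and the fixed columns keep stable positions.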
