Splunk Search

Regex to extract fields from CSV file based on number of delimiters?

nancylawrence00
Explorer

Hi,

I have two types of logs:

Log1:

Jun 18 14:10:57 lec05674568 ABC[455135]: 2015-06-18 14:10:57;indexserver;lec056741;XBE;06;30893;11.91.6.128;lec056287312;9580;9599;LEC Audit - SYSTEM Logins;CRITICAL;CONNECT;SYSTEM;;;;;;DAPNAPS;UNSUCCESSFUL;;;;;;invalid username or password;;5562789;;

Log2:

Jun 18 14:10:57 lec05674568 ABC[455135]: 2015-06-18 14:10:57;indexserver;lec056741;XBE;06;30893;DATABASE01;11.91.6.128;lec056287312;9580;9599;LEC Audit - SYSTEM Logins;CRITICAL;CONNECT;SYSTEM;;;;;;DAPNAPS;UNSUCCESSFUL;;;;;;invalid username or password;;5562789;;

If i use

^([^;]*;){10}(?<Field_Name>[^;]*) 

based on the number of delimiters ;, I get wrong output from Log2 as there is an Extra field (DATABASE01)

I need to count the number of delimiters and then extract the field based on the count.

Please advice.

Thanks
Nancy

gcato
Contributor

Not exactly sure what you want, but maybe splitting the events into multivalue fields and then using multivalue eval commands will get the results you need. If will certainly give you a count of the number of fields.

Anyway, here are some examples of how to use eval's mv commands to extract fields and will hopefully get you going (ignore the first half which is just creating the test output).

|stats count as logs | eval logs="Jun 18 14:10:57 lec05674568 ABC[455135]: 2015-06-18 14:10:57;indexserver;lec056741;XBE;06;30893;11.91.6.128;lec056287312;9580;9599;LEC Audit - SYSTEM Logins;CRITICAL;CONNECT;SYSTEM;;;;;;DAPNAPS;UNSUCCESSFUL;;;;;;invalid username or password;;5562789;;|Jun 18 14:10:57 lec05674568 ABC[455135]: 2015-06-18 14:10:57;indexserver;lec056741;XBE;06;30893;DATABASE01;11.91.6.128;lec056287312;9580;9599;LEC Audit - SYSTEM Logins;CRITICAL;CONNECT;SYSTEM;;;;;;DAPNAPS;UNSUCCESSFUL;;;;;;invalid username or password;;5562789;;" 
| eval logs=split(logs, "|") |mvexpand logs
| eval fields = split(logs, ";") 
| eval n1 = mvfind(fields, "LEC.*") | eval extract_field1 = mvindex(fields, n1)
| eval n2 = mvfind(fields, "CONNECT") | eval extract_field2 = mvindex(fields, n1,n2)
| eval n3 = mvfind(fields, "DATABASE\d+") | eval n3=if(isnull(n3), 5, n3) | eval extract_field3 = mvindex(fields, n3+1)
| eval c = mvcount(fields)
| eval extract_field4 = case (c==31, mvindex(fields, c-3), c==32, mvindex(fields, c-3))
| ... <etc> ...

Also refer to the eval comand in the docs for further reference and even more mv commands that you can use to hopefully get the results you want. http://docs.splunk.com/Documentation/Splunk/6.2.5/SearchReference/CommonEvalFunctions#Multivalue_fun...

nancylawrence00
Explorer

Hi,

Hi,

I can extract all the fields by:

^([^;];){1}(?[^;])
^([^;];){2}(?[^;])
^([^;];){3}(?[^;]) and so on......

the issue is that i have an extra field in middle of the log and the above regex extracts the wrong field once it gets to the extra field in some of my logs.......

For further info i here are the logs....

Log1:
Jun 18 14:10:57 lec05674568 ABC[455135]: 2015-06-18 14:10:57;indexserver;lec056741;XBE;06;30893;11.91.6.128;lec056287312;9580;9599;LEC Audit - SYSTEM Logins;CRITICAL;CONNECT;SYSTEM;;;;;;DAPNAPS;UNSUCCESSFUL;;;;;;invalid username or password;;5562789;;

Log2:
Jun 18 14:10:57 lec05674568 ABC[455135]: 2015-06-18 14:10:57;indexserver;lec056741;XBE;06;30893;DATABASE01;11.91.6.128;lec056287312;9580;9599;LEC Audit - SYSTEM Logins;CRITICAL;CONNECT;SYSTEM;;;;;;DAPNAPS;UNSUCCESSFUL;;;;;;invalid username or password;;5562789;;

You see there is an extra field in Log2 (Database1), so when i use the regex mentioned above the numbering changes and thus pulls up the wrong field.

0 Karma

gcato
Contributor

Hi nancylawrence007,

I understand the issue but I'm still not sure what you exactly want from the examples provided. As far as I know regex cannot count delimiters, so that is not an option. Basically, you need to normalise the logs so they always have the same number of fields, for which the sed command is perfect. So either,

  1. add an extra ";" between "field6;field_7" in log1 (32 fields), or
  2. substitute the ";" before DATABASE01 field in log2 (31 fields), or
  3. remove the ";DATABASE01" section (31 fields)

I'd recommend #1, as #2 means you would need to perform another regex on the substituted field to extract its two values, and #3 removes a field (though it could be extracted earlier). Anyway, the "rex mode=sed ..." run anywhere example below would do the trick (assuming log1's 6 & 7 fields are consistently a "number;ip_addr") for #1 option.

 |stats count as logs | eval logs="Jun 18 14:10:57 lec05674568 ABC[455135]: 2015-06-18 14:10:57;indexserver;lec056741;XBE;06;30893;11.91.6.128;lec056287312;9580;9599;LEC Audit - SYSTEM Logins;CRITICAL;CONNECT;SYSTEM;;;;;;DAPNAPS;UNSUCCESSFUL;;;;;;invalid username or password;;5562789;;|Jun 18 14:10:57 lec05674568 ABC[455135]: 2015-06-18 14:10:57;indexserver;lec056741;XBE;06;30893;DATABASE01;11.91.6.128;lec056287312;9580;9599;LEC Audit - SYSTEM Logins;CRITICAL;CONNECT;SYSTEM;;;;;;DAPNAPS;UNSUCCESSFUL;;;;;;invalid username or password;;5562789;;"
 | eval logs=split(logs, "|") |mvexpand logs
 | rex mode=sed field=logs "s/((?:[^;]+;){5}\d+;)(\d+\.\d+\.\d+\.\d+;.*)/\1;\2/"
 | eval log_fields=split(logs, ";") |eval field_count = mvcount(log_fields)

Now, both logs have the same number of fields (as the field_count shows) and whatever regex extraction you are using will work.

richgalloway
SplunkTrust
SplunkTrust

Each log should be a separate sourcetype with appropriate extractions created for them.

---
If this reply helps you, Karma would be appreciated.
0 Karma

nancylawrence00
Explorer

i am afraid we cannot do so.....i need a regex to count the delimiters first and then based on the condition of the number of delimiters extract the fields accordingly.

0 Karma

MuS
Legend

Sorry to say, but you're contradicting yourself with your comments. Once you need to count the ; and extract fields based on the count, in the next comment you want all fields?!?! It's pretty hard to help this way and since most of use are doing this voluntarily, it would be nice to provide precise information....other wise it's like asking the magic 8-ball

0 Karma

nancylawrence00
Explorer

Hi,

I have to extract all the fields by:

^([^;];){1}(?[^;])
^([^;];){2}(?[^;])
^([^;];){3}(?[^;]) and so on......

the issue is that i have an extra field in middle of the log and the above regex extracts the wrong field once it gets to the extra field in some of my logs.......

For further info i here are the logs....

Log1:
Jun 18 14:10:57 lec05674568 ABC[455135]: 2015-06-18 14:10:57;indexserver;lec056741;XBE;06;30893;11.91.6.128;lec056287312;9580;9599;LEC Audit - SYSTEM Logins;CRITICAL;CONNECT;SYSTEM;;;;;;DAPNAPS;UNSUCCESSFUL;;;;;;invalid username or password;;5562789;;

Log2:
Jun 18 14:10:57 lec05674568 ABC[455135]: 2015-06-18 14:10:57;indexserver;lec056741;XBE;06;30893;DATABASE01;11.91.6.128;lec056287312;9580;9599;LEC Audit - SYSTEM Logins;CRITICAL;CONNECT;SYSTEM;;;;;;DAPNAPS;UNSUCCESSFUL;;;;;;invalid username or password;;5562789;;

You see there is an extra field in Log2 (Database1), so when i use the regex mentioned above the numbering changes and thus pulls up the wrong field.

0 Karma

nancylawrence00
Explorer

Logs above are just examples, exact logs differ but are separated by delimiters

0 Karma

nancylawrence00
Explorer

i need to extract all the fields

0 Karma

nancylawrence00
Explorer

Jun 18 14:10:57 lec05674568 ABC[455135]: 2015-06-18 14:10:57;indexserver;lec056741;XBE;06;30893;10.81.5.128;lec056287312;9580;9599;LEC Audit - SYSTEM Logins;CRITICAL;CONNECT;SYSTEM;;;;;;DAPNAPS;UNSUCCESSFUL;;;;;;invalid username or password;;5562789;;

0 Karma

MuS
Legend

thanks 🙂 in this example, where is the needed time ?

0 Karma

MuS
Legend

sorry to ask again, but which is the value you want to get for both examples?
Your regex captures from Log1:

9599;LEC Audit - SYSTEM Logins

and from Log2:

9580;9599

If you don't provide this information there is no way to help you with this problem.

0 Karma

MuS
Legend

either provide the exact log events or try the second regex mentioned by @gcato

0 Karma

MuS
Legend

Hi nancylawrence007,

Or you start from the end of the event, if the end is always ;VPN; you can use this regex:

;(?<Event_TimeStamp>\d+:\d+);VPN;

cheers, MuS

gcato
Contributor

Nice answer MuS.

Just thinking that if the event ending is different, then you may also be able to just extract the time with something like this (if no other fields match the time pattern) .

... |eval Event_TimeStamp=_raw |rex mode=sed field=Event_TimeStamp "s/([^;]+;)+(\d\d:\d\d)(;.*$)/\2/"
0 Karma

gcato
Contributor

Though, thinking about it again, if the field time pattern is unique across all fields, then you do not need to worry about the number of delimiters (or sed) and the following would work too

 rex "([^;]+;)+(?\d\d:\d\d);"
0 Karma
Get Updates on the Splunk Community!

Harnessing Splunk’s Federated Search for Amazon S3

Managing your data effectively often means balancing performance, costs, and compliance. Splunk’s Federated ...

Infographic provides the TL;DR for the 2024 Splunk Career Impact Report

We’ve been buzzing with excitement about the recent validation of Splunk Education! The 2024 Splunk Career ...

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...