How to extract just one field from a log when there are multiple that carry the same attribute name? - Regular Expression

evang_26
Communicator

Hi all,

I am filtering logs that came from Nessus in order to identify vulnerable machines based on their OS. The issue is that when a host's OS is not identified with confidence, the event contains several "os" fields. An example is below:

start_time="Mon Feb 16 03:56:07 2015"
end_time="Mon Feb 16 03:57:42 2015"
os="Microsoft Windows 2000" os="Microsoft Windows XP for Embedded Systems" os="Microsoft Windows XP"

The query I created for this (which only works properly when a single OS is found) is the following:

sourcetype=nessus severity!=informational  | rex "start_time=\"(?<start>.*)\"\send.*\s\sos=\"(?<OS>.*)\""

Ideally, I would like to find a way to exclude the " (double quote) symbol from the extracted field. This matters because, apart from Windows machines, there are printers and access points that are reported as a mix of several other OSs.

So, it should be something like this:

sourcetype=nessus severity!=informational  | rex "start_time=\"(?<start>.*)\"\send.*\s\sos=\"(?<OS>.*[^\"])\""

but it doesn't work.

Any thoughts?

Regards,
Evang
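
A note on why the two patterns above behave the same: `.*` is greedy, so it runs to the last double quote on the line, and appending `[^"]` only constrains the final captured character. The sketch below reproduces this with Python's `re` module rather than Splunk (an assumption made so the example is runnable; Python needs the `(?P<name>...)` group syntax). Only the `os` part of the pattern is shown, and `event` is the sample from the post joined onto one line.

```python
import re

# The "os" portion of the sample event, joined onto one line (assumption).
event = ('os="Microsoft Windows 2000" '
         'os="Microsoft Windows XP for Embedded Systems" '
         'os="Microsoft Windows XP"')

# First attempt: greedy .* runs to the last double quote on the line.
greedy = re.search(r'os="(?P<OS>.*)"', event)

# Second attempt: [^"] only forbids a quote as the last captured character.
second = re.search(r'os="(?P<OS>.*[^"])"', event)

# Both captures span all three os values, quotes included.
print(greedy.group("OS"))
print(second.group("OS"))
print(greedy.group("OS") == second.group("OS"))  # True
```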

Update:

Hi MuS,

Okay, here we are.

I guess I haven't stated my problem correctly. I do not actually want to remove the double quotes; I want to keep only the first occurrence of the OS field in the rare cases where more than one appears!

Here is what I managed to do with sed, but I am not quite there yet.

sourcetype=nessus severity!=informational earliest=-5w@w1 latest=now|rex field=os mode=sed "s/.*\(os=\"[^\"]*\"\).*$/\1/g" | rex ".*os=\"(?P<OS>.*)\"\s.*\""

Any suggestions? 🙂

Regards,
Evang
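
Two things seem to keep the sed attempt from isolating the first value, assuming rex's sed mode follows PCRE conventions: `\(` and `\)` match literal parentheses instead of forming a group, and the leading greedy `.*` would prefer the last `os=` in any case. A minimal sketch of both effects, using Python's `re.sub` as a stand-in for the sed expression (the `event` string and variable names are illustrative):

```python
import re

# The "os" portion of the sample event, joined onto one line (assumption).
event = ('os="Microsoft Windows 2000" '
         'os="Microsoft Windows XP for Embedded Systems" '
         'os="Microsoft Windows XP"')

# Escaped parentheses are literal characters here, so nothing matches.
escaped = re.search(r'.*\(os="[^"]*"\).*$', event)

# A real group with a greedy prefix keeps the LAST os value.
last = re.sub(r'.*(os="[^"]*").*$', r'\1', event)

# A lazy prefix anchored at the start keeps the FIRST os value.
first = re.sub(r'^.*?(os="[^"]*").*$', r'\1', event)

print(escaped)  # None
print(last)     # os="Microsoft Windows XP"
print(first)    # os="Microsoft Windows 2000"
```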

1 Solution

MuS
Legend

Hi evang_26,

Okay after reading the update, it makes sense, try this:

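sourcetype=nessus severity!=informational  | rex max_match=1 "start_time=\"(?<start>.*)\"\send.*\s\sos=\"(?<OS>[\w+\s]+)\""

This matches only one occurrence of each field in the regex: max_match=1 limits rex to a single match, and the character class [\w+\s]+ cannot cross a double quote, so OS ends at the closing quote of the first os value.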

Hope this helps ...

cheers, MuS
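
As a rough check of why the character class stops at the first value, the sketch below runs just the `os` part of the pattern through Python's `re` module (an assumption for runnability; `rex` itself is not involved, and Python requires `(?P<OS>...)`). The second string, `Linux Kernel 2.6`, is a hypothetical value that does not appear in the thread; it shows that `[\w+\s]+` rejects values containing punctuation such as a dot.

```python
import re

# Only the os part of the pattern; word characters, "+", and whitespace.
pattern = re.compile(r'os="(?P<OS>[\w+\s]+)"')

# The "os" portion of the sample event, joined onto one line (assumption).
event = ('os="Microsoft Windows 2000" '
         'os="Microsoft Windows XP for Embedded Systems" '
         'os="Microsoft Windows XP"')

match = pattern.search(event)
print(match.group("OS"))  # Microsoft Windows 2000

# Hypothetical value with a dot: the class cannot consume ".", so no match.
dotted = 'os="Linux Kernel 2.6"'
dotted_match = pattern.search(dotted)
print(dotted_match)  # None
```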

ramdaspr
Contributor

Can you add an example of what you would like the output to look like?
The solution provided by MuS would give you the output without any double quotes, but it's not really clear whether that's your intention or whether you would like to create a multivalue field containing all the different OS values.

evang_26
Communicator

Hi ramdaspr,

To be honest, as I stated under MuS's answer, I did not describe my problem clearly enough. I want to keep just the first occurrence of the OS field when there is more than one. That way the dashboard won't be overwhelmed by huge tags and will look cleaner; besides, Nessus orders its OS guesses by probability.

See my comment and my attempt below MuS's answer.

Regards,
Evang

evang_26
Communicator

Hi MuS,

It worked perfectly!

Thank you very much!

Regards,
Evang

MuS
Legend

you're welcome, I've updated your question and my answer so it makes sense 😉

richgalloway
SplunkTrust

You were so close. Try this.

... | rex "start_time=\"(?P<start>.*)\"\send.*\sos=\"(?P<OS>[^\"]*?)\""
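
A negated class such as `[^"]*` accepts any character except a double quote, so it also stops at the end of the first value while still allowing punctuation. The sketch below checks this with Python's `re` module instead of `rex` (an assumption for runnability), again using only the `os` part of the pattern. `Linux Kernel 2.6` is a hypothetical value, not one from the thread. Collecting every match with `finditer` loosely mirrors the multivalue alternative ramdaspr mentions.

```python
import re

# Only the os part of the pattern; anything except a double quote.
pattern = re.compile(r'os="(?P<OS>[^"]*?)"')

# The "os" portion of the sample event, joined onto one line (assumption).
event = ('os="Microsoft Windows 2000" '
         'os="Microsoft Windows XP for Embedded Systems" '
         'os="Microsoft Windows XP"')

# search() returns the leftmost match, i.e. the first os value.
first = pattern.search(event).group("OS")

# finditer() walks every os value in order of appearance.
every = [m.group("OS") for m in pattern.finditer(event)]

# Hypothetical value containing a dot is captured whole.
dotted = pattern.search('os="Linux Kernel 2.6"').group("OS")

print(first)   # Microsoft Windows 2000
print(every)
print(dotted)  # Linux Kernel 2.6
```

In `rex`, the documented way to collect all matches into a multivalue field is `max_match=0`; the default of 1 keeps only the first match.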

evang_26
Communicator

Hi richgalloway,

I already tried this, placing the [] at the front and at the end, with no luck.

Regards,
Evang
