Splunk Search

How to do field extraction and event exclusion?

zacksoft
Contributor

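I need help with field extractions. I need to extract two fields from each event (they were shown in bold in the original post): the third pipe-delimited field (e.g. o*4RAGZLx404x22840423x1) and the second-to-last field (e.g. 2140).
Here are two sample events: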

Sample1
40.156.209.1 | ssh | o*4RAGZLx404x22840423x1 | JG25721 | 2018-06-20 06:44:51,219 | SSH - git-upload-pack '/dga/dgiodbatc.git' | - | 0 | 4 | 1911 | cache:miss, refs, ssh:user:id:126642 | 2140 | 1hgs9dp |

Sample2
10.348.20.158,30.158.219.1 | https | i*1N0FIQQx408x22719240x2 | - | 2018-06-20 06:48:08,653 | "GET /rest/api/1.0/repos HTTP/1.1" | "" "Apache-HttpClient/4.5.3 (Java/1.8.0_77)" | - | - | - | - | - | - |

After extracting the first of these fields (the third column), check whether it starts with "o". If it does, extract the second one (i.e. 2140); if it starts with "i", ignore that event.


knielsen
Contributor

Maybe something like this as field extraction

^(?<ips>\S+)\s\|\s(?<protocol>\S+)\s\|\s(?<id>\S+)\s\|\s([^\|]+\|\s){8}(?<id_no>\S+)

The "*" makes it a little cumbersome, but this should work:

base search | where NOT like(id,"i%")

Personally, I'd just extract all the fields btw and not use ([^\|]+\|\s){8} to skip to the number later on, but if you don't need the other fields, well...
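For illustration, here is a minimal sketch of one way to get at the columns by position instead of skipping ahead with ([^\|]+\|\s){8}. It swaps in eval's split() and mvindex() in place of a regex, which is a different technique from the extraction above. It assumes every event uses " | " (space, pipe, space) as the delimiter, as in the two samples, and it reuses the field names from the regex; `parts` is a made-up helper field.

```spl
| makeresults
| eval _raw="40.156.209.1 | ssh | o*4RAGZLx404x22840423x1 | JG25721 | 2018-06-20 06:44:51,219 | SSH - git-upload-pack '/dga/dgiodbatc.git' | - | 0 | 4 | 1911 | cache:miss, refs, ssh:user:id:126642 | 2140 | 1hgs9dp |"
| eval parts=split(_raw, " | ")
| eval ips=mvindex(parts, 0), protocol=mvindex(parts, 1), id=mvindex(parts, 2), id_no=mvindex(parts, 11)
| where like(id, "o%")
| table ips protocol id id_no
```

mvindex() is zero-based, so index 2 is the third column and index 11 is the twelfth (2140 in Sample1). Any other column can be picked up the same way if it turns out to be needed.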

Hth,
-Kai.


zacksoft
Contributor

Could you help me turn it into a query?
This is how I am composing it:

sourcetype="Raccess" (host="AVOP" OR host="BVOP") date_wday!=saturday AND date_wday !=sunday
| rex "^(?<ips>\S+)\s\|\s(?<protocol>\S+)\s\|\s(?<id>\S+)\s\|\s([^\|]+\|\s){8}(?<id_no>\S+)"
| where NOT like(id,"i%")
| timechart values(id_no)

This doesn't give me any results.
Yes, extracting all the fields would also help me a great deal. But we have to make sure we only extract the fields from an event if its third field starts with an "o", not an "i".


knielsen
Contributor

Would you mind putting your code into code blocks? 🙂

Well, it wasn't meant to be used as a rex command; I was thinking of a field extraction on the sourcetype in question, and then doing a search with that. That being said, in my test it works with rex.

If you insist on not extracting the field at all for i* events (I just discarded those events with the NOT like() clause), you could do that directly in rex as well, e.g.

rex field=input "^(?<ips>\S+)\s\|\s(?<protocol>\S+)\s\|\s(?<id>o\S+)\s\|\s([^\|]+\|\s){8}(?<id_no>\S+)"

This will only extract when id starts with "o", and then you can lose the "where NOT".

At least this works when I pipe your example through, like

| makeresults
| eval input="40.156.209.1 | ssh | o*4RAGZLx404x22840423x1 | JG25721 | 2018-06-20 06:44:51,219 | SSH - git-upload-pack '/dga/dgiodbatc.git' | - | 0 | 4 | 1911 | cache:miss, refs, ssh:user:id:126642 | 2140 | 1hgs9dp |"
| rex field=input "^(?<ips>\S+)\s\|\s(?<protocol>\S+)\s\|\s(?<id>o\S+)\s\|\s([^\|]+\|\s){8}(?<id_no>\S+)"
| stats values(id_no)

If I change the input on that to id=i*..., you don't get anything, but you get 2140 as is.

zacksoft
Contributor

I tried the following:

sourcetype="Raccess" (host="AVOP" OR host="BVOP") date_wday!=saturday AND date_wday !=sunday
| makeresults | eval input=_raw | rex field=_raw "^(?<ips>\S+)\s\|\s(?<protocol>\S+)\s\|\s(?<id>o\S+)\s\|\s([^\|]+\|\s){8}(?<id_no>\S+)" | stats values(id_no)

It fails with: Error in 'makeresults' command: This command must be the first command of a search.


knielsen
Contributor

"makeresults" is used a lot here to generate artificial result sets, since people don't have the same raw data as other people.

So you could cut and paste my last answer on its own, without any base search in front of it, to play around with it. Sorry, I took that for granted.

So what about

sourcetype="Raccess" (host="AVOP" OR host="BVOP") date_wday!=saturday AND date_wday !=sunday
| rex field=_raw "^(?<ips>\S+)\s\|\s(?<protocol>\S+)\s\|\s(?<id>o\S+)\s\|\s([^\|]+\|\s){8}(?<id_no>\S+)"
| stats values(id_no)

Doesn't that work? If not, then your raw data probably doesn't exactly match what you posted here, or I may be misunderstanding something. It happens; I am so used to using Splunk in a certain way with certain data sets and questions that I automatically misunderstand in my own way. 🙂
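One way to check whether the raw data really matches the posted format is to flag which events the rex matched at all. A minimal sketch, under a few assumptions: the first row is Sample1 exactly as posted, the second row is a deliberately malformed copy with the spaces around the pipes removed (a hypothetical variation, not real data), and `n` and `rex_matched` are made-up helper fields.

```spl
| makeresults count=2
| streamstats count AS n
| eval _raw="40.156.209.1 | ssh | o*4RAGZLx404x22840423x1 | JG25721 | 2018-06-20 06:44:51,219 | SSH - git-upload-pack '/dga/dgiodbatc.git' | - | 0 | 4 | 1911 | cache:miss, refs, ssh:user:id:126642 | 2140 | 1hgs9dp |"
| eval _raw=if(n=2, replace(_raw, " [|] ", "|"), _raw)
| rex field=_raw "^(?<ips>\S+)\s\|\s(?<protocol>\S+)\s\|\s(?<id>\S+)\s\|\s([^\|]+\|\s){8}(?<id_no>\S+)"
| eval rex_matched=if(isnull(id_no), "no", "yes")
| table n rex_matched id id_no
```

The first row should come back as matched and the second as unmatched, because the pattern requires whitespace on both sides of each pipe. On real data, the same rex and eval lines can follow the base search, ending with | stats count BY rex_matched, to see how many events the pattern misses.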


zacksoft
Contributor

Thank you @knielsen
