Splunk Search

How to add my regular expression to a search?

sujith0311
New Member

Hi all,

I have a regular expression ^(.*)bytes read (?P\d+) written (?P\d+)$, which I adapted from an existing field extraction so that it pulls out the read and write values for a particular user. I also have a search that displays the user's session_id, file_name, time, and ip. I just want to add the regular expression to my search so that the READ and WRITE values for that user show up in the results.

My search is

index=sftp USER=gradydftsftpdata SESSION_ID=* | table USER, SESSION_ID,USER_IP,date_hour,_time | dedup SESSION_ID,USER_IP| join type=left max=2 SESSION_ID [search index=sftp SESSION_ID=* date_hour=* ACTION="open" OR ACTION="close" | table SESSION_ID, FILE_NAME, _time, USER_IP, ACTION] | table FILE_NAME,USER, SESSION_ID,USER_IP,date_hour,_time,ACTION | dedup FILE_NAME,ACTION

and my regular expression is ^(.*)bytes read (?P\d+) written (?P\d+)$.

Do we need to use join or transaction in this search??

DalJeanis
Legend

Nope. Basically, you need to look at your search and figure out where those words will exist in the underlying data, then use your regular expression to extract them into a named capture group.

Assuming that those words are appearing on the "open" and "close" events in the inside search, your code would look something like this -

 index=sftp USER=gradydftsftpdata SESSION_ID=* 
| table SESSION_ID, USER_IP,  USER, date_hour, _time 
| dedup SESSION_ID, USER_IP
| join type=left max=2 SESSION_ID 
    [search index=sftp SESSION_ID=* ACTION="open" OR ACTION="close" 
    | rex field=_raw "^(.*)bytes read (?P<BytesRead>\d+) written (?P<BytesWritten>\d+)$"
    | table SESSION_ID, FILE_NAME, ACTION, BytesRead, BytesWritten, _time 
    ] 
| table SESSION_ID, USER_IP, USER, FILE_NAME, ACTION,  BytesRead, BytesWritten, date_hour, _time
| sort FILE_NAME, ACTION, -_time
| dedup FILE_NAME, ACTION
| table FILE_NAME, USER, ACTION, BytesRead, BytesWritten, date_hour, _time, SESSION_ID, USER_IP

Notes -

I've reordered the fields based on the logic involved, and then at the end presented them in the order you were outputting them. I find it's helpful to think hierarchically while building the extract. The fields to the farthest left are the "driver" fields, and the ones to the far right are "along for the ride".

You are not using USER_IP in the join, and it's already on the left records going into the join, so it doesn't need to be returned in the output table from the join. I've removed it from the output table, but if there is a reason - such as the session IDs not being unique without it - then you can put it back and add it as a join field after SESSION_ID outside the first bracket.
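
If the session IDs do turn out not to be unique on their own, the change described above could look roughly like this. It is only a sketch: it assumes USER_IP is populated on the "open" and "close" events, which the original search suggests but which should be checked against the data. USER_IP is returned from the subsearch again and listed as a second join field after SESSION_ID; the rest of the pipeline after the join stays as it was.

```spl
index=sftp USER=gradydftsftpdata SESSION_ID=*
| table SESSION_ID, USER_IP, USER, date_hour, _time
| dedup SESSION_ID, USER_IP
| join type=left max=2 SESSION_ID USER_IP
    [search index=sftp SESSION_ID=* ACTION="open" OR ACTION="close"
    | rex field=_raw "^(.*)bytes read (?P<BytesRead>\d+) written (?P<BytesWritten>\d+)$"
    | table SESSION_ID, USER_IP, FILE_NAME, ACTION, BytesRead, BytesWritten, _time
    ]
```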

For your join code to be right, each session must upload or download only one file. I suspect that there may be an issue with the extract of the byte data, so start with this chunk, over a recent period of time, to test the extract -

    search index=sftp SESSION_ID=* ACTION="open" OR ACTION="close" | head 5
    | rex field=_raw "^(.*)bytes read (?P<BytesRead>\d+) written (?P<BytesWritten>\d+)$"
    | table SESSION_ID, USER_IP, FILE_NAME, ACTION, BytesRead, BytesWritten, _time 
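
The regular expression can also be tried against a single hand-made event before touching the index. The sketch below uses makeresults to build one event; the event text and the file path in it are made up for illustration and are not taken from your logs, so substitute a line copied from a real "close" event.

```spl
| makeresults
| eval _raw="close /hypothetical/example.txt bytes read 0 written 4096"
| rex field=_raw "^(.*)bytes read (?P<BytesRead>\d+) written (?P<BytesWritten>\d+)$"
| table _raw, BytesRead, BytesWritten
```

If BytesRead and BytesWritten come back empty for a real line, the regular expression is not matching that line. A likely thing to check is the `$` anchor, which requires the written value to be the very last thing in the event.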

Your final dedup assumes that only one person and/or session will be uploading and/or downloading your file over any given time period. That doesn't seem right, but I've added a sort to retain the most RECENT upload or download, just in case it is correct for your purposes.


corrected typo

sujith0311
New Member

I got this error: Error in 'rex' command: The regex 'fields=_raw' does not extract anything. It should specify at least one named group. Format: (?<name>...). What exactly does it mean?

richgalloway
SplunkTrust

The correct syntax is field=_raw (the keyword is singular). The default is to search _raw so the field keyword is not needed in this instance.
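
As a sketch of the shorter form, here is the test search from earlier in the thread with the field keyword left out, so that rex falls back to _raw. The field names BytesRead and BytesWritten are the ones chosen in the answer above, not anything built into Splunk.

```spl
index=sftp SESSION_ID=* ACTION="open" OR ACTION="close"
| head 5
| rex "^(.*)bytes read (?P<BytesRead>\d+) written (?P<BytesWritten>\d+)$"
| table SESSION_ID, FILE_NAME, ACTION, BytesRead, BytesWritten, _time
```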

---
If this reply helps you, Karma would be appreciated.

DalJeanis
Legend

Hey, Rich, do you think the group parens around "^(.*)bytes" are causing any issues? Seems like it ought to be either "^(?:.*)bytes" or "^.*bytes".
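
One way to compare the two alternatives is to run both against the same event and extract into differently named fields. This is a sketch only: the event text and all four field names are invented for the comparison. As far as I know, rex only creates fields from named groups, so the unnamed group in the original should be harmless and all three forms should extract the same values.

```spl
| makeresults
| eval _raw="close /hypothetical/example.txt bytes read 0 written 4096"
| rex "^(?:.*)bytes read (?P<ReadNonCapturing>\d+) written (?P<WrittenNonCapturing>\d+)$"
| rex "^.*bytes read (?P<ReadPlain>\d+) written (?P<WrittenPlain>\d+)$"
| table ReadNonCapturing, WrittenNonCapturing, ReadPlain, WrittenPlain
```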

DalJeanis
Legend

Certainly wasn't rields=_raw, the way I had it. Must have been Scooby-Doo that did my editing. 😉

In my demo code, I tend to be explicit with the defaults, since questioners are usually pretty confused by rex and regex, so showing them explicitly WHAT the rex command is comparing the regex against seems to be a good idea.
