How to add my regular expression to a search?

sujith0311 · 02-02-2017

Hi all,

I have a regular expression, ^(.*)bytes read (?P\d+) written (?P\d+)$, which I edited from an existing field's regular expression so that it reads a particular user's read and write values. I also have a search that displays the user's session_id, file_name, time and IP. I want to add the regular expression to that search so that the READ and WRITE values for a particular user show up in the results.

My search is

    index=sftp USER=gradydftsftpdata SESSION_ID=*
    | table USER, SESSION_ID, USER_IP, date_hour, _time
    | dedup SESSION_ID, USER_IP
    | join type=left max=2 SESSION_ID
        [search index=sftp SESSION_ID=* date_hour=* ACTION="open" OR ACTION="close"
        | table SESSION_ID, FILE_NAME, _time, USER_IP, ACTION]
    | table FILE_NAME, USER, SESSION_ID, USER_IP, date_hour, _time, ACTION
    | dedup FILE_NAME, ACTION

and my regular expression is ^(.*)bytes read (?P\d+) written (?P\d+)$.

Do I need to use join or transaction in this search?

DalJeanis · 02-02-2017

Nope. Basically, you need to look at your search and figure out which events in the underlying data contain those values, then use your regular expression to extract them into named capture groups.
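To see what a named capture group does, you can try the pattern outside Splunk with Python's re module, which accepts the same (?P<name>...) syntax. This is only a sketch: the sample event text is made up, and the group names BytesRead and BytesWritten are illustrative (they match the names used in the rex example, not anything confirmed in your data).

```python
import re

# The pattern from this thread, with the two groups named BytesRead and
# BytesWritten (illustrative names, the same ones the rex example uses).
PATTERN = re.compile(
    r"^(.*)bytes read (?P<BytesRead>\d+) written (?P<BytesWritten>\d+)$"
)

# Hypothetical event text; the real sftp log format may differ.
sample = "session 42 closed: bytes read 1024 written 2048"

match = PATTERN.search(sample)
if match:
    # groupdict() holds only the named groups, roughly what rex turns into fields.
    print(match.groupdict())  # {'BytesRead': '1024', 'BytesWritten': '2048'}
```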

Assuming those values appear on the "open" and "close" events returned by the inner search, your code would look something like this:

    index=sftp USER=gradydftsftpdata SESSION_ID=*
    | table SESSION_ID, USER_IP, USER, date_hour, _time
    | dedup SESSION_ID, USER_IP
    | join type=left max=2 SESSION_ID
        [search index=sftp SESSION_ID=* ACTION="open" OR ACTION="close"
        | rex field=_raw "^(.*)bytes read (?P<BytesRead>\d+) written (?P<BytesWritten>\d+)$"
        | table SESSION_ID, FILE_NAME, ACTION, BytesRead, BytesWritten, _time
        ]
    | table SESSION_ID, USER_IP, USER, FILE_NAME, ACTION, BytesRead, BytesWritten, date_hour, _time
    | sort 0 FILE_NAME, ACTION, -_time
    | dedup FILE_NAME, ACTION
    | table FILE_NAME, USER, ACTION, BytesRead, BytesWritten, date_hour, _time, SESSION_ID, USER_IP
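To make the join step concrete, here is a rough Python model of `join type=left max=2 SESSION_ID`. It is a simplified sketch, not Splunk's implementation: `left_join` is a hypothetical helper, the rows are invented, and it assumes partners are taken in the order the subsearch returns them. Each outer row is paired with at most two matching subsearch rows, outer rows with no match are kept as-is, and subsearch values win on field-name clashes (join's default overwrite=true).

```python
from typing import Dict, List

Row = Dict[str, str]


def left_join(left: List[Row], right: List[Row], key: str, max_matches: int = 2) -> List[Row]:
    """Simplified model of SPL `join type=left max=<max_matches> <key>`."""
    out: List[Row] = []
    for outer in left:
        # Take at most max_matches partners, in the order the subsearch returned them.
        matches = [inner for inner in right if inner.get(key) == outer.get(key)][:max_matches]
        if not matches:
            # type=left keeps outer rows that found no partner.
            out.append(dict(outer))
        for inner in matches:
            # On a field-name clash the subsearch value wins (overwrite=true).
            out.append({**outer, **inner})
    return out


# Invented rows standing in for the outer search and the subsearch.
sessions = [
    {"SESSION_ID": "s1", "USER": "gradydftsftpdata", "USER_IP": "192.0.2.10"},
    {"SESSION_ID": "s2", "USER": "gradydftsftpdata", "USER_IP": "192.0.2.11"},
]
file_events = [
    {"SESSION_ID": "s1", "FILE_NAME": "a.csv", "ACTION": "open"},
    {"SESSION_ID": "s1", "FILE_NAME": "a.csv", "ACTION": "close", "BytesRead": "0", "BytesWritten": "2048"},
    {"SESSION_ID": "s1", "FILE_NAME": "b.csv", "ACTION": "open"},  # third match: cut off by max=2
]

rows = left_join(sessions, file_events, "SESSION_ID")
for row in rows:
    print(row)
```

Session s1 gets its "open" and "close" rows for a.csv, but its b.csv event is silently dropped by the max=2 limit, and s2 survives with no file fields at all. That is why the join only gives complete results when each session handles a single file.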

Notes -

I've reordered the fields based on the logic involved, and then at the end presented them in the order you were outputting them. I find it helpful to think hierarchically while building the extract: the fields farthest to the left are the "driver" fields, and the ones on the far right are "along for the ride".

You are not using USER_IP in the join, and it's already on the left-hand records going into the join, so the subsearch doesn't need to return it. I've removed it from the subsearch's output table, but if there is a reason to keep it (for example, session IDs that aren't unique without it), you can put it back and add it as a second join field right after SESSION_ID, before the opening bracket of the subsearch.

For the join to give correct results, each session must upload or download only one file, because max=2 lets each session pick up at most two subsearch rows (one "open" and one "close"). I suspect the extraction of the byte counts may also need work, so start by running this chunk over a recent time range to test the extract:

    search index=sftp SESSION_ID=* ACTION="open" OR ACTION="close" | head 5
    | rex field=_raw "^(.*)bytes read (?P<BytesRead>\d+) written (?P<BytesWritten>\d+)$"
    | table SESSION_ID, USER_IP, FILE_NAME, ACTION, BytesRead, BytesWritten, _time
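If you'd rather check the pattern against a few copied raw events before touching Splunk, a small Python loop does the same job as the head 5 test. The sample lines are hypothetical (paste in real _raw values instead), and `extract` is an illustrative helper. Note that the trailing $ anchor means anything after the written count, even a single trailing space, makes the match fail.

```python
import re

PATTERN = re.compile(
    r"^(.*)bytes read (?P<BytesRead>\d+) written (?P<BytesWritten>\d+)$"
)

# Hypothetical raw events; replace with real _raw values from index=sftp.
samples = [
    "close handle 7: bytes read 0 written 2048",
    "close handle 8: bytes read 512 written 0 ",  # trailing space defeats the $ anchor
    "open file /upload/report.csv",               # no byte counts at all
]


def extract(line):
    """Return the named groups, or None when the pattern does not match."""
    m = PATTERN.search(line)
    return m.groupdict() if m else None


results = [extract(line) for line in samples]
for line, fields in zip(samples, results):
    print(repr(line), "->", fields)
```

Any line that prints None here would leave BytesRead and BytesWritten empty in the Splunk table as well.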

Your final dedup assumes that only one person or session will upload or download a given file over the searched time period. That doesn't seem right, but in case it is correct for your purposes, I've added a sort before it: dedup keeps the first event it sees for each FILE_NAME/ACTION pair, so sorting by -_time first means it retains the most RECENT upload or download.
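The sort-then-dedup pattern can be sketched in Python to show why the order matters. This is a minimal model under stated assumptions: `latest_per_key` is a hypothetical helper, the events are invented, and _time is treated as epoch seconds.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def latest_per_key(rows: List[Row], keys: Tuple[str, ...]) -> List[Row]:
    """Mimic `sort <keys> -_time | dedup <keys>`: newest row per key combination."""
    # Key fields ascending, then _time descending (newest first within each group).
    ordered = sorted(rows, key=lambda r: (tuple(r[k] for k in keys), -r["_time"]))
    seen = set()
    kept: List[Row] = []
    for row in ordered:
        ident = tuple(row[k] for k in keys)
        if ident not in seen:  # dedup keeps only the first row per combination
            seen.add(ident)
            kept.append(row)
    return kept


# Invented events; _time is in epoch seconds.
events = [
    {"FILE_NAME": "a.csv", "ACTION": "close", "USER": "u1", "_time": 100},
    {"FILE_NAME": "a.csv", "ACTION": "close", "USER": "u2", "_time": 300},
    {"FILE_NAME": "a.csv", "ACTION": "open", "USER": "u2", "_time": 250},
]

latest = latest_per_key(events, ("FILE_NAME", "ACTION"))
for row in latest:
    print(row)
```

Without the sort, whichever "close" event happened to arrive first would be kept; with it, u2's later close at _time 300 wins and u1's earlier one is discarded.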


sujith0311 · 02-02-2017

I got this error: Error in 'rex' command: The regex 'fields=_raw' does not extract anything. It should specify at least one named group. Format: (?...). What exactly does it mean?

richgalloway · 02-03-2017

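The correct syntax is field=_raw (the keyword is singular). Because rex doesn't recognize fields=_raw as an argument, it treats that text as the regex itself, and since it contains no named group, you get the error. rex searches _raw by default, so the field argument isn't needed in this instance.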


DalJeanis · 02-03-2017

Hey, Rich, do you think the group parens around "^(.*)bytes" are causing any issues? Seems like it ought to be either "^(?:.*)bytes" or "^.*bytes".
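One way to check the question is to compare the three prefixes with Python's re module, assuming Splunk's PCRE engine behaves the same way for this pattern. All three yield identical named groups; the capturing ^(.*) only adds an extra numbered group. As far as I know, rex creates fields only from named groups, so that extra group should be harmless, though ^.*bytes is tidier. The event text is hypothetical.

```python
import re

SUFFIX = r"bytes read (?P<BytesRead>\d+) written (?P<BytesWritten>\d+)$"
PREFIXES = {"capturing": r"^(.*)", "non-capturing": r"^(?:.*)", "plain": r"^.*"}

line = "close handle 7: bytes read 0 written 2048"  # hypothetical event

named = {}
numbered = {}
for label, prefix in PREFIXES.items():
    m = re.search(prefix + SUFFIX, line)
    named[label] = m.groupdict()        # only the named groups
    numbered[label] = len(m.groups())   # every capturing group, named or not
    print(label, named[label], "groups:", numbered[label])
```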

DalJeanis · 02-03-2017

Certainly wasn't rields=_raw, the way I had it. Must have been Scooby-Doo that did my editing. 😉

In my demo code, I tend to be explicit about the defaults. Questioners are usually pretty confused by rex and regex, so showing them explicitly WHAT the rex command is comparing the regex against seems like a good idea.
