I have a regular expression:
^(.*)bytes read (?P\d+) written (?P\d+)$
I adapted it from a field extraction so that I can get the read and write values for a particular user. I also have a search that displays the user's session_id, file_name, time, and IP. I want to add the regular expression to that search so that the READ and WRITE values for a particular user show up alongside those fields.
My search is
index=sftp USER=gradydftsftpdata SESSION_ID=* | table USER, SESSION_ID,USER_IP,date_hour,_time | dedup SESSION_ID,USER_IP| join type=left max=2 SESSION_ID [search index=sftp SESSION_ID=* date_hour=* ACTION="open" OR ACTION="close" | table SESSION_ID, FILE_NAME, _time, USER_IP, ACTION] | table FILE_NAME,USER, SESSION_ID,USER_IP,date_hour,_time,ACTION | dedup FILE_NAME,ACTION
and my regular expression is
^(.*)bytes read (?P\d+) written (?P\d+)$
Do we need to use join or transaction in this search?
Nope. Basically, you need to look at your search and figure out where those words appear in the underlying data. Then use the rex command with your regular expression to extract the values into named capture groups.
Assuming that those words are appearing on the "open" and "close" events in the inside search, your code would look something like this -
index=sftp USER=gradydftsftpdata SESSION_ID=* | table SESSION_ID, USER_IP, USER, date_hour, _time | dedup SESSION_ID, USER_IP | join type=left max=2 SESSION_ID [search index=sftp SESSION_ID=* ACTION="open" OR ACTION="close" | rex field=_raw "^(.*)bytes read (?P<BytesRead>\d+) written (?P<BytesWritten>\d+)$" | table SESSION_ID, FILE_NAME, ACTION, BytesRead, BytesWritten, _time ] | table SESSION_ID, USER_IP, USER, FILE_NAME, ACTION, BytesRead, BytesWritten, date_hour, _time | sort FILE_NAME, ACTION, -_time | dedup FILE_NAME, ACTION | table FILE_NAME, USER, ACTION, BytesRead, BytesWritten, date_hour, _time, SESSION_ID, USER_IP
I've reordered the fields based on the logic involved, and at the end presented them in the order you were outputting them. I find it helpful to think hierarchically while building the extract. The fields on the far left are the "driver" fields, and the ones on the far right are "along for the ride".
You are not using USER_IP in the join, and it's already on the left-side records going into the join, so the subsearch doesn't need to return it. I've removed it from the subsearch's output table. If there is a reason to keep it (for example, session IDs not being unique without it), you can put it back and add it as a second join field after SESSION_ID, just before the opening bracket.
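To make the join behaviour concrete, here is a rough Python model of `join type=left max=2 SESSION_ID`. It is a sketch under stated assumptions, not Splunk's implementation. It assumes that each main-search row is paired with at most `max` subsearch rows that share the key, taking the first ones in subsearch order. Main rows with no match are kept, and subsearch values overwrite fields with the same name, which is join's default `overwrite=true`. The function name `left_join` and the sample rows are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def left_join(main: List[Row], sub: List[Row], key: str, max_matches: int = 1) -> List[Row]:
    """Hypothetical model of SPL `join type=left max=N key`.

    Each main row joins with at most `max_matches` subsearch rows sharing `key`
    (max_matches=0 means no limit). Unmatched main rows are kept as-is.
    """
    out: List[Row] = []
    for left in main:
        matches = [r for r in sub if r.get(key) == left.get(key)]
        if max_matches > 0:
            matches = matches[:max_matches]  # assumed: first N in subsearch order
        if not matches:
            out.append(dict(left))  # left join: keep the unmatched main row
        for right in matches:
            merged = dict(left)
            merged.update(right)  # subsearch fields overwrite same-named fields
            out.append(merged)
    return out


# Hypothetical sample data: session s1 has an open, a close, and a second open.
main_rows = [
    {"SESSION_ID": "s1", "USER": "u1", "USER_IP": "10.0.0.1"},
    {"SESSION_ID": "s2", "USER": "u1", "USER_IP": "10.0.0.2"},
]
sub_rows = [
    {"SESSION_ID": "s1", "ACTION": "open"},
    {"SESSION_ID": "s1", "ACTION": "close"},
    {"SESSION_ID": "s1", "ACTION": "open"},
]
joined = left_join(main_rows, sub_rows, "SESSION_ID", max_matches=2)
for row in joined:
    print(row)
```

With `max_matches=2`, session s1 keeps only its first open and close rows and drops the second open. Session s2 survives with no ACTION because the join is a left join. This is why a session that touches more than one file would not join cleanly.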
For your join code to be right, each session must only be uploading or downloading one file. I suspect there may be an issue with extracting the byte data, so start with this chunk, over a recent period of time, to test the extract -
search index=sftp SESSION_ID=* ACTION="open" OR ACTION="close" | head 5 | rex field=_raw "^(.*)bytes read (?P<BytesRead>\d+) written (?P<BytesWritten>\d+)$" | table SESSION_ID, USER_IP, FILE_NAME, ACTION, BytesRead, BytesWritten, _time
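If you want to check the regex outside Splunk first, the following minimal Python sketch applies the same pattern with Python's `re` module. Python's `re` also accepts the `(?P<name>...)` named-group syntax. The sample log lines below are hypothetical, modelled loosely on sftp-server "open"/"close" messages, so substitute a few real `_raw` values from your index. The helper name `extract_bytes` is also hypothetical.

```python
import re
from typing import Dict, Optional

# Same pattern as the rex command above, with named groups BytesRead / BytesWritten.
PATTERN = re.compile(r"^(.*)bytes read (?P<BytesRead>\d+) written (?P<BytesWritten>\d+)$")


def extract_bytes(raw: str) -> Optional[Dict[str, int]]:
    """Return the named groups as ints, or None when the line doesn't match."""
    m = PATTERN.search(raw)
    if m is None:
        return None
    # groupdict() only contains the named groups; the unnamed (.*) is ignored.
    return {name: int(value) for name, value in m.groupdict().items()}


# Hypothetical sample lines; replace with real _raw values from index=sftp.
close_line = 'close "/upload/report.csv" bytes read 0 written 2048'
open_line = 'open "/upload/report.csv" flags WRITE,CREATE,TRUNCATE mode 0644'
trailing_line = 'close "/upload/report.csv" bytes read 0 written 2048 (extra text)'

print(extract_bytes(close_line))
print(extract_bytes(open_line))
print(extract_bytes(trailing_line))
```

The last line shows the pattern's main gotcha. Because of the trailing `$`, any text after the "written" number makes the whole match fail, and rex then simply leaves BytesRead and BytesWritten unset on that event. In this sample, only the "close" line carries byte counts, so the "open" row would show them empty.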
Your final dedup assumes that only one person and/or session will be uploading and/or downloading your file over any given time period. That doesn't seem right, but I've added a sort to retain the most RECENT upload or download, just in case it is correct for your purposes.
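The effect of `sort FILE_NAME, ACTION, -_time | dedup FILE_NAME, ACTION` can be modelled in a few lines of Python. This assumes dedup keeps the first result it sees for each key combination, so sorting newest-first means the most recent row per file and action survives. The function name and sample rows are hypothetical, and `_time` is treated as an epoch number.

```python
from typing import Dict, List, Sequence, Tuple


def sort_then_dedup(rows: List[Dict[str, object]], keys: Sequence[str]) -> List[Dict[str, object]]:
    """Hypothetical model of `sort FILE_NAME, ACTION, -_time | dedup FILE_NAME, ACTION`."""
    # Ascending by FILE_NAME and ACTION, then descending by _time (newest first).
    ordered = sorted(rows, key=lambda r: (r["FILE_NAME"], r["ACTION"], -r["_time"]))
    seen: set = set()
    kept: List[Dict[str, object]] = []
    for row in ordered:
        combo: Tuple[object, ...] = tuple(row[k] for k in keys)
        if combo in seen:
            continue  # dedup drops later rows with the same key combination
        seen.add(combo)
        kept.append(row)
    return kept


# Hypothetical rows: two users closed the same file at different times.
rows = [
    {"FILE_NAME": "a.csv", "ACTION": "close", "_time": 100, "USER": "u1"},
    {"FILE_NAME": "a.csv", "ACTION": "close", "_time": 200, "USER": "u2"},
    {"FILE_NAME": "a.csv", "ACTION": "open", "_time": 150, "USER": "u2"},
]
result = sort_then_dedup(rows, ["FILE_NAME", "ACTION"])
for row in result:
    print(row)
```

Only u2's close at time 200 survives for ("a.csv", "close"). u1's earlier close is silently discarded, which is exactly the risk if more than one user or session handles the same file.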
The correct syntax is field=_raw (the keyword is singular). Since rex matches against _raw by default, the field argument isn't needed in this instance.
It certainly wasn't rields=_raw, the way I had it. Must have been Scooby-Doo who did my editing. 😉
In my demo code, I tend to be explicit about the defaults. Questioners are usually pretty confused by rex and regex, so showing them explicitly WHAT the rex command is matching the regex against seems like a good idea.