I am parsing SFTP logs of file downloads and want to count how many bytes a specific user downloaded at what time. The logs look like this (I am abbreviating the standard rfc5424 syslog prefix):
session opened for local user XXX from [10.#.#.#]
received client version #
open "/some/file/name" flags READ mode 0666
close "/some/file/name" bytes read ### written #
open "/some/other/file/name" flags READ mode 0666
close "/some/other/file/name" bytes read ### written #
open "/and/another/filename" flags READ mode 0666
close "/and/another/filename" bytes read ### written #
session closed for local user XXX from [10.#.#.#]
I want to somehow show how many bytes a specific user downloaded at what time.
I start by inline extraction of a few extra helper fields like the username and the file sizes for example:
appname=sftp-server
| rex field=_raw "session (opened|closed) for local user (?<sftp_user>[^ ]+) from"
| rex field=_raw "close \"(?<sftp_filename>.*)\" bytes read (?<sftp_bytes_read>\d+)"
If I wanted to see how much data was downloaded (without caring about which user) I would just do a timechart which does the trick:
appname=sftp-server
| rex field=_raw "session (opened|closed) for local user (?<sftp_user>[^ ]+) from"
| rex field=_raw "close \"(?<sftp_filename>.*)\" bytes read (?<sftp_bytes_read>\d+)"
| timechart sum(sftp_bytes_read)
However the event which has the file size, does not have the user so I can filter or chart the username.
If I want to filter by sftp_user, the only way I found how to do it is by making a transaction for the user session and then filtering on the sftp_user (in the example below, host, appname, and procid are extracted by the rfc5424 syslog addon):
appname=sftp-server
| rex field=_raw "session (opened|closed) for local user (?<sftp_user>[^ ]+) from"
| rex field=_raw "close \"(?<sftp_filename>.*)\" bytes read (?<sftp_bytes_read>\d+)"
| transaction host appname procid sftp_user startswith="session opened for" endswith="session closed for"
This does give me an events for every single SFTP session by the user, but I cannot figure out how to get the details of each file download (the individual "close" lines) out of it. What would be the way to do that, or just... "explode" it back into individual events?
Assuming you have already extracted _time
appname=sftp-server
| rex field=_raw "sftp-server (?<sftp_server>[^ ]+) "
| rex field=_raw "session (opened|closed) for local user (?<sftp_user>[^ ]+) from"
| rex field=_raw "close \"(?<sftp_filename>.*)\" bytes read (?<sftp_bytes_read>\d+)"
| sort 0 sftp_server _time
| filldown sftp_user
| sort 0 _time
| timechart sum(sftp_bytes_read) by sftp_user
Not sure if the second sort is required as timechart may handle it without
Try something like this
appname=sftp-server
| rex field=_raw "session (opened|closed) for local user (?<sftp_user>[^ ]+) from"
| rex field=_raw "close \"(?<sftp_filename>.*)\" bytes read (?<sftp_bytes_read>\d+)"
| filldown sftp_user
| timechart sum(sftp_bytes_read) by sftp_user
Ah, thanks. It looks like filldown will do the trick most of the time.
But if I am right in my understanding, as soon as I get another connection in the middle of a transaction, I'd get the wrong user in the latter events? E.g., in a situation like the below, I have a long running session for userA, and if I get a connection for userB, all the latter downloads by userA will be assigned to userB.
I was trying to use a transaction to group the transaction based on the hostname and PID (procid) but then ended up asking how to get the individual lines out of it.
<38>1 2021-09-16T09:36:51.384489+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] session opened for local user userA from [#.#.#.#]
<38>1 2021-09-16T09:36:51.385571+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] received client version 4
<38>1 2021-09-16T09:36:51.386537+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] realpath "."
<38>1 2021-09-16T09:36:51.387378+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] stat name "....."
<38>1 2021-09-16T09:36:51.389034+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] opendir "....."
<38>1 2021-09-16T09:37:01.384311+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] open "....." flags READ mode 0666
<38>1 2021-09-16T09:37:01.385507+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] close "....." bytes read 0 written 0
<38>1 2021-09-16T09:37:01.386448+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] open "....." flags READ mode 0666
<38>1 2021-09-16T09:37:01.387447+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] close "....." bytes read 1220821 written 0
<38>1 2021-09-16T09:37:01.388243+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] open "....." flags READ mode 0666
<38>1 2021-09-16T09:37:11.384320+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] close "....." bytes read 1220274 written 0
<38>1 2021-09-16T09:37:11.385360+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] open "....." flags READ mode 0666
<38>1 2021-09-16T09:37:11.386432+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] close "....." bytes read 1218135 written 0
<38>1 2021-09-16T09:37:11.387112+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] open "....." flags READ mode 0666
<38>1 2021-09-16T09:37:11.387754+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] close "....." bytes read 1216635 written 0
<38>1 2021-09-16T09:37:13.519659+09:00 hostname sftp-server 11797 - [origin ip="#.#.#.#"] session opened for local user userB from [#.#.#.#]
<38>1 2021-09-16T09:37:13.667708+09:00 hostname sftp-server 11797 - [origin ip="#.#.#.#"] received client version 3
<38>1 2021-09-16T09:37:13.829500+09:00 hostname sftp-server 11797 - [origin ip="#.#.#.#"] realpath "."
<38>1 2021-09-16T09:37:13.991276+09:00 hostname sftp-server 11797 - [origin ip="#.#.#.#"] session closed for local user userB from [#.#.#.#]
<38>1 2021-09-16T09:37:21.383055+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] open "....." flags READ mode 0666
<38>1 2021-09-16T09:37:21.384737+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] close "....." bytes read 1219862 written 0
<38>1 2021-09-16T09:37:21.385609+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] open "....." flags READ mode 0666
<38>1 2021-09-16T09:37:21.385642+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] close "....." bytes read 1218359 written 0
<38>1 2021-09-16T09:37:41.384710+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] open "....." flags READ mode 0666
<38>1 2021-09-16T09:37:41.385632+09:00 hostname sftp-server 20715 - [origin ip="#.#.#.#"] close "....." bytes read 1220359 written 0
Assuming you have already extracted _time
appname=sftp-server
| rex field=_raw "sftp-server (?<sftp_server>[^ ]+) "
| rex field=_raw "session (opened|closed) for local user (?<sftp_user>[^ ]+) from"
| rex field=_raw "close \"(?<sftp_filename>.*)\" bytes read (?<sftp_bytes_read>\d+)"
| sort 0 sftp_server _time
| filldown sftp_user
| sort 0 _time
| timechart sum(sftp_bytes_read) by sftp_user
Not sure if the second sort is required as timechart may handle it without
Spot on!
Thanks, the "sort" is the part I was missing to put the transactions next to each other.
Just for transparency, those prefix lines are standard rfc5424 syslog fields, so a Splunk addon I have is already extracting them for me so I have _time, host, appname, procid already extracted. The final search to do the trick looks like this (minimizing the changes to demonstrate what was important here based on your input).
appname=sftp-server
| rex field=_raw "session (opened|closed) for local user (?<sftp_user>[^ ]+) from"
| rex field=_raw "close \".*\" bytes read (?<sftp_bytes_read>\d+)"
| sort 0 host appname procid _time
| filldown sftp_user
| timechart sum(sftp_bytes_read) by sftp_user