Hello, thank you as always.
I am currently aggregating a set of logs.
Using the source IP address and the connection timestamp as keys, I want to extract only the logs for addresses that are still connecting 10 or more days after their first connection date, but I have not been able to get this extraction working.
* The total number of days connected is the count of days on which a connection occurred between the first connection date and the last connection date.
I would be grateful if you could share a search that extracts this kind of data.
■Sample data
Connection IP Address | Connection Date |
1.0.0.0 | 2023-01-01 10:35:45 |
1.0.0.0 | 2023-01-03 12:33:10 |
1.0.0.0 | 2023-01-08 09:35:06 |
1.0.0.0 | 2023-01-11 21:18:29 |
2.0.0.0 | 2023-01-01 23:32:11 |
2.0.0.0 | 2023-01-05 04:55:15 |
2.0.0.0 | 2023-01-10 19:35:24 |
■Expected output
Connection IP Address | First Connection Date | Last Connection Date | Days Connected |
1.0.0.0 | 2023-01-01 10:35:45 | 2023-01-11 21:18:29 | 4 |
To do this properly, you would need to handle it at index time so that proper event breaking occurs. Trying to do event breaking at search time is much harder. If this data were properly event-broken, each event would have the correct time assigned to it; that is the best practice. However, I know that is not always within your control.
If the above is not possible, then I would start with a rex command with the max_match=0 parameter to capture each pattern repeatedly. It might look like:
| rex max_match=0 field=_raw "(?<ipAddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s*\|\s*(?<connectionTime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
(Replace field=_raw with whatever the field is actually named.)
This would create two new multivalue fields per event: one with ALL the IP address values and one with ALL the time values. To separate the multivalue fields, you can use mvexpand, which creates a duplicate event for each value.
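One caveat: running mvexpand on the two multivalue fields independently multiplies events instead of pairing each IP with its own timestamp. A common pattern is to zip the two fields together first and then expand the pairs — a sketch, assuming the ipAddress and connectionTime fields captured above:

```
| eval pair = mvzip(ipAddress, connectionTime, "|")
| mvexpand pair
| eval ipAddress = mvindex(split(pair, "|"), 0),
       connectionTime = mvindex(split(pair, "|"), 1)
```

After this, each resulting event carries exactly one ipAddress/connectionTime pair.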
Once you have captured a time field, you might then need to use strptime to convert it into epoch time so that Splunk can put it in time order for you:
| eval _time = strptime(connectionTime, "%Y-%m-%d %H:%M:%S")
As you can see, the method to fix this at search time is very complicated; the search is poorly optimized, and there are many points of failure.
The best way to proceed, especially if you want to use this data long term, is to set up proper event breaking. If your Splunk team needs any help with the event breaking, I'd be happy to walk them through it.
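For reference, proper event breaking is configured in props.conf on the indexing tier. A minimal sketch, assuming a hypothetical sourcetype name and the timestamp layout shown in the sample data (TIME_PREFIX in particular depends on how your raw lines actually look):

```
[my_connection_log]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = \|\s*
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
```

With this in place, each line becomes one event with the correct _time, and none of the search-time rex/mvexpand work above is needed.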
Thank you for your advice.
I was able to get the expected results.
Perhaps this will help. The Connection Date must be converted into epoch (integer) form so we can compute the difference between the first and last connection.
<<your search for Connection IP Address and Connection Date>>
| eval connectionEpoch = strptime('Connection Date', "<<format of Connection Date>>")
| stats earliest("Connection Date") as "First Connection Date",
latest("Connection Date") as "Last Connection Date",
range("connectionEpoch") as connectionTime by "Connection IP Address"
| eval "Number of days connected" = connectionTime/86400
| table "Connection IP Address", "First Connection Date", "Last Connection Date", "Number of days connected"
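The original question also asks to keep only the addresses that are still connecting 10 or more days after their first connection. Since connectionTime already holds the first-to-last range in seconds, a where clause can be appended to the search above — a sketch:

```
| where connectionTime >= 10 * 86400
```

Note that connectionTime/86400 measures elapsed days between the first and last connection. If "Number of days connected" should instead count distinct days with at least one connection (4 for 1.0.0.0 in the sample), one option is a distinct count over a date-only field inside the same stats, e.g. dc(eval(strftime(connectionEpoch, "%Y-%m-%d"))).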