I've piped a Splunk log query extract into a table showing disconnected and connected log entries sorted by time.
NB: row 1 is fine. Row 2 is also fine because a connected entry follows within 120 sec.
Now I want to show "disconnected" entries with no subsequent "connected" row within, say, a 120 sec time frame. So I want to pick up rows 4 and 5.
Can someone advise on the Splunk query format for this?
Table = Connect_Log
Row | Time | Log text |
1 | 7:00:00am | connected |
2 | 7:30:50am | disconnected |
3 | 7:31:30am | connected |
4 | 8:00:10am | disconnected |
5 | 8:10:30am | disconnected |
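To pin down the rule in plain terms, here is a minimal Python sketch (not Splunk, just an illustration of the selection logic, using the times and 120-second window from the table above):

```python
from datetime import datetime, timedelta

# Mock data from the table above
rows = [
    (1, "7:00:00am", "connected"),
    (2, "7:30:50am", "disconnected"),
    (3, "7:31:30am", "connected"),
    (4, "8:00:10am", "disconnected"),
    (5, "8:10:30am", "disconnected"),
]
events = [(r, datetime.strptime(t, "%I:%M:%S%p"), text) for r, t, text in rows]

window = timedelta(seconds=120)
# A "disconnected" row is flagged when no "connected" row follows within 120s.
flagged = [
    r for r, t, text in events
    if text == "disconnected"
    and not any(t2_text == "connected" and t < t2 <= t + window
                for _, t2, t2_text in events)
]
print(flagged)  # rows 4 and 5
```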
You can also do it with streamstats, using the last two lines of this example. Note the field name Log_text, with the underscore in the middle: the reset_after clause doesn't like spaces in the field name.
| makeresults format=csv data="Row,Time,Log_text
1,7:00:00am,connected
2,7:30:50am,disconnected
3,7:31:30am,connected
4,8:00:10am,disconnected
5,8:10:30am,disconnected"
| eval _time=strptime(Time, "%I:%M:%S%p")
| sort - _time
| streamstats time_window=120s reset_after="("Log_text=\"disconnected\"")" count
| where count=1 AND Log_text="disconnected"
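To see why count=1 picks out the right rows, here is a hypothetical Python emulation of what that streamstats is doing: walk the events in reverse chronological order, count events seen within a 120-second window, and reset the counter after every disconnected event (this is an illustration, not Splunk's actual implementation):

```python
from datetime import datetime, timedelta

rows = [
    (1, "7:00:00am", "connected"),
    (2, "7:30:50am", "disconnected"),
    (3, "7:31:30am", "connected"),
    (4, "8:00:10am", "disconnected"),
    (5, "8:10:30am", "disconnected"),
]
events = sorted(
    ((datetime.strptime(t, "%I:%M:%S%p"), r, text) for r, t, text in rows),
    reverse=True,  # like "| sort - _time": newest first
)

window = timedelta(seconds=120)
picked = []
seen = []  # timestamps counted since the last reset
for t, r, text in events:
    seen = [s for s in seen if s - t <= window]  # time_window=120s
    seen.append(t)
    if len(seen) == 1 and text == "disconnected":  # count=1 AND disconnected
        picked.append(r)
    if text == "disconnected":  # emulates reset_after on "disconnected"
        seen = []
print(picked)  # rows 5 and 4 (reverse chronological order)
```

A disconnected row gets count=1 only when no event at all (in particular, no connected event) landed inside the 120-second window after it, which is exactly the condition being tested.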
The overall idea is OK, but if you want to check whether something happens _after_ an interesting event, you must reverse the original data stream, because you cannot streamstats backwards. The example data was in chronological order, though, while the default result sorting is the opposite, so it's all a bit confusing.
Yes, this one works because no connected occurs AFTER the disconnected, resulting in count=1 for a disconnect. Normally you'd get the results in reverse, and in this case that happens to be the order needed. It rather trivialises the example, but without knowing the data it's hard to say whether it would work in all cases.
Yes. I'm just pointing it out because it's a common use case - finding something that is (not) followed by another something - and it's a bit unintuitive that Splunk by default returns results in reverse chronological order. So you sometimes need to manipulate the order of results so that "previous", in terms of carrying values further down the stream, means the direction you want. Or you have to remember that you're returning the end of some interesting period, not its beginning.
As I said, depending on the use case it can sometimes be confusing, and it's worth remembering to always double-check your result order when you're doing things like this.
This is one of the few occasions where the transaction command is appropriate. Something like
| rename "Log text" as LogText
| transaction maxspan=120s startswith="LogText = disconnected" endswith="LogText = connected" keeporphans=true
| where isnull(closed_txn)
Your mock data would give
LogText | Row | _time | closed_txn | duration | eventcount | field_match_sum | linecount |
disconnected | 5 | 2024-12-17 08:10:30 | | | | | |
disconnected | 4 | 2024-12-17 08:00:10 | | | | | |
Here is an emulation of your mock data.
| makeresults format=csv data="Row, _time, Log text
1, 7:00:00am, connected
2, 7:30:50am, disconnected
3, 7:31:30am, connected
4, 8:00:10am, disconnected
5, 8:10:30am, disconnected"
| eval _time = strptime(_time, "%I:%M:%S%p")
| sort - _time
``` data emulation above ```
Play with the emulation and compare with real data.
Thanks for the response. I will try it out.
My bad - the LogText column has the keyword (connected or disconnected) among other text, so it will need some kind of wildcard lookup for either of these two words.
So, I'm looking to extract rows 4 and 5, which contain the "disconnected" text and have no associated connected row within, say, 120 secs.
Row | Time | LogText |
1 | 7:00:00am | text connected text |
2 | 7:30:50am | text disconnected text |
3 | 7:31:30am | text connected text |
4 | 8:00:10am | text disconnected text |
5 | 8:10:30am | text disconnected text |
Sure. startswith and endswith can also be sophisticated.
| rename "Log text" as LogText
| transaction maxspan=120s startswith=eval(match(LogText, "\bdisconnected\b")) endswith=eval(match(LogText, "\bconnected\b")) keeporphans=true
| where isnull(closed_txn)
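The \b word boundaries matter here: without them, "connected" would also match inside "disconnected", and every event would end a transaction. A quick check of the same regex behaviour (using Python's re rather than Splunk's regex engine, but the boundary rule is the same):

```python
import re

# \bconnected\b does not match inside "disconnected":
# the "s" before "connected" is a word character, so there is no boundary.
assert re.search(r"\bdisconnected\b", "text disconnected text")
assert not re.search(r"\bconnected\b", "text disconnected text")
assert re.search(r"\bconnected\b", "text connected text")
print("boundary checks pass")
```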
Here is an emulation
| makeresults format=csv data="Row, _time, Log text
1, 7:00:00am, text connected\ntext
2, 7:30:50am, text\ndisconnected\n\ntext
3, 7:31:30am, text connected\ntext
4, 8:00:10am, text\ndisconnected\n\ntext
5, 8:10:30am, text\ndisconnected\n\ntext"
| eval _time = strptime(_time, "%I:%M:%S%p"), "Log text" = replace('Log text', "\\\n", "
")
| sort - _time
``` data emulation above ```
The above search gives
LogText | Row | _raw | _time | closed_txn | duration | eventcount | field_match_sum | linecount |
text disconnected text | 5 | 1 | 2024-12-18 08:10:30 | | | | | |
text disconnected text | 4 | 1 | 2024-12-18 08:00:10 | | | | | |