How to eliminate duplicate rows before transaction command. Because of which I am getting wrong calculation.
eg scenario: calculating downtime based on events
Query is
index="winevent" host IN (abc) EventCode=6006 OR EventCode="6005" Type=Information
| eval BootUptime = if(EventCode=6005,strftime(_time, "%Y-%d-%m %H:%M:%S"),null())
| eval stoptime = if(EventCode=6006,strftime(_time, "%Y-%d-%m %H:%M:%S"),null())
| transaction host startswith=6006 endswith=6005 maxevents=2
| eval duration=tostring(duration,"duration")
| eval time_taken = replace(duration,"(\d+)\:(\d+)\:(\d+)","\1h \2min \3sec")
| rename time_taken AS Downtime
| dedup Downtime, BootUptime
| table host,stoptime, BootUptime, Downtime
Result is ::
host stoptime bootuptime Downtime
abc 2022-30-01 10:39:25 2022-30-01 10:40:29 00h 01min 04sec
abc 2022-09-01 09:27:53 2022-09-01 09:28:34 00h 00min 41sec
abc 2021-28-11 10:52:52 2022-09-01 09:28:34 41d 22h 35min 42sec
in the result since i have duplicate in bootuptime the dowtime calculation is incorrect. How to get rid of this?
Thanks in Advance
Well, what is your data? You seem not to have "duplicate rows" but quite the contrary - you have missing data. If a host stopped at 2022-09-01 09:27:53 , it must have booted up somewhere sooner than 2022-09-01 09:28:34, doesn't it?
Anyway, it seems that sorting the data and doing a streamstats can be more effective than using transaction.
@PickleRick since there is duplication.. its giving us downtime in dayss.. need to dedup. but i am not finding a way
What do you mean by "duplication?
Your table:
host stoptime bootuptime Downtime
abc 2022-30-01 10:39:25 2022-30-01 10:40:29 00h 01min 04sec
abc 2022-09-01 09:27:53 2022-09-01 09:28:34 00h 00min 41sec
abc 2021-28-11 10:52:52 2022-09-01 09:28:34 41d 22h 35min 42sec
If the host stopped at 2022-09-01 09:28:34 after being booted up at 2021-28-11 10:52:52 2022-09-01, how could have it booted up at 2022-09-01 09:27:53 2022-09-01?
What are your events showing? Because either you indeed have "duplicate" data but that's a problem with quality of your data if it's supposed to show an incident of transition between states or you have loss of data.
Hi @priya1926,
let me understand, if in one day you have more boot events you want to calculate only the first, is it correct?
could you have more boots in the same day?
if not, you could use stats command and have a faster search:
index="winevent" host IN (abc) EventCode IN (6006,6005) Type=Information
| eval
BootUptime = if(EventCode="6005",_time,null()),
stoptime = if(EventCode=6006,_time,null())
| bin _time span=1d
| stats earliest(BootUptime) AS BootUptime latest(stoptime) AS stoptime BY _time host
| eval Downtime=tostring(stoptime-BootUptime,"duration")
| table host stoptime BootUptime Downtime
Ciao.
Giuseppe
Nope. I need to calculate all the events... eg: six months or 3 months.. or a server that rebooted thrice a day.. even that..
Hi @priya1926,
in this case you have to use the transaction command using the option keeporphan=true and filtering for _txn_orphan|=1, something like this:
index="winevent" host IN (abc) EventCode=6006 OR EventCode="6005" Type=Information
| eval BootUptime = if(EventCode=6005,strftime(_time, "%Y-%d-%m %H:%M:%S"),null())
| eval stoptime = if(EventCode=6006,strftime(_time, "%Y-%d-%m %H:%M:%S"),null())
| transaction host startswith=6006 endswith=6005 maxevents=2 keeporphan=True
| where _txn_orphan!=1
| eval duration=tostring(duration,"duration")
| eval time_taken = replace(duration,"(\d+)\:(\d+)\:(\d+)","\1h \2min \3sec")
| rename time_taken AS Downtime
| dedup Downtime, BootUptime
| table host,stoptime, BootUptime, Downtime
in this way you have in way, you have transaction with only two events (start and end) and you discard all the transactions with only one event.
Ciao.
Giuseppe
This doesnot seem to give me any results. and keeporphans=True .. No result though i have events..
Hi @priya1926,
try to delete
| where _txn_orphan!=1
and see what's happens: it should be both transaction with start and end and transaction with only start.
Ciao.
Giuseppe
abc | 2021-22-11 14:14:22 | 2021-22-11 14:14:37 | 00h 00min 15sec |
abc | 2021-22-11 14:14:22 | 2022-07-01 13:45:30 | 45+23h 31min 08sec |
its the same after removing | where _txn_orphan!=1
Hi @priya1926,
could you try
index="winevent" host IN (abc) EventCode=6006 OR EventCode="6005" Type=Information
| eval BootUptime = if(EventCode=6005,strftime(_time, "%Y-%d-%m %H:%M:%S"),null())
| eval stoptime = if(EventCode=6006,strftime(_time, "%Y-%d-%m %H:%M:%S"),null())
| transaction host startswith=6006 endswith=6005 maxevents=2 keeporphan=True
| rename duration AS Downtime
| dedup Downtime, BootUptime
| table host stoptime BootUptime Downtime _txn_orphan
and share results?
Ciao.
Giuseppe
2021-22-11 20:05:44 | 2021-22-11 20:06:00 | 16 | |
2021-22-11 14:14:22 | 2021-22-11 14:14:37 | 15 | |
2021-22-11 14:14:22 | 2022-07-01 13:45:30 | 3972668 | |
2021-22-11 13:48:33 | 2021-22-11 13:48:49 | 16 | |
2021-22-11 11:12:04 | 2021-22-11 11:12:20 | 16 | |
2021-22-11 10:00:03 | 2021-22-11 10:00:24 | 21 | |
2021-19-11 21:48:50 | 2021-19-11 21:49:05 | 15 | |
2021-19-11 21:33:05 | 2021-19-11 21:33:18 | 13 | |
2021-19-11 20:57:51 | 2021-19-11 20:58:06 | 15 |
since we have duplication here in bold. that's the problem
2021-22-11 14:14:22 | 2021-22-11 14:14:37 | 15 |
2021-22-11 14:14:22 | 2022-07-01 13:45:30 | 3972668 |
@gcusello any findings?
Hi @priya1926,
there's one thing that I don't understand:
when there's no end, why do you have a date (probably the present day)?
In the search the present day isn't generated in any way!
Please try this and share results:
index="winevent" host IN (abc) EventCode=6006 OR EventCode="6005" Type=Information
| eval BootUptime = if(EventCode=6005,strftime(_time, "%Y-%d-%m %H:%M:%S"),"X")
| eval stoptime = if(EventCode=6006,strftime(_time, "%Y-%d-%m %H:%M:%S"),"X")
| transaction host startswith=6006 endswith=6005 maxevents=2 keeporphan=True
| rename duration AS Downtime
| dedup Downtime, BootUptime
| table host stoptime BootUptime Downtime _txn_orphan
Ciao.
Giuseppe
Yes. I understand that your table has "duplicated" values. But the question is what does it mean in terms of the "underlying reality". Splunk events and things you do with them represent some external world. Manipulating the data is one thing but the other is whether they correspond to anything. You can do anything with the data but will you get meaningful results?