Splunk Search

How to eliminate duplicate rows before transaction command?

priya1926
Path Finder

How to eliminate duplicate rows before transaction command. Because of which I am getting wrong calculation.

eg scenario: calculating downtime based on events

Query is 

 

index="winevent" host IN (abc) EventCode=6006 OR EventCode="6005" Type=Information
| eval BootUptime = if(EventCode=6005,strftime(_time, "%Y-%d-%m %H:%M:%S"),null())
| eval stoptime = if(EventCode=6006,strftime(_time, "%Y-%d-%m %H:%M:%S"),null())
| transaction host startswith=6006 endswith=6005 maxevents=2
| eval duration=tostring(duration,"duration")
| eval time_taken = replace(duration,"(\d+)\:(\d+)\:(\d+)","\1h \2min \3sec")
| rename time_taken AS Downtime
| dedup Downtime, BootUptime
| table host,stoptime, BootUptime, Downtime

 



Result is ::

 

host                stoptime                                              bootuptime                                                                Downtime
abc           2022-30-01 10:39:25                        2022-30-01 10:40:29                                         00h 01min 04sec
abc           2022-09-01 09:27:53                        2022-09-01 09:28:34                                         00h 00min 41sec
abc           2021-28-11 10:52:52                        2022-09-01 09:28:34                                     41d 22h 35min 42sec

 




in the result since i have duplicate in bootuptime the dowtime calculation is incorrect. How to get rid of this?

Thanks in Advance

Labels (2)
0 Karma

PickleRick
SplunkTrust
SplunkTrust

Well, what is your data? You seem not to have "duplicate rows" but quite the contrary - you have missing data. If a host stopped at 2022-09-01 09:27:53 , it must have booted up somewhere sooner than 2022-09-01 09:28:34, doesn't it?

Anyway, it seems that sorting the data and doing a streamstats can be more effective than using transaction.

0 Karma

priya1926
Path Finder

@PickleRick since there is duplication.. its giving us downtime in dayss.. need to dedup. but i am not finding a way

0 Karma

PickleRick
SplunkTrust
SplunkTrust

What do you mean by "duplication?

Your table:

host stoptime bootuptime Downtime
abc 2022-30-01 10:39:25 2022-30-01 10:40:29 00h 01min 04sec
abc 2022-09-01 09:27:53 2022-09-01 09:28:34 00h 00min 41sec
abc 2021-28-11 10:52:52 2022-09-01 09:28:34 41d 22h 35min 42sec

If the host stopped at 2022-09-01 09:28:34 after being booted up at 2021-28-11 10:52:52 2022-09-01, how could have it booted up at 2022-09-01 09:27:53 2022-09-01?

What are your events showing? Because either you indeed have "duplicate" data but that's a problem with quality of your data if it's supposed to show an incident of transition between states or you have loss of data.

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @priya1926,

let me understand, if in one day you have more boot events you want to calculate only the first, is it correct?

could you have more boots in the same day?

if not, you could use stats command and have a faster search:

 

index="winevent" host IN (abc) EventCode IN (6006,6005) Type=Information
| eval 
   BootUptime = if(EventCode="6005",_time,null()),
   stoptime = if(EventCode=6006,_time,null())
| bin _time span=1d
| stats earliest(BootUptime) AS BootUptime latest(stoptime) AS stoptime BY _time host
| eval Downtime=tostring(stoptime-BootUptime,"duration")
| table host stoptime BootUptime Downtime

Ciao.

Giuseppe

 

0 Karma

priya1926
Path Finder

Nope. I need to calculate all the events... eg: six months or 3 months.. or a server that rebooted thrice a day.. even that..

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @priya1926,

in this case you have to use the transaction command using the option keeporphan=true and filtering for _txn_orphan|=1,  something like this:

index="winevent" host IN (abc) EventCode=6006 OR EventCode="6005" Type=Information
| eval BootUptime = if(EventCode=6005,strftime(_time, "%Y-%d-%m %H:%M:%S"),null())
| eval stoptime = if(EventCode=6006,strftime(_time, "%Y-%d-%m %H:%M:%S"),null())
| transaction host startswith=6006 endswith=6005 maxevents=2 keeporphan=True
| where _txn_orphan!=1
| eval duration=tostring(duration,"duration")
| eval time_taken = replace(duration,"(\d+)\:(\d+)\:(\d+)","\1h \2min \3sec")
| rename time_taken AS Downtime
| dedup Downtime, BootUptime
| table host,stoptime, BootUptime, Downtime

in this way you have in way, you have transaction with only two events (start and end) and you discard all the transactions with only one event.

Ciao.

Giuseppe

0 Karma

priya1926
Path Finder

This doesnot seem to give me any results. and keeporphans=True .. No result though i have events..

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @priya1926,

try to delete

| where _txn_orphan!=1

and see what's happens: it should be both transaction with start and end and transaction with only start.

Ciao.

Giuseppe

0 Karma

priya1926
Path Finder
abc2021-22-11 14:14:222021-22-11 14:14:3700h 00min 15sec
abc2021-22-11 14:14:222022-07-01 13:45:3045+23h 31min 08sec

 

 

its the same after removing | where _txn_orphan!=1

 

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @priya1926,

could you try

 

 

index="winevent" host IN (abc) EventCode=6006 OR EventCode="6005" Type=Information
| eval BootUptime = if(EventCode=6005,strftime(_time, "%Y-%d-%m %H:%M:%S"),null())
| eval stoptime = if(EventCode=6006,strftime(_time, "%Y-%d-%m %H:%M:%S"),null())
| transaction host startswith=6006 endswith=6005 maxevents=2 keeporphan=True
| rename duration AS Downtime
| dedup Downtime, BootUptime
| table host stoptime  BootUptime  Downtime _txn_orphan

 

 

and share results?

Ciao.

Giuseppe

0 Karma

priya1926
Path Finder

 

 2021-22-11 20:05:442021-22-11 20:06:0016
 2021-22-11 14:14:222021-22-11 14:14:3715
 2021-22-11 14:14:222022-07-01 13:45:303972668
 2021-22-11 13:48:332021-22-11 13:48:4916
 2021-22-11 11:12:042021-22-11 11:12:2016
 2021-22-11 10:00:032021-22-11 10:00:2421
 2021-19-11 21:48:502021-19-11 21:49:0515
 2021-19-11 21:33:052021-19-11 21:33:1813
 2021-19-11 20:57:512021-19-11 20:58:0615
0 Karma

priya1926
Path Finder

since we have duplication here in bold. that's the problem

2021-22-11 14:14:222021-22-11 14:14:37

15

2021-22-11 14:14:222022-07-01 13:45:303972668 

 

0 Karma

priya1926
Path Finder

@gcusello any findings?

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @priya1926,

there's one thing that I don't understand:

when there's no end, why do you have a date (probably the present day)?

In the search the present day isn't generated in any way!

Please try this and share results:

index="winevent" host IN (abc) EventCode=6006 OR EventCode="6005" Type=Information
| eval BootUptime = if(EventCode=6005,strftime(_time, "%Y-%d-%m %H:%M:%S"),"X")
| eval stoptime = if(EventCode=6006,strftime(_time, "%Y-%d-%m %H:%M:%S"),"X")
| transaction host startswith=6006 endswith=6005 maxevents=2 keeporphan=True
| rename duration AS Downtime
| dedup Downtime, BootUptime
| table host stoptime  BootUptime  Downtime _txn_orphan

Ciao.

Giuseppe

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Yes. I understand that your table has "duplicated" values. But the question is what does it mean in terms of the "underlying reality". Splunk events and things you do with them represent some external world. Manipulating the data is one thing but the other is whether they correspond to anything. You can do anything with the data but will you get meaningful results?

0 Karma
Get Updates on the Splunk Community!

Now Available: Cisco Talos Threat Intelligence Integrations for Splunk Security Cloud ...

At .conf24, we shared that we were in the process of integrating Cisco Talos threat intelligence into Splunk ...

Preparing your Splunk Environment for OpenSSL3

The Splunk platform will transition to OpenSSL version 3 in a future release. Actions are required to prepare ...

Easily Improve Agent Saturation with the Splunk Add-on for OpenTelemetry Collector

Agent Saturation What and Whys In application performance monitoring, saturation is defined as the total load ...