Splunk Search

How can gaps be detected in data?

sarit_s
Communicator

Hello, is there a way to detect gaps in data by some id?
As well as check if the gap is greater than 4 hours, then show the last event and the one before him?

1 Solution

DavidHourani
Super Champion

Hi @sarit_s,

Yes you can !

Here's an easy command that you can apply on your data with 900 sec gap:

 ... | streamstats current=f last(_time) as last_time by host | eval gap = last_time - _time | where gap > 900 | convert ctime(last_time) as last_time | table _time last_time host

from here :
https://answers.splunk.com/answers/69728/search-to-find-gaps-in-data.html

Cheers,
David

View solution in original post

DavidHourani
Super Champion

dedup at first will only keep the last 2 events of every "source" field and timeDeltaS returns the time gap between the last one and current one. The numbers you see 0,1,2 are the gaps in second and the | where timeDeltaS > 14400 should be applied in the end to keep only values larger than 14400 seconds since now you're keeping all deltas.

0 Karma

sarit_s
Communicator

im not sure i understood..

if i see some SerialNumber and timeDeltaS 1 then i have gap of one second between two events ?
how can i know which event it is ?
can i add the source ?
i added to the stats values(source) as source but it is giving me too many files there

0 Karma

DavidHourani
Super Champion

Its always the gap between the last two events, so its the time since the last time this serial number appeared. You can also keep the source field if needed since its the same as ur serial number field anw :slightly_smiling_face:

0 Karma

sarit_s
Communicator

there is no way to find all the gaps withing this Serial number ?
it is possible that i will have more than one gap for each one of them

0 Karma

DavidHourani
Super Champion

ouch... yeah in that case you'll have to get rid of the dedup and you're back where we started:

 index=ssys_*_pj 
 | streamstats current=f last(_time) as last_time by source
 | eval timeDeltaS = last_time - _time
 | eval SerialNumber = mvindex(split(source,"/") ,7) 
 | stats last(timeDeltaS) as timeDeltaS by  SerialNumber

From there the best way to optimize is to reduce the number of results in index=ssys_*_pj by either limiting the time frame or specifying exactly which events you're looking for

sarit_s
Communicator

thanks !

0 Karma

DavidHourani
Super Champion

you're most welcome :slightly_smiling_face: please upvote and accept if it was helpful ! :slightly_smiling_face:

0 Karma

sarit_s
Communicator

Hi @DavidHourani
can we back for a sec to my question ?
is it possible to show the source of _time and the source of _last_time ?
i tried to add source to streamstats but i get only the last source

0 Karma

DavidHourani
Super Champion

yeah, you could add values(_time) to the last stats command, this should add the time field for you, not sure what values you'll get though, could you have a look and let me know ?

0 Karma

sarit_s
Communicator

im talking about that i want to see the source files..

also, i found out that this search :

index=ssys_*_pj sourcetype=*log*
| `SerialNumber`
| streamstats current=f last(_time) as last_time values(source) as source by SerialNumber | eval gap = last_time - _time | where gap > 14400 | convert timeformat="%Y-%m-%d %H:%M:%S" ctime(last_time) as last_time | table _time last_time SerialNumber source

working better for me.. its true that it is much slower but gives me the results in more user friendly way..
maybe it is possible to make this one run faster ?
any way, the important thing is to have the source :slightly_smiling_face:

0 Karma

DavidHourani
Super Champion

This could be a bit faster, the best would be to reduce the size of the data set from the start and then apply everything from there :

  index=ssys_*_pj sourcetype=*log*
  | streamstats current=f last(_time) as last_time by source
  | eval gap = last_time - _time
  | where gap > 14400 
  | eval SerialNumber = mvindex(split(source,"/") ,7) 
  | stats last(gap) as gap by  SerialNumber, source,last_time,_time
  | convert timeformat="%Y-%m-%d %H:%M:%S" ctime(last_time) as last_time 
0 Karma

sarit_s
Communicator

this one is faster but returns no result

0 Karma

DavidHourani
Super Champion

fixed, try now :slightly_smiling_face:

0 Karma

sarit_s
Communicator

i have result now
it is slow again :slightly_smiling_face:
but i still see only one source..
is it possible to see both source files ?
i want to see the source file of where the gap starts and where it was ended

0 Karma

sarit_s
Communicator

i also see that this query :

 index=ssys_*_pj sourcetype=*log*
 | `SerialNumber`
 | streamstats current=f last(_time) as last_time values(source) as source by SerialNumber | eval gap = last_time - _time | where gap > 14400 | convert timeformat="%Y-%m-%d %H:%M:%S" ctime(last_time) as last_time | table _time last_time SerialNumber source

returns more results that the new one

0 Karma

sarit_s
Communicator

so.. any idea how to have the source path for each time ?

0 Karma

sarit_s
Communicator

hey @DavidHourani
i found out that tstats can help me make the query faster but i think that i'm missing something since im not getting results.
this is the query:
| tstats values WHERE (index=ssys__pj AND sourcetype=*log) by source

  | streamstats current=f last(_time) as last_time values(source) as source last(source) as last_source by SerialNumber | eval gap = last_time - _time | where gap > 14400 | convert timeformat="%Y-%m-%d %H:%M:%S" ctime(last_time) as last_time | table SerialNumber _time last_time  last_source source

also, i added "last_source" but the results returns are the same as "source"

any suggestion ?

0 Karma

DavidHourani
Super Champion

Ah yes, tstats is great, and to make it even faster use this option : summariesonly=true this way it runs on accelerated data only.

Now back to the query, seems like you were missing the |SerialNumberpart after thetstats` so it should be like this :

| tstats values WHERE (index=ssys_*_pj AND sourcetype=log) by source
| `SerialNumber`
| streamstats current=f last(_time) as last_time values(source) as source last(source) as last_source by SerialNumber 
| eval gap = last_time - _time 
| where gap > 14400 
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(last_time) as last_time 
| table SerialNumber _time last_time  last_source source

If it still doesn't work, I advise you to go through the search part by part to check if its working well, start with the first part, then add the next, etc, till you reach to part that doesn't give results.

0 Karma

sarit_s
Communicator

Hi @DavidHourani
thanks !
summary is good but since i didn't collect summary yet, can i still use this option ?

I didn't add

|SerialNumber
since i added this field as calculated field so i won't need macro anymore :slightly_smiling_face:
i have a problem that when im running this query with
tstats values
im getting this error :
Job terminated unexpectedly
when im running it with count its stopped returning results when running this part :
where gap > 14400

0 Karma

sarit_s
Communicator

i will give an example

i have logs from 07/05/19 15:23:57 for some SerialNumber
and the next time i have log is 08/05/19 09:19:02
i want to catch such stuff
is it possible ?

0 Karma
Get Updates on the Splunk Community!

See just what you’ve been missing | Observability tracks at Splunk University

Looking to sharpen your observability skills so you can better understand how to collect and analyze data from ...

Weezer at .conf25? Say it ain’t so!

Hello Splunkers, The countdown to .conf25 is on-and we've just turned up the volume! We're thrilled to ...

How SC4S Makes Suricata Logs Ingestion Simple

Network security monitoring has become increasingly critical for organizations of all sizes. Splunk has ...