Splunk Search

Tricky query? I need to count the number of matching instances for each unique entity ID, but chart it be the timestamp of the first instance


I have a dataset from logs full of events: I am matching on a certain event, we'll call "create-copy". Each event has a unique ID, and all events have an entity.id string - some events have identical entity.ids, meaning that the events originate from the same entity. We have a business concern about multiple "create-copy" events for the same entity-id.

Basic search for matching events: index="main" kind="create-copy" |

I need to show a graph (-48h, sliced by hour) of repeating (meaning same entity.id) instances of these "create-copy" events, focused on the timestamps of the first instances of repeated events. The Y-Axis needs to be the average count of repetitions per unique entity.id. Just a raw count of matching events is no good as these events have their own natural ebb and flow.

While this creates a table of matching first-instances of matching events, it doesn't get us anywhere close to the graph we need.

| stats count(kind) as "count of create-copy events",earliest(when.$date) as mywhen by entity.id 


Follow-up: still not resolved. But I have been thinking about a better way to phrase the question I am trying to ask. This has been vexing me, but I think this is a lot closer:

For a specific slice of time T, say an hour, identify all events registered during that time which match the query, but also whose entity.ids have never matched this query before (in any previous slice of time). Now, for this "first-timers" set of entity.ids, E-sub-T, count up all their matching events (same criteria) across all time (really this means from T till 'now'), and calculate the average number of matching instances per entity.id in the set E-sub-T. This may result in a simple result for 1hr time-slice T, like '3.6'. This is something like their repeat-count. Finally, via a timechart I suppose, we want to run this analysis across say 48 consecutive hours (likely -48h to 'now'), and chart the graph as 48 bars: standard bar-chart.



As long as timechart gets values for _time, and they seem to be epochtime values, you can get those _time values from anywhere you like. For instance we can make those _time values using the stats command and then pipe the output of stats into timechart.

As a side note, that field name of "when.$date" may well bite you at some point. I'd be surprised if "$" characters in field names weren't going to cause at least one problem somewhere, particularly since the "$" character specifically is a control character used for stringreplace intentions, and also for macro arguments.

index="main" kind="create-copy" | stats count earliest(when.$date) as _time by entity.id | timechart sum(count) as count by entity.id


I think the insight in this approach - I feel it is progress, but so far I have nothing to show for it. Results coming out of stats look good when I drop the |timechart, but I cannot make any variation of |timechart produce anything at all. It doesn't error, but the results are always empty - like the counting/summing of events fails silently.

0 Karma
Register for .conf21 Now! Go Vegas or Go Virtual!

How will you .conf21? You decide! Go in-person in Las Vegas, 10/18-10/21, or go online with .conf21 Virtual, 10/19-10/20.