Splunk Search

Most recent event from each source?

yoeljacobsen
Explorer

I'm looking for an efficient way to retrieve the single most recent event from each of about 2000 sources.

It seems that something like:

source=prefix*  | stats first(_raw) as _raw by source

scans a lot of events.

Is there a better way?

Yoel

1 Solution

Ayn
Legend

Have a look at the metadata command.

| metadata type=sources

UPDATE: So, to get the actual event, you could still use metadata, but in a subsearch that feeds the outer search with specific info on where/when to look for events. I messed around a bit to get a search that's working. For clarity I can also show what did NOT work 😉

[| metadata type=sources | rename lastTime as _time | fields _time source]

The idea here was to get output from the subsearch like this:

( ( _time=1341991627 AND source="source1" ) OR ( _time=1342119251 AND source="source2" ) OR ([...]))

However the problem is that _time is considered to be an internal field, and as such won't get picked up by the format command that is implicitly run by the subsearch. So the output will only contain the source parts.
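As a rough illustration (same placeholder source names as above), the filter this first attempt produces would look something like the following, with the _time terms missing:

```spl
( ( source="source1" ) OR ( source="source2" ) OR ([...]))
```

A filter like that matches every event from each listed source within the search time range, so it does not narrow the scan down to the latest event.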

However using a clever (or well...you be the judge of that) hack we can get the output we want anyway, by creating a field called query which will contain the _time filtering string. query is a special field whose value is returned from the subsearch as-is rather than Splunk adding "query=" before it.

[| metadata type=sources | eval query="_time=".lastTime | fields source query]

This should create a search filter that searches on a number of source/_time pairs as shown above. Because multiple events from a source could potentially occur within the same second, you might still need to add a | dedup source at the end of the outer search to make sure you only get one event per source. I hope this gives you what you were looking for.
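Putting the pieces together, a complete search might look like the sketch below. It assumes the prefix* pattern from the question. The "| search source=prefix*" step inside the subsearch is an addition that is not part of the search above (it mirrors the kind of filter that can be applied to metadata output) and is there only to keep the generated filter limited to the sources of interest. The time range of the outer search is assumed to cover every lastTime value.

```spl
source=prefix*
    [| metadata type=sources
     | search source=prefix*
     | eval query="_time=".lastTime
     | fields source query]
| dedup source
```

Depending on where the data lives, the metadata command may also need an index=<name> argument, since it otherwise looks only at the default index. The trailing dedup keeps the first event returned per source, which in the default newest-first ordering should be the most recent one.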


Lamar
Splunk Employee

If you're wanting to get the actual indexed 'event', that will be how you do it. If you just want to know when the last event occurred for a source you could do this:

| metadata type=sources | search source="*prefix*" | convert ctime(lastTime) as timestamp | sort - lastTime

yoeljacobsen
Explorer

I would like to have an external system that caches a "snapshot" of a group of sources. The cache will serve hundreds of requests per second for some field values. For my purpose I only need the last event from each source, but I would like to update the cache every few minutes. I'm looking for a way to update this cache as efficiently as possible. As this is still in design, I'm open to suggestions (such as using multiple indexes etc).


Lamar
Splunk Employee

It would, because you're inspecting the raw events as opposed to the metadata of your events. The approach that both Ayn and I showed is just for practical timing purposes.

Help us understand what problem you're trying to solve and we may be able to find a better way.


yoeljacobsen
Explorer

Unfortunately it seems my initial method scans all the events over the time range.


yoeljacobsen
Explorer

Brilliant!

I wonder how it will scale for >1000 sources. The subsearch will create a very large filter (although still far from the maximum maxout value of 10500).


Lamar
Splunk Employee

Very slick Ayn.


Ayn
Legend

Updated my answer in an attempt to solve what I think you want to accomplish.


yoeljacobsen
Explorer

Thanks, but I would like the raw events, not metadata on the sources.
