Splunk Search

Efficient way to look for values across hundreds of millions (or billions) of events?

howyagoin
Contributor

Hi,

I've got a sourcetype where one field takes around 100,000 distinct values across 225,000,000 events per day, and another sourcetype with a total of around 5,000 values/events that is essentially static (very little change over the course of a year).

What is the most efficient way to find out whether any value from the second sourcetype occurs in the first, possibly going back 30+ days? I was leaning towards a summary-index based query run every few hours to extract the unique values of the large sourcetype, then checking the smaller against that, but even that would take a while.

Looking at the various options, such as "return" and "join" (or others), I'm not sure which is the most efficient.

I don't want all of the events from the larger sourcetype that contain the smaller one's values; I just want a list of the smaller sourcetype's values that also occur in the much larger sourcetype.

Thanks!

1 Solution

gkanapathy
Splunk Employee

Subsearch should be most efficient here:

sourcetype=big [ search sourcetype=small | return 6000 Value ]
| dedup Value

assuming that Value is the field name containing the value and is the same in both sourcetypes. If not, there are little tweaks to the return command to handle it.
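
Since the goal is only the list of matching values rather than the events themselves, a variant along these lines may fit better. This is a sketch, not a tested search: it assumes the field is called Value in both sourcetypes, and it swaps the final dedup for stats so that each matching value comes back as one row with an event count.

```spl
sourcetype=big [ search sourcetype=small | dedup Value | return 6000 Value ]
| stats count by Value
```

The subsearch expands to a chain of (Value="...") OR (Value="...") terms, so only events in the big sourcetype that carry one of the small sourcetype's values are retrieved. The dedup inside the subsearch keeps repeated values from using up the 6000 slots.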


howyagoin
Contributor

Thought so - I was doing that, but it's still going to take many hours (days?) to run. Likely I'll have to build a better mousetrap here, as the data is just too vast to run the full 30 days' worth of querying I need.
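
One possible shape for that, following the summary-index idea from the original question, is sketched here. It is untested and rests on assumptions: big_value_summary is a hypothetical summary index that would have to be created first, the first search would be scheduled hourly, and Value is the shared field name.

```spl
sourcetype=big earliest=-1h@h latest=@h
| stats count by Value
| collect index=big_value_summary
```

The 30-day check then runs against the much smaller summary instead of the raw events:

```spl
index=big_value_summary earliest=-30d@d [ search sourcetype=small | dedup Value | return 6000 Value ]
| stats sum(count) as events by Value
```

Each hourly run writes at most one row per distinct value (around 100,000 at most), so the 30-day search scans far fewer events than the 225,000,000 per day in the raw data.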
