Splunk Search

Query accessing a very large lookup in another index - earliest and latest

garryclarke
Path Finder

I am trying to join a very large lookup dataset (cab) with my main Splunk query, and have loaded the lookup data into a separate index. The lookup data doesn't have a date element associated with it, so the only way to access it seems to be by using the earliest=-1y latest=now conditions.

I have been trying to run the following query, but I suspect that the time windows are not being applied to each data source: nothing is read back from the cab index, and I'm just getting a result set with data from ff.

(index=ff earliest=-60m latest=now) OR (index=cab earliest=-1y latest=now) | transaction UniqueID

I have been able to get this working using a join in Splunk, but I have read here that a join is sometimes not the best way to do this, so I wanted to try OR'ing the two data inputs and using stats or transaction.

index=cab earliest=-1y latest=now | join type=inner DN_STRIP [search index=ff earliest=-60m latest=now | rename num_strip as DN_STRIP] | stats count by PCP_ID1 | rename PCP_ID1 as ID | sort - count | search count>5

Any ideas would be appreciated.

1 Solution

lguinn2
Legend

All events in a Splunk index have a date associated with them - if there is none in the data, then Splunk uses the time that the data was indexed as the event time. I don't think you can set two time ranges within a single search - you can use the Search Job Inspector to see how Splunk has interpreted your first search.
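One way to see which timestamps the cab events actually received is to summarise the event time and the index time over a wide window. This is only a sketch: it assumes the index is named cab as in the question, and the output field names (first_event, last_event, first_indexed, last_indexed) are made up for illustration. Widen earliest if the data was loaded more than a year ago.

```spl
index=cab earliest=-1y latest=now
| eval indexed=_indextime
| stats count min(_time) as first_event max(_time) as last_event min(indexed) as first_indexed max(indexed) as last_indexed
| convert ctime(first_event) ctime(last_event) ctime(first_indexed) ctime(last_indexed)
```

If first_event and last_event sit close to the index times, the events were most likely stamped with the load time rather than with a date from the data.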

If the cab data is just used for lookups, why not use the Splunk lookup feature instead of indexing the data? I don't know what "a very large lookup dataset" is for you, but Splunk can handle lookup tables of over 10 million entries (based on what I see in limits.conf).
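As a rough sketch of the lookup approach, the cab data could be written out to a lookup file once and then applied to the ff events. The file name cab_lookup.csv is hypothetical, and the field names (DN_STRIP, PCP_ID1, num_strip) are taken from the join query in the question; adjust them to the real data.

```spl
index=cab earliest=-1y latest=now
| table DN_STRIP PCP_ID1
| outputlookup cab_lookup.csv
```

With the file in place, a search along these lines should roughly mirror the join version without touching the cab index at search time:

```spl
index=ff earliest=-60m latest=now
| lookup cab_lookup.csv DN_STRIP as num_strip OUTPUT PCP_ID1
| where isnotnull(PCP_ID1)
| stats count by PCP_ID1
| rename PCP_ID1 as ID
| sort - count
| where count>5
```

The where isnotnull(PCP_ID1) step drops ff events with no match in the lookup, which is what the inner join did.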

Here is a link to the Splunk tutorial on lookups.
You might also benefit from David Carasso's book Exploring Splunk, where the final chapter is about lookups. (Book is free in electronic form).

lguinn2
Legend

Your thinking was good about indexing - but Splunk secretly creates in-memory indexes for lookup files, even for 2 million rows. That's why lookups can be fast too.

garryclarke
Path Finder

Thanks lguinn, I will have a look at the lookup approach. I had the view that accessing indexed data might be faster than a lookup file, but the lack of a time field in the cab data makes that more complicated.
I'll prototype and compare both approaches.
The very large lookup I refer to is approximately 2 million rows of data, so I guess it is not big in Splunk terms.
