I am importing into Splunk many tables of data, each with 500 to 10,000 events, and I need to use them to enrich events in scheduled searches. At the moment I import these tables using a modular input, dumping them into an index, and then I join my saved searches' results with the latest data from this index.
The tables are imported once a day to pick up any changes (they are usually mostly unchanged).
index=my_events
| join type=left common_field
[ search index=imported_data source=src earliest=-24h
| stats latest(*) as * by common_field ]
I know join is bad for performance and was wondering if importing the data into a KV Store and setting up an automatic lookup on the index whose events I want to enrich would be a better solution.
In that case I would overwrite the KV Store once a day with the new data.
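For context, this is roughly what I had in mind (collection, lookup, field and sourcetype names are placeholders; as far as I know automatic lookups are configured per sourcetype/source/host in props.conf, not per index):

collections.conf:
[enrichment_collection]

transforms.conf:
[enrichment_lookup]
external_type = kvstore
collection = enrichment_collection
fields_list = _key, common_field, field1, field2

props.conf:
[my_events_sourcetype]
LOOKUP-enrich = enrichment_lookup common_field OUTPUT field1 field2

Daily scheduled search that overwrites the collection:

index=imported_data source=src earliest=-24h
| stats latest(*) as * by common_field
| outputlookup enrichment_lookup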
Other solutions are welcome; these are the ones I came up with.
Thanks.
Hi @nicolocervo,
use join only as the last solution, because it's very slow and resource expensive.
In your case, if you have fewer than 10,000 rows, I suggest using a lookup.
You don't need a KV Store, because you have to drop and recreate the lookup every night anyway, so a plain CSV lookup is enough.
The other choices you should explore are Data Models (in few words, a db table containing all the data) or a summary index, scheduling a search on your index to refresh the data every day.
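A minimal sketch of the summary index option (the summary index name is just an example and the index must already exist): schedule a search that copies the latest data with collect,

index=imported_data source=src earliest=-24h
| stats latest(*) as * by common_field
| collect index=summary_enrichment

and then your scheduled searches read from index=summary_enrichment instead of index=imported_data.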
Then use stats instead of join to correlate the data.
You can find many examples of this approach in my other answers.
If you could share a sample of your search I could guide you in this approach; in general it looks like this:
index=my_events OR (index=imported_data source=src earliest=-24h)
| stats values(field1) AS field1 values(field2) AS field2 dc(index) AS dc_index values(index) AS index BY common_field
| where dc_index=2
Ciao
Giuseppe