Solved: how to best store processed results for future correlation

awurster · 01-07-2015

Hi all - we are starting to build out Splunk as our SIEM and are beginning to link and chain information together. We are setting up a few new indexes to store what I think should be collected or post-processed data.

For instance, if I have a firewall event with a new, previously unseen public IP, I want to store some data from that event, along with some new fields from lookups and the iplocation command.
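To make the "new unseen public IP" idea concrete, here is a rough Python model of the selection and enrichment step. It is not SPL and not how Splunk runs the search. It assumes events arrive as dicts with hypothetical field names (`_time`, `src_ip`, `dest_ip`, `action`), that the set of already-seen IPs is known up front, and it stands in for `iplocation` with a hypothetical `GEO_LOOKUP` dict holding placeholder values.

```python
from __future__ import annotations

import ipaddress

# Hypothetical stand-in for iplocation / a geo lookup. Values are placeholders;
# in Splunk this enrichment would come from the search itself.
GEO_LOOKUP = {
    "8.8.8.8": {"Country": "ExampleCountry", "City": "ExampleCity"},
}

# Fields to keep from the original firewall event (hypothetical names).
KEEP_FIELDS = ("_time", "src_ip", "dest_ip", "action")


def is_new_public_ip(ip: str, seen: set[str]) -> bool:
    """True if ip is globally routable and not already in `seen`."""
    return ipaddress.ip_address(ip).is_global and ip not in seen


def enrich_new_ips(events: list[dict], seen: set[str]) -> list[dict]:
    """Return one enriched record per event whose dest_ip is new and public."""
    records = []
    for event in events:
        ip = event["dest_ip"]
        if not is_new_public_ip(ip, seen):
            continue
        # Keep only the selected fields, then add the lookup fields.
        record = {k: event[k] for k in KEEP_FIELDS if k in event}
        record.update(GEO_LOOKUP.get(ip, {}))
        records.append(record)
        seen.add(ip)  # later events with this IP are no longer "new"
    return records


if __name__ == "__main__":
    sample = [
        {"_time": 1, "src_ip": "10.0.0.5", "dest_ip": "8.8.8.8", "action": "allowed"},
        {"_time": 2, "src_ip": "10.0.0.6", "dest_ip": "10.0.0.9", "action": "allowed"},
        {"_time": 3, "src_ip": "10.0.0.7", "dest_ip": "8.8.8.8", "action": "blocked"},
    ]
    # Only the first event qualifies: the second is private, the third is a repeat.
    print(enrich_new_ips(sample, seen=set()))
```

The record built here is the "processed" shape the question is about: selected original fields plus lookup fields, rather than the raw event.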

Now, if I use collect, it doesn't seem to do exactly what I want. I can see that collect saves the data, but in its original form: without any renames, evals, etc.; ignoring the fields statement that outputs only selected fields; and without other implied things like CIM field aliases.

So my questions are:

1 - Is using collect with a separate index the best way to store this stuff? (We'd like to have one table of new/suspect IPs, one table of internal assets, etc.) Or is it better done in lookup tables/CSVs?

2 - Should I expect the output of my commands piped into collect to mostly match the output of the same search without the collect command (excluding the meta fields, which I understand)?

lguinn2 · 01-07-2015

IMO, the 6.2 KV stores are not ready or appropriate to be used in this way. But that is probably the future for things that we store today in lookup tables.

For now, I would use lookup tables if possible, or summary indexing. I would use lookup tables for anything where you only care about the current state, but if you want to see the history over time, summary indexing may be better. There are two main ways to do summary indexing: one uses the collect command and the other does not. I would read up on both of these and then pick the one that seems best to you:

Use summary indexing for increased reporting efficiency

Configure summary indexes
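To illustrate the current-state vs. history distinction, here is a rough Python model (not SPL, and not how Splunk stores data internally). A dict written out as CSV plays the role of a lookup table, where a new value for the same IP overwrites the old one. An append-only list plays the role of a summary index, where every observation is kept. The function names and the `ip`/`status` columns are hypothetical.

```python
from __future__ import annotations

import csv
import io


def update_lookup(lookup: dict[str, str], ip: str, status: str) -> None:
    """Lookup-table style: one row per key, the latest value wins."""
    lookup[ip] = status


def append_summary(summary: list[dict], time: int, ip: str, status: str) -> None:
    """Summary-index style: every observation is kept, so history survives."""
    summary.append({"_time": time, "ip": ip, "status": status})


def lookup_to_csv(lookup: dict[str, str]) -> str:
    """Serialise the current state as CSV text, like a lookup file."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["ip", "status"])
    for ip, status in sorted(lookup.items()):
        writer.writerow([ip, status])
    return buf.getvalue()


if __name__ == "__main__":
    lookup: dict[str, str] = {}
    summary: list[dict] = []
    for t, ip, status in [(1, "8.8.8.8", "new"), (2, "8.8.8.8", "suspect"), (3, "1.1.1.1", "new")]:
        update_lookup(lookup, ip, status)
        append_summary(summary, t, ip, status)
    print(lookup_to_csv(lookup))  # one row per IP, latest status only
    print(len(summary))           # all three observations kept
```

If the question you want to answer is "what is this IP's status now?", the lookup shape is enough; if it is "when did this IP first turn suspect?", you need the appended history.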


bmacias84 · 01-07-2015

@awurster

Keep in mind that data stored in Splunk is not stateful. Even summary-indexed data can be aged out to frozen or deleted. I have used DB Connect or a custom Splunk command to send data such as yours to a third-party database such as MySQL, MSSQL, etc., and then used DB Connect to pull that information back.
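A minimal sketch of the external-database idea, assuming Python's built-in sqlite3 (in memory) as a stand-in for MySQL/MSSQL. It does not show DB Connect or a custom search command, and the `ip_state` table and its columns are hypothetical. The point is that a row stays until you delete it, independent of index retention or freezing.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical ip_state table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ip_state ("
        " ip TEXT PRIMARY KEY,"
        " first_seen INTEGER NOT NULL,"
        " last_seen INTEGER NOT NULL,"
        " status TEXT NOT NULL)"
    )
    return conn


def record_sighting(conn: sqlite3.Connection, ip: str, seen_at: int, status: str) -> None:
    """Insert a new IP, or widen its first/last-seen window and overwrite status."""
    # Portable upsert: insert if missing, then update the existing row.
    conn.execute(
        "INSERT OR IGNORE INTO ip_state (ip, first_seen, last_seen, status)"
        " VALUES (?, ?, ?, ?)",
        (ip, seen_at, seen_at, status),
    )
    conn.execute(
        "UPDATE ip_state SET first_seen = MIN(first_seen, ?),"
        " last_seen = MAX(last_seen, ?), status = ? WHERE ip = ?",
        (seen_at, seen_at, status, ip),
    )
    conn.commit()


def lookup_ip(conn: sqlite3.Connection, ip: str) -> Optional[tuple]:
    """Read one IP's stored state back, or None if it was never recorded."""
    return conn.execute(
        "SELECT ip, first_seen, last_seen, status FROM ip_state WHERE ip = ?",
        (ip,),
    ).fetchone()


if __name__ == "__main__":
    db = open_store()
    record_sighting(db, "8.8.8.8", 100, "new")
    record_sighting(db, "8.8.8.8", 200, "suspect")
    print(lookup_ip(db, "8.8.8.8"))
```

The trade-off, as the thread goes on to discuss, is that this is one more data store to run and manage alongside Splunk.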

awurster · 01-07-2015

Thanks, Lisa. I'm a bit surprised to hear that it's not something we can do well within Splunk, but I see your points. Thanks for the honesty about the readiness of KV stores - much appreciated.

I know it's possible to create separate data stores like bmacias84 mentioned (and we are already doing that for other things), but the overhead required to spin up and manage new databases and separate data stores seems to outweigh the overhead of owning and running Splunk, especially since we've already invested considerable resources cleaning data and tuning the Splunk apps and searches to handle that data.

As for summary indexing, I've investigated it a fair bit, but not the newer si- commands, so I'll look into those and re-read the docs. I just liked the concept of having separate stores for separate data, with faster processing times. Maybe I can use those commands in this case, maybe not. Maybe it will become a giant chain of lookup tables, or maybe not.

esix_splunk · 01-07-2015

On a side note, if you are looking at SIEM functionality, have you talked with your Splunk account manager about evaluating Enterprise Security? It does most of what you are talking about, via a combination of data models, summary indexing, and report acceleration.

lguinn2 · 01-08-2015

Excellent point @esix_splunk!

awurster · 01-07-2015

I'm assuming the answer lies in the new 6.2 KV stores, but I'm keen to hear what others are doing.

Right now, for the next month or so, we're stuck on 6.1, so maybe I'll end up going with lookup files and then switching over to KV stores at a later date.
