I am using Splunk Enterprise 7.2.0 and am trying to set up the following flow. I have an index called raw_dns that is continuously populated with events. Each of these events has a domain field. I also have a custom lookup that calls an external service to check whether a given domain is infected. The lookup returns two things: a status (infected or not) and categories (a list of keywords).
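For readers unfamiliar with external lookups, here is a minimal sketch of what a script like domain_lookup.py might look like. It is not my actual script. It assumes the usual external lookup convention: Splunk passes a CSV with a header row on stdin and reads the filled-in CSV back from stdout. The check_domain function, its hard-coded blocklist, and the "clean" status value are hypothetical stand-ins for the real service call.

```python
import csv

# Field names taken from fields_list in the lookup stanza.
LOOKUP_FIELDS = ["domain", "domain_status", "domain_categories"]


def check_domain(domain):
    """Hypothetical stand-in for the external service call.

    Returns (status, categories). The blocklist and the "clean" status
    value are assumptions made for illustration only.
    """
    blocklist = {"bad.example": ["malware", "phishing"]}
    categories = blocklist.get(domain, [])
    return ("infected" if categories else "clean", categories)


def run_lookup(infile, outfile, check=check_domain):
    """Read lookup rows as CSV, fill in the output fields, write CSV back."""
    reader = csv.DictReader(infile)
    fieldnames = list(reader.fieldnames or [])
    # Make sure every lookup field has a column, even if the header lacks it.
    for name in LOOKUP_FIELDS:
        if name not in fieldnames:
            fieldnames.append(name)

    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        domain = row.get("domain")
        if domain:
            status, categories = check(domain)
            row["domain_status"] = status
            # Comma-joined so that makemv delim="," can split it later.
            row["domain_categories"] = ",".join(categories)
        writer.writerow(row)
```

When run as the lookup script, the entry point would call run_lookup(sys.stdin, sys.stdout). Keeping the service call behind a separate function makes it possible to test the CSV handling without network access.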
I had to configure the app so that every X minutes, the events received in the last X minutes are analyzed by the lookup and then written to another index. My approach was the following:
- created a new index
- created a savedsearch that runs every X minutes and does what I said above
- enabled summary indexing
Everything worked well and the index was being populated, but after a while my manager said he also wants some of the fields from the original event (the events in the summary index only had the domain, the status and the categories list). I tried repeatedly to do this, and I think I found the problem, but I don't know how to fix it. My configuration is as follows:
[domains_infected]
thawedPath = $SPLUNK_DB/domains_infected/thaweddb
homePath = $SPLUNK_DB/domains_infected/db
coldPath = $SPLUNK_DB/domains_infected/colddb

[domain_lookup]
external_cmd = domain_lookup.py domain
external_type = python
fields_list = domain, domain_status, domain_categories

[Domains Status (5 minutes cron)]
search = index=raw_dns | table domain | lookup domain_lookup domain | makemv delim="," domain_categories | makemv delim="," domain_status
description = Domains Status (5 minutes cron)
dispatch.latest_time = now
dispatch.earliest_time = -5m
enableSched = 1
cron_schedule = */5 * * * *
action.summary_index = 1
action.summary_index._name = domains_infected
action.summary_index.generator = savedsearch
I think the problem is that I used | table domain in the search. However, I tried each of the following:
| table domain
| table *
fields - raw
None of these worked; by that I mean the summary index stopped being populated with data.
Where's the catch or what am I missing?
@orinciog - It looks like the end result you're trying for is an enrichment of the original event data with information provided by a lookup, and that you want to be able to see all of the original fields along with the lookup-provided fields automatically. Have you considered whether using an automatic lookup will meet this need without having to do any summary indexing? Summary indexing seems to me to be best when you only need a subset of the original event fields, not all of them. Adding all of the original fields to the summarization only results in saving two sets of the same events, along with the associated storage cost.
@delappml3 Yeah, I know it's not very efficient in terms of storage. My original problem was keeping only some of the fields, but if a client uses our app, there is no guarantee that the names of their fields will match ours. Right now, when the user installs the app, they can name the field in their events that is equivalent to ours (and a macro will be automatically created and used), but I don't think this can work when dealing with multiple fields.
I looked up what you said, but the task is not to alter the original data.
Feel free to give me any other ideas or possible solutions, if you can 🙂
@orinciog - Adding an automatic lookup doesn't alter the original data; it only adds enrichment fields from an established lookup. It's no different in that regard from using a manual lookup via the |lookup command. I thought I'd suggest the automatic lookup route as it seems to me it could allow you to avoid the extra work and overhead involved with summary indexing, but either way should work. I don't yet have a proposed direct answer to your question, though.