Hello there!
I am using Splunk Enterprise 7.2.0 and am trying to set up the following flow. I have an index called raw_dns which is continuously populated with events. Each of these events has a domain field. I also have a custom lookup that calls an external service to check whether a given domain is infected. The lookup returns two things: status (infected or not) and categories (a list of keywords).
I had to configure the app so that every X minutes, the events received in the last X minutes are analyzed by the lookup and then written to another index. My approach was the following:
- created a new index
- created a savedsearch that runs every X minutes and does what I said above
- enabled summary indexing
Everything worked well and the index was being populated, but after a while my manager said he also wants some of the fields from the original event (the events in the summary index only had the domain, the status and the categories list). I have tried to do this over and over again, and I think I have found the problem, but I don't know how to fix it. My configuration is as follows:
indexes.conf
[domains_infected]
thawedPath = $SPLUNK_DB/domains_infected/thaweddb
homePath = $SPLUNK_DB/domains_infected/db
coldPath = $SPLUNK_DB/domains_infected/colddb
transforms.conf
[domain_lookup]
external_cmd = domain_lookup.py domain
external_type = python
fields_list = domain, domain_status, domain_categories
savedsearches.conf
[Domains Status (5 minutes cron)]
search = index=raw_dns | table domain | lookup domain_lookup domain | makemv delim="," domain_categories | makemv delim="," domain_status
description = Domains Status (5 minutes cron)
dispatch.latest_time = now
dispatch.earliest_time = -5m
enableSched = 1
cron_schedule = */5 * * * *
action.summary_index = 1
action.summary_index._name = domains_infected
action.summary_index.generator = savedsearch
I think the problem is that I used | table domain in the search. However, I tried each of the following:
- remove | table domain
- use | table *
- use | fields - _raw
None of the above worked. By "not working" I mean that the summary index stopped being populated with data.
Where's the catch or what am I missing?
Thanks!
@orinciog - It looks like the end result you're trying for is an enrichment of the original event data with information provided by a lookup, and that you want to be able to see all of the original fields along with the lookup-provided fields automatically. Have you considered whether using an automatic lookup will meet this need without having to do any summary indexing? Summary indexing seems to me to be best when you only need a subset of the original event fields, not all of them. Adding all of the original fields to the summarization only results in saving two sets of the same events, along with the associated storage cost.
@delappml3 Yeah, I know it's not very efficient from a storage point of view. My original problem was keeping only some of the fields, but if a client uses our app, there is no guarantee that the names of his fields will match ours. Right now, when the user installs the app, he can name the field in his events that is equivalent to ours (and a macro will be automatically created and used), but I don't think this can work when dealing with multiple fields.
I looked into what you suggested, but one requirement of the task is that the original data must not be altered.
Feel free to give me any other ideas or possible solutions, if you can 🙂
Thanks!
@orinciog - Adding an automatic lookup doesn't alter the original data; it only adds enrichment fields from an established lookup. It's no different in that regard from using a manual lookup via the |lookup command. I thought I'd suggest the automatic lookup route as it seems to me it could allow you to avoid the extra work and overhead involved with summary indexing, but either way should work. I don't yet have a proposed direct answer to your question, though.
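For illustration, an automatic lookup could be declared in props.conf roughly as follows. This is only a sketch: the stanza name dns_events is a hypothetical sourcetype (the thread does not say which sourcetype the raw_dns events use), the class name domain_status is arbitrary, and I am assuming the external domain_lookup definition from transforms.conf behaves the same when invoked automatically.

```ini
# props.conf
# [dns_events] is a placeholder; replace it with the real sourcetype of the raw_dns events.
[dns_events]
# Match on the event's domain field and add the two lookup output fields at search time.
LOOKUP-domain_status = domain_lookup domain OUTPUT domain_status domain_categories
```

With something like this in place, a plain search on index=raw_dns would show the original fields plus domain_status and domain_categories, without writing a second copy of the events.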
@delappml3 thanks for your answers 🙂 I'll see what I can do. Thanks!