We have events coming from hosts that need to have additional information added to them from two configuration files. One file is a plain text file which contains a label for the set of hosts this particular host belongs to. The second file is JSON which contains meta-data about the configuration of these hosts based upon which label is in the first configuration file.
This meta-data includes the source of the data stream the hosts are handling, the destination the stream is being sent to, the data input rate, and related information.
When debugging the system these hosts are part of, this information is essential for understanding what role each host plays.
The problem is that _meta in the forwarder's inputs.conf file seems to support only very simple single key -> value pairs. What we need, though, is a more complex hash with nested keys and their associated values. Can _meta handle this? If so, what would the inputs.conf file look like?
If not, how can I add this meta-data, which is crucial for understanding our system, to the events?
In general, I would be careful about adding too much information to _meta tags at index time, as this will slow down indexing and possibly significantly increase your index file sizes.
Splunk is very good at enriching data at search time, if you have the relevant data sources available to search. One option to consider is ingesting the JSON source as a regular sourcetype (doesn't sound like it is a lot of data) and rendering it into a (temporal) lookup? That would enable you to correlate at search time by host (via lookup) and also allow you to see how the meta-data has changed over time.
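As a rough sketch of the lookup idea, the nested JSON could be flattened into a CSV keyed by label, outside of Splunk, and then loaded as a lookup table. This assumes the JSON's top-level keys are the host labels and each value is a nested object; the function names and the dotted column naming are made up for illustration and are not part of any Splunk API.

```python
import csv
import io


def flatten(value, prefix=""):
    """Flatten nested dicts into one dict with dotted keys, e.g. rate.max."""
    if not isinstance(value, dict):
        return {prefix: value}
    flat = {}
    for key, child in value.items():
        name = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(child, name))
    return flat


def metadata_to_lookup_csv(metadata):
    """Return CSV text with one row per label and one column per flattened key.

    Assumes `metadata` maps label -> nested dict (an assumption about the
    JSON layout, which the thread does not specify).
    """
    rows = []
    for label, config in metadata.items():
        row = {"label": label}
        row.update(flatten(config))
        rows.append(row)

    # Union of all flattened keys, so labels with different settings still fit.
    columns = ["label"] + sorted({c for r in rows for c in r if c != "label"})

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    example = {
        "edge": {"source": "kafka", "rate": {"max": 100}},
        "core": {"source": "s3"},
    }
    print(metadata_to_lookup_csv(example))
```

The resulting file has a `label` column plus one column per nested setting, so a search could join on the label with the `lookup` command, provided the events (or a second host-to-label lookup) carry the label.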
Do you have a repository for all of this metadata information you use to populate the files on the individual hosts (a CMDB) that could be queried by Splunk directly?
Both alternatives also save you from having to update the forwarder configuration whenever either of the two files changes.
Thanks, I believe you're correct that the best solution is to enrich the data at search time. However, I'm working with other developers who have very specific requirements and are likely to be reluctant about enriching the data at search time. I'll have to find out.
Meta data changes aren't really of interest to my users since they simply reflect the design changes made during development of new features and other customer requests. And they are the ones that create those changes anyway. 🙂
The repository for the meta data is a Git repo so there's going to be a great deal of irrelevant Git data polluting the event data if I include that.
However, as I mentioned above, I think enriching the data at search time (with perhaps some additional simple tagging) would be the best solution.
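For the simple tagging part, a minimal sketch of what I have in mind: read the label from the plain text file and emit a single flat `_meta` pair for inputs.conf, leaving everything nested to the search-time lookup. The field name `host_group` and the function name are placeholders I made up.

```python
def meta_setting(label_text, field="host_group"):
    """Build a flat `_meta = key::value` line from the label file's contents.

    `field` is a hypothetical indexed-field name chosen for illustration.
    """
    label = label_text.strip()
    # _meta pairs are space-separated, so reject empty or multi-word labels.
    if not label or any(ch.isspace() for ch in label):
        raise ValueError(f"label must be a single non-empty token: {label_text!r}")
    return f"_meta = {field}::{label}"


if __name__ == "__main__":
    # Would normally come from reading the label file on the host.
    print(meta_setting("edge\n"))
```

That keeps the index-time addition to one small field per event.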
Thanks for your help.