Good Way to Add Site Specific Metadata

infrauser
Explorer

Hi Folks,

I'd appreciate any advice on a good way to add site-specific information to events. I have a distributed setup with multiple indexers per location, and multiple locations. The index names are the same across all of the indexers.

For example, let's say I have a datacenter in NY and one in LA. Each of these has 3 indexers. I want to be able to search across all of them as well as limit my search to just NY or just LA. There is nothing in the data that denotes which location the event took place in. I also don't want to have to know the hostnames of the Splunk indexers in order to perform the localized search.

I was thinking of adding an indexed field in a similar fashion to this thread: http://answers.splunk.com/questions/1453/how-do-i-add-metadata-to-events-coming-from-a-splunk-forwar...

Are there any other alternatives?

Thanks.

southeringtonp
Motivator

There are several possible approaches. Just be sure to test and proceed with caution, especially if choosing Option I or III.


Option I: Create a separate index for each site.

For each site, create a new index with a name matching the site.

Then set the list of indexes that should be searched by default, either by setting srchIndexesDefault in authorize.conf or through the Manager's Role settings. See "How do I Set the Default Index?" and the authorize.conf documentation.

Now, you can search on index=NY, etc.
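
As a rough sketch of what Option I might look like in configuration, assuming hypothetical lowercase index names "ny" and "la", a hypothetical monitored path, and the built-in "user" role (none of these come from the original post; adjust to your environment):

```ini
# indexes.conf on the NY indexers (the index name "ny" is an assumption)
[ny]
homePath   = $SPLUNK_DB/ny/db
coldPath   = $SPLUNK_DB/ny/colddb
thawedPath = $SPLUNK_DB/ny/thaweddb

# inputs.conf at the NY site: route incoming data to the site index
# (the monitored path is a placeholder)
[monitor:///var/log/example]
index = ny

# authorize.conf on the search head: search both site indexes by default
# (the role name is an assumption)
[role_user]
srchIndexesDefault = ny;la
srchIndexesAllowed = ny;la
```

With something like this in place, a search with no index specified would cover both sites, and adding index=ny would restrict it to one.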

For existing data, you can re-index, but you probably don't need to. Take a look at this thread for some hints on moving data into a new index. Or, you can just live with the existing data in the default main index until it ages out.

See here for more information:
      http://answers.splunk.com/questions/5479/how-to-rename-an-index

Note that the link is for version 4.1.4 and earlier. I suspect that it will still work, but have not verified -- there may be other implications depending on version.

Since this can impact your existing data, you'll definitely want to test ahead of time and verify that your procedure works as you expect. It would also be a good idea to contact Splunk support and ask them to review your plan.


Option II: Create an eventtype for each site.

You still need to know the hostnames initially, but once it's configured you will only need to update the eventtype definition when things change.

Technically, this still requires identifying a list of hosts or other criteria per-site, but now you only have to manage it at the eventtype level, not per-search or per-view.
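
A minimal sketch of Option II, assuming hypothetical host names (ny-web01, la-web01, and so on) and hypothetical eventtype names; substitute whatever criteria actually identify each site:

```ini
# eventtypes.conf on the search head
# Host names and eventtype names below are placeholders, not from the original post.
[site_ny]
search = host=ny-web01 OR host=ny-web02 OR host=ny-db01

[site_la]
search = host=la-web01 OR host=la-web02 OR host=la-db01
```

Searches and views can then use eventtype=site_ny, and only this file needs updating when hosts are added or moved.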


Option III: Create an indexed field

Usually not recommended, but may work well in your situation. It will increase the size of the index somewhat and could have implications for search performance. And, of course, it's a mostly permanent choice: the field is written at index time, so changing it later only affects newly indexed events.
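
One way Option III is commonly sketched is with the _meta setting on the inputs, along the lines of the forwarder thread linked in the question. The field name "site", its value, and the monitored path below are assumptions for illustration, so test this before relying on it:

```ini
# inputs.conf on the forwarders (or inputs) at the NY site
# The path and the field name "site" are placeholders.
[monitor:///var/log/example]
_meta = site::ny

# fields.conf on the search head: tell Splunk that "site" is an indexed field
[site]
INDEXED = true
```

A search such as site=ny should then be answered from the index rather than by extracting the field at search time.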


Option IV: Use a lookup table to map between host and site

Listed only for completeness. This option sounds good in theory, but is likely to kill performance, since a search by site will likely scan across all events before triggering the lookup. It technically accomplishes what you want, but at a severe performance cost.


Option V: Adopt a naming convention for hosts that includes site, or use IP addresses for host

Probably not realistic, but worth mentioning. If, e.g., all of your LA machines are named 'LA-XXXXX' or have an IP address in 10.1.2.XXX, then it's easy to do a search on host="LA-*" or similar.

If you decide to go this route, you can use a lookup (scripted or static) to resolve the hostname for display purposes. It does create significant performance issues if you want to regularly search based on hostname.

gkanapathy
Splunk Employee

In a case like this, where there is a presumably small number of distinct values of the indexed field, it will probably be fine to use an indexed field.

Also, southeringtonp's Option IV should in fact work very well if you are using an automatic CSV lookup table mapping hosts to site. Splunk will automatically do a reverse lookup on the site name and expand it to the matching host names. In fact, this would be preferable to (and easier to maintain than) using Option II with eventtypes.
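
A minimal sketch of such an automatic lookup, assuming a hypothetical CSV file host_site.csv, a hypothetical lookup name host_site, and a placeholder sourcetype stanza (none of these names come from the thread):

```ini
# host_site.csv in the app's lookups directory (contents shown as comments):
#   host,site
#   ny-web01,ny
#   la-web01,la

# transforms.conf: define the file-based lookup
[host_site]
filename = host_site.csv

# props.conf: apply the lookup automatically at search time
# ("my_sourcetype" is a placeholder; use the sourcetypes you actually search)
[my_sourcetype]
LOOKUP-site = host_site host OUTPUT site
```

With this in place, a search on site=ny would rely on the reverse lookup described above to narrow the search to the NY hosts, and moving a host between sites only means editing the CSV.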

infrauser
Explorer

This is nice to know as well. Thanks.

southeringtonp
Motivator

Nice! I had assumed that the CSV lookup wouldn't take place in time for the check against the index - that makes a huge difference.

infrauser
Explorer

Thanks for the info. I'll definitely check a couple of these out.

gkanapathy
Splunk Employee

Plus, it's easier to maintain a CSV table than the eventtypes.conf file.

gkanapathy
Splunk Employee

Option IV, implemented as an automatic file-based lookup, should be preferred over Option II. Splunk will internally perform a reverse lookup, such that site=x will be expanded into (host=h1 OR host=h2 ...), so search performance should be the same as with eventtypes or macros.