Does index-time field extraction make sense for our situation?

(currently using Splunk 4.3.3 build 128297)

I have poked around the docs covering index-time field extraction and some of the related Q&A, but I decided to ask directly, outlining our situation.

We have a logging facility that several of our future products will use. This facility receives JSON payloads containing key/value pairs like the following (names have been changed to protect the innocent).

{ 
  "key1" : "value1",
  "key2" : "value2",
  (could contain more pairs)
  "entries" : [
                {
                 "key3" : "value3a",
                 "key4" : "value4a",
                 "key5" : "value5",
                 (could contain more pairs)
                },
                {
                 "key3" : "value3b",
                 "key4" : "value4b",
                 "key6" : "value6",
                 (could contain more pairs)
                },
                (could contain more entries)
              ]
}

When the logging facility gets the above example JSON payload, it turns it into the following two log statements and pushes them to Splunk via TCP.

timestamp key1="value1" key2="value2" key3="value3a" key4="value4a" key5="value5"
timestamp key1="value1" key2="value2" key3="value3b" key4="value4b" key6="value6"
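
For illustration, the flattening described above could look roughly like the following minimal sketch. It assumes the top-level pairs (everything except "entries") are copied onto every line, that each element of "entries" yields exactly one line, and that a value inside an entry overrides a top-level value of the same name. The function name flatten_payload is hypothetical, the timestamp is passed in as a plain string, and values containing double quotes are not escaped.

```python
import json
from typing import List


def flatten_payload(payload: dict, timestamp: str) -> List[str]:
    """Hypothetical helper: one key="value" log line per element of "entries"."""
    # Top-level pairs (everything except "entries") are shared by every line.
    common = {k: v for k, v in payload.items() if k != "entries"}
    lines = []
    for entry in payload.get("entries", []):
        # Entry-specific pairs are appended after the shared ones;
        # on a name clash, the entry's value wins (an assumption).
        merged = {**common, **entry}
        pairs = " ".join(f'{k}="{v}"' for k, v in merged.items())
        lines.append(f"{timestamp} {pairs}")
    return lines


example = json.loads("""
{
  "key1": "value1",
  "key2": "value2",
  "entries": [
    {"key3": "value3a", "key4": "value4a", "key5": "value5"},
    {"key3": "value3b", "key4": "value4b", "key6": "value6"}
  ]
}
""")

for line in flatten_payload(example, "timestamp"):
    print(line)
```

Run against the example payload, this prints the same two lines shown above. Because key1 is repeated on every line, any extraction of it (search-time or index-time) sees it on each event.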

We are defining "key1" to denote the product/component submitting the data. Its value follows a reverse-DNS-style naming convention, with no real restrictions on the hierarchy other than ensuring the name is likely unique across our family of products. For example: "mycompany.product.component" or "mycompany.mydivision.product.component.subcomponent".

The remaining key/value pairs are product-specific (i.e., they can be whatever the product wants). In other words, key1 will be used to namespace the rest of the key/value pairs.

We are considering extracting "key1" at index time. I believe doing so would speed up our ability to focus on the events coming from a particular product and/or component out in the field.

Search possibilities...

key1="mycompany.product.*" ...blah...
key1="mycompany.product.component"  ...blah...
key1="*.component.*"  ...blah...
etc.
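
To see which key1 values these patterns would select, here is a rough model of `*` wildcard matching in Python. It is only an approximation, not Splunk's actual matcher. It assumes `*` matches any run of characters (dots included), treats every other character literally, and ignores case sensitivity and Splunk's other matching rules. The helper names wildcard_to_regex and matches are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    # Only "*" is a wildcard; every other character (including ".") is literal.
    parts = (re.escape(p) for p in pattern.split("*"))
    return re.compile("^" + ".*".join(parts) + "$")


def matches(pattern: str, value: str) -> bool:
    return wildcard_to_regex(pattern).match(value) is not None


names = [
    "mycompany.product.component",
    "mycompany.mydivision.product.component.subcomponent",
]

for pattern in ["mycompany.product.*", "mycompany.product.component", "*.component.*"]:
    print(pattern, [n for n in names if matches(pattern, n)])
```

Under this model, "*.component.*" does not match "mycompany.product.component", because nothing follows ".component". Since the hierarchy depth is unrestricted, patterns that anchor on a middle segment may need a second form (for example "*.component") to catch names that end at that segment.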

Opinions?

Splunk Employee

Based on this post, it sounds like this may be one of the cases where it does make sense:

http://splunk-base.splunk.com/answers/842/do-search-time-fields-have-performance-considerations?page...

Splunk Employee

Have you considered making key1 the sourcetype or the source? It is a safer solution and will still allow you to use metasearch and other fun indexed-field tricks.

I advise against custom indexed fields, mainly because they change the structure of your index compared to your other indexes, and the docs advise against them.