What is the best approach for a search time extraction of JSON formatted data within syslog event value?

New Member

I have a data source I am pulling syslog data from (a modular input). The data returned from this API is syslog formatted; however, one of the fields within the syslog data contains a JSON-formatted object. I am wondering what the best approach would be to enable default search-time extraction of the data held within this field:

<38>1 2017-01-13T17:31:46Z - VENDORHERE - EVENTYPEHERE [USERHERE@PIDHERE ... threatsInfoMap="[{\"threatID\":\"...\", \"threatType\":\"...\", \"classification\":\"...\", \"threatUrl\":\"...\", \"threatTime\":\"...\", \"threat\":\"...\", \"campaignID\":\"...\"},{\"threatID\":\"...\", \"threatType\":\"...\", \"classification\":\"...\", \"threatUrl\":\"...\", \"threatTime\":\"...\", \"threat\":\"...\", \"campaignID\":\"...\"}]" ...]

The ... sequences are redactions of the data. The 'threatsInfoMap' field is the one containing JSON-formatted data within the syslog event. It is an array that can contain zero, one, or many individual threats. Expanded out (with the escaped quote marks removed):

[
  {
    "threatID":"...",
    "threatType":"...",
    "classification":"...",
    "threatUrl":"...",
    "threatTime":"...",
    "threat":"...",
    "campaignID":"..."
  },
  {
    "threatID":"...",
    "threatType":"...",
    "classification":"...",
    "threatUrl":"...",
    "threatTime":"...",
    "threat":"...",
    "campaignID":"..."
  }
]

I would like to make these fields searchable by default in the TA (add-on / app) that I am developing; however, I am finding it difficult to extract multiple values for the same field name using regexes. Can someone please point me in the right direction, or offer suggestions, so that I can begin exploring this type of search-time extraction and ingestion of these values?
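To make the goal concrete, here is a minimal Python sketch (outside Splunk, purely illustrative) of the desired result: each key in the array becomes one field holding a list of values, one per threat. The sample values (`t1`, `url`, and so on) and the function name `threats_to_multivalue` are placeholders invented for this example, since the real values are redacted above.

```python
import json
from collections import defaultdict


def threats_to_multivalue(raw_value):
    """Turn the escaped threatsInfoMap string into {field: [values...]}."""
    # Assumption: quotes inside the syslog value are escaped as \" and
    # nothing else needs unescaping.
    unescaped = raw_value.replace('\\"', '"')
    threats = json.loads(unescaped)  # a list of flat objects; may be empty
    fields = defaultdict(list)
    for threat in threats:
        for key, value in threat.items():
            fields[key].append(value)  # repeated keys accumulate as lists
    return dict(fields)


# Hypothetical sample; placeholder values stand in for the redacted data.
sample = r'[{\"threatID\":\"t1\", \"threatType\":\"url\"},{\"threatID\":\"t2\", \"threatType\":\"attachment\"}]'
result = threats_to_multivalue(sample)
print(result)
```

An empty array yields no fields at all, which corresponds to the "no threats" case described above.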


Re: What is the best approach for a search time extraction of JSON formatted data within syslog event value?

SplunkTrust

If there is no multilevel JSON in your data (nested JSON arrays), then this might work:

Search Head, props.conf

[yoursourcetype]
REPORT-extractjson = extract_json_kv

Two options for Search Head, transforms.conf

# Option 1: delimiter-based key/value extraction
[extract_json_kv]
DELIMS = ",", ":"
MV_ADD = true

# Option 2: regex-based key/value extraction
[extract_json_kv]
REGEX = \"([^\"]+)\":\"([^\"]+)\"
FORMAT = $1::$2
MV_ADD = true
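As a rough illustration of what the regex option does, the Python sketch below applies the same pattern repeatedly and appends every match to a list per key, which is approximately how `MV_ADD = true` keeps repeated keys instead of overwriting them. This is not Splunk code: the sample event values and the name `extract_kv` are invented for the example, and it assumes the quotes reach the regex unescaped.

```python
import re
from collections import defaultdict

# Same pattern as the regex option; in a regex, \" is simply a literal quote.
KV_PATTERN = re.compile(r'\"([^\"]+)\":\"([^\"]+)\"')


def extract_kv(event):
    """Collect every "key":"value" pair, keeping repeated keys as lists."""
    fields = defaultdict(list)
    for key, value in KV_PATTERN.findall(event):
        fields[key].append(value)  # mimic MV_ADD: append, do not overwrite
    return dict(fields)


# Hypothetical event fragment with the quotes already unescaped.
event = '[{"threatID":"t1", "threatType":"url"},{"threatID":"t2", "threatType":"attachment"}]'
fields = extract_kv(event)
print(fields)
```

One caveat from this sketch: if the backslashes shown in the raw event are literally present in the data, the pattern as written finds nothing, because a backslash sits between the colon and the next quote. Whether that applies depends on how the data actually arrives, so it is worth checking against a real event.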


Re: What is the best approach for a search time extraction of JSON formatted data within syslog event value?

New Member

I think I see the issue here, and it's slightly unrelated to the question asked. The answer provided is definitely the best answer, and it was the path I had been going down before, to no avail. Good answer though, and the following documentation pages were a great read as well:

http://docs.splunk.com/Documentation/Splunk/6.5.1/Knowledge/Managefieldtransforms
http://docs.splunk.com/Documentation/Splunk/6.5.1/Knowledge/Exampleconfigurationsusingfieldtransform...

So, the real issue here is that this apparently must be added to the props/transforms conf files located under the user's local directory for the search app:

/opt/splunk/etc/users/admin/search/local/

The purpose of the extraction is to have it live inside the installable TA, so that the extracted fields already exist for the end user without requiring them to add anything themselves. Adding the settings directly to the props/transforms conf files in the TA's default directory has no perceived effect when using the 'search' app in Splunk:

/opt/splunk/etc/apps/TAvendorproduct/default/

Is this a limitation for bare-bones TAs when searching data pulled in through their modular input, or is what I am attempting not the right way to implement this type of capability? The end result I'm after is being able to use the search app in Splunk to search on extracted fields like these without having to make any custom per-user edits to the transforms/props configuration files. I'd like all of these customizations to reside inside the bare-bones TA, so that it is as simple to install and as seamless as possible for the end user to ingest data and search the underlying fields within it.


Re: What is the best approach for a search time extraction of JSON formatted data within syslog event value?

SplunkTrust

You would need to set the sharing permissions of the props.conf and transforms.conf entries to global in the default.meta/local.meta file under your TA, so that these field extractions are available in all apps, including the "Search" app.

[props/yoursourcetype/REPORT-extractjson]
export = system

[transforms/extract_json_kv]
export = system

Re: What is the best approach for a search time extraction of JSON formatted data within syslog event value?

New Member

Problem solved! Thank you for your help, somesoni2! Working like a charm now.
