Generating custom command with complete JSON field extraction

jonfrancais
Explorer

We are developing a generating custom command using the Splunk Python SDK. The issue we are having is that only the fields present in the first yielded record are extracted for later events (so only those fields appear as Extracted/Interesting Fields). Because the field names change continually, we don't want to provide a static list of potential field names.

To provide some background, the command runs on the search head and obtains a list of JSON objects from a third party. The JSON events returned will have differing fields in them. Once the JSON object has been returned, we append a 'sourcetype' field to the record in the hope that it picks up our configuration from props.conf. The data is not indexed and is only acquired at search time.

We are using props.conf to define our 'sourcetype' with the following configuration:

[mySourceType]
DATETIME_CONFIG = CURRENT
KV_MODE = json
AUTO_KV_JSON = true
category = Custom
pulldown_type = 1

The custom command builds a record with the JSON fields as distinct attributes, and we also write a JSON dump of the record into the _raw attribute (this seemed to be the only way it would appear as 'Syntax Highlighted').

Appending spath to the end of the custom command correctly extracts all the fields from each event. It seems that props.conf is not being picked up, as we expected KV_MODE = json to do the same thing as spath. We would like this extraction to occur at search time without the need to append spath.

Thanks for your help.

jkat54
SplunkTrust

This is your process as I understand it:

generating search command -> get json from external source -> rewrite _raw with json -> append sourcetype field -> display events

In this case, you will not have the sourcetype applied just by creating a sourcetype field.

Instead, you will probably have better luck if you rewrite _raw to include what is known as the Splunk header. I'm not certain this would work at all, but here's what I imagine it might look like:

***SPLUNK*** host=... sourcetype=... source=... \n\r
dataDATAdataDATA

Usually this requires running the data back through the data pipeline, though, so again I'm not certain it will work. You're probably stuck with spath unless you want to write the extraction logic in your command or ingest the data.

You might find this answer more helpful because I spent more time on it:
https://answers.splunk.com/answers/404224/how-do-i-use-requireheader-to-override-indexing-se-1.html

Also, here are some Python examples of how you might achieve this within your command:

import splunk.Intersplunk

# Read the incoming results, append one new event, and write everything back out.
results, dummy, settings = splunk.Intersplunk.getOrganizedResults()
results.append({"fieldname": "dataDATAdata"})
splunk.Intersplunk.outputResults(results)

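OR

import splunk.Intersplunk

# Read the incoming results, set a field on every event, and write them back out.
results, dummy, settings = splunk.Intersplunk.getOrganizedResults()
for result in results:
    result["FIELDNAME"] = "STRING"
splunk.Intersplunk.outputResults(results)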
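
For the "write the logic in your command" option mentioned above, one possible sketch is to flatten each JSON object into field names yourself before emitting it. The helper name flatten_json is hypothetical and is not part of Splunk or the SDK. The naming convention (a.b for nested keys, c{} for array elements) only approximates how spath names fields, so check it against your own data.

```python
import json


def flatten_json(value, prefix=""):
    """Flatten nested JSON into {field_name: value}.

    Assumption: nested keys become "a.b" and array elements become "c{}",
    with array leaves collected into lists (multivalue-style).
    """
    fields = {}
    if isinstance(value, dict):
        for key, child in value.items():
            name = f"{prefix}.{key}" if prefix else key
            fields.update(flatten_json(child, name))
    elif isinstance(value, list):
        for child in value:
            for name, leaf in flatten_json(child, prefix + "{}").items():
                bucket = fields.setdefault(name, [])
                # Keep the result flat even when arrays are nested.
                bucket.extend(leaf if isinstance(leaf, list) else [leaf])
    else:
        fields[prefix] = value
    return fields


# Hypothetical event, as it might arrive from the third-party source.
event = json.loads('{"a": {"b": 1}, "c": [{"d": 2}, {"d": 3}], "e": [4, 5]}')
record = flatten_json(event)
record["_raw"] = json.dumps(event)
```

The flattened record can then be passed to whatever output step the command already uses. It assumes the top-level value is a JSON object.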

svasavada_splun
Splunk Employee

This was really helpful, thanks a lot for that!
Can you please tell me how I can do this using splunklib (SCP version 2)?

Thanks again for your help!

ays7abt
New Member

Hi,

As I understand it, you sent your data as CSV with a header row and added an extra _raw column containing the whole JSON. Is that right?

For example:
x,y,z,_raw \n 1,2,3,{x: 1, y: 2, z: 3}

jonfrancais
Explorer

The process you described is exactly what we are looking to do. We tried the Splunk header approach, but as you expected, this wasn't picked up correctly.

In the end, we went with something based on the Intersplunk outputResults function you mentioned, which worked. From looking at the source code, the key part of this is building a unique set of fields across all records, which is exported as the CSV header so that every field is present for field extraction. We also write a JSON dump of the record to _raw to enable the syntax highlighting (without it, this section would be empty; I'm still unsure why this is needed in addition to the record, but that's perhaps a different question!).
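
A minimal sketch of the union-of-fields idea described above, using only the Python standard library. This is not the Intersplunk source, and the function name to_csv_with_union_header is hypothetical. It collects every field name seen across all records, writes that union as the CSV header, leaves missing fields empty, and adds a JSON dump of each record as _raw.

```python
import csv
import io
import json


def to_csv_with_union_header(records):
    """Serialize dict records as CSV whose header is the union of all field names."""
    fieldnames = []
    seen = set()
    for record in records:
        for name in record:
            if name not in seen:
                seen.add(name)
                fieldnames.append(name)
    if "_raw" not in seen:
        fieldnames.append("_raw")

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        row = dict(record)
        # Keep the original JSON in _raw unless the record already supplies one.
        row.setdefault("_raw", json.dumps(record, sort_keys=True))
        writer.writerow(row)
    return buffer.getvalue()


# Two records with different fields; both rows end up with columns a, b, _raw.
output = to_csv_with_union_header([{"a": 1}, {"b": 2, "a": 3}])
```

Because the header is computed before any row is written, fields that first appear in a later record are still present in the header, which is the behaviour the first-yield approach was missing.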

Thanks very much for your help.
