Getting Data In

Generating custom command with complete JSON field extraction

Explorer

We are developing a generating custom command using the Splunk Python SDK. The issue we are having is that only the fields emitted by the first 'yield' are extracted for subsequent events (so only those fields appear as Extracted/Interesting Fields). Given these field names change continually, we don't want to provide a static list of potential field names.

To provide some background, the command runs on the search head and obtains a list of JSON objects from a third party. The JSON events returned have differing fields in them. Once a JSON object has been returned, we append a 'sourcetype' field to the record in the hope that it picks up our configuration from props.conf. The data is not indexed and is only acquired at search time.

We are using props.conf to define our sourcetype with the following configuration:

[mySourceType]
DATETIME_CONFIG = CURRENT
KV_MODE = json
AUTO_KV_JSON = true
category = Custom
pulldown_type = 1

The custom command builds a record with the JSON fields as distinct attributes, and we also append a json.dumps() of the record into the _raw attribute (this seemed to be the only way to get the event to appear as 'Syntax Highlighted').
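As a minimal sketch of how each event is assembled (the record contents and sourcetype name here are hypothetical, not taken from our actual command):

```python
import json

# Hypothetical record as returned by the third-party API; field names vary per event.
record = {"user": "alice", "action": "login", "ip": "10.0.0.1"}

# Copy the JSON fields in as distinct attributes, tag the record with the
# sourcetype we hope props.conf will match, and serialise the whole record
# into _raw so Splunk renders it with JSON syntax highlighting.
event = dict(record)
event["sourcetype"] = "mySourceType"
event["_raw"] = json.dumps(record)
```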

Appending spath to the end of the search correctly extracts all the fields from each event. It seems the props.conf configuration is not being picked up, as we expected KV_MODE = json to do the same thing as spath. We would like this extraction to occur at search time without needing to append spath.

Thanks for your help.

0 Karma
1 Solution

SplunkTrust

This is your process as I understand it:

generating search command -> get json from external source -> rewrite _raw with json -> append sourcetype field -> display events

In this case, you will not have the sourcetype applied just by creating a sourcetype field.

Instead, you will probably have better luck if you rewrite _raw to include what is known as the Splunk header. I'm not certain this will work at all, but here's what I imagine it might look like:

***SPLUNK*** host=... sourcetype=... source=... \n\r
dataDATAdataDATA

Usually this requires running the data back through the pipeline, so again I'm not certain it will work. You're probably stuck with spath unless you want to write the extraction logic into your command or ingest the data properly.

You might find this answer more helpful because I spent more time on it:
https://answers.splunk.com/answers/404224/how-do-i-use-requireheader-to-override-indexing-se-1.html

Also, here are some Python examples of how you might achieve this within your command:

import splunk.Intersplunk

results, dummy, settings = splunk.Intersplunk.getOrganizedResults()
results.append({"fieldname": "dataDATAdata"})
splunk.Intersplunk.outputResults(results)

OR  

import splunk.Intersplunk

results, dummy, settings = splunk.Intersplunk.getOrganizedResults()
for result in results:
    result['FIELDNAME'] = "STRING"
splunk.Intersplunk.outputResults(results)



Splunk Employee

This was really helpful, thanks a lot for that!
Can you please tell me how I can do this using splunklib (SCP version 2)?

Thanks again for your help!

0 Karma

New Member

Hi,

If I understand correctly, you sent your data as a CSV with a header row and added an extra _raw column containing the whole JSON object. Is that right?

For example:
x,y,z,_raw
1,2,3,{x: 1, y: 2, z: 3}

0 Karma

Explorer

The process you described is exactly what we are looking to do. We tried the Splunk header approach, but as you expected, it wasn't picked up correctly.

In the end, we went with something based on the Intersplunk outputResults function as you mentioned, which worked. From looking at the source code, the key part is building the unique set of field names across all records; these are exported as the CSV header, so every field is present for extraction. We also append the json.dumps() of the record to the _raw field to enable the syntax highlighting (without it, that section would be empty - I'm still unsure why this is needed in addition to the record fields, but that's perhaps a different question!)

Thanks very much for your help.

0 Karma