How to deal with curly brackets in field names when creating a data model?

hsesterhenn_spl
Splunk Employee

Hi,

I was working with JSON data.
(Example here: http://www.splunk.com/web_assets/hunk/Hunkdata.json.gz)

The data is stored in Hadoop HDFS (to reproduce this, download e.g. the Hortonworks HDP Sandbox and a trial version of Splunk Analytics for Hadoop).

Example event:
{"customer": {"city": "SACRAMENTO", "zip": "95819", "firstName": "ERWIN", "accountNumber": "900401544", "lastName": "HARRELL", "address": "831 Maverton Dr.", "phone": "5215464018", "state": "CA", "sex": "M", "age": "55"}, "timestamp": "2013-09-01T00:01:05", "servername": "dash.5.woc.com", "charactertype": "Curd Cobbler", "items": [{"category": "armor", "itemid": "DB-SG-G01", "price": 25.0, "description": "'Vegan Friendly Gloves'"}, {"category": "tools", "itemid": "AB-TR-N89", "price": 135.0, "description": "'Robotic Cow Milker'"}, {"category": "tools", "itemid": "AB-TR-N89", "price": 135.0, "description": "'Robotic Cow Milker'"}, {"category": "cheese", "itemid": "ST-RF-M04", "price": 20.0, "description": "Manchego"}, {"category": "tools", "itemid": "CU-PG-G06", "price": 65.0, "description": "'Cheese Board of Glory'"}], "total": 380.0, "type": "purchase", "region": "Limburgerland"}

INDEXED_EXTRACTIONS does not work here, because with data stored in Hadoop all field extraction happens at search time.
Instead, you can use KV_MODE=json in your sourcetype definition.
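
For reference, a minimal sketch of such a sourcetype definition in props.conf. The stanza name json:hunkorders is the one used later in this post; where the file lives (which app or local directory) depends on your setup and is not shown here.

```
# props.conf
[json:hunkorders]
# Parse events as JSON at search time.
# Array members then show up with names like items{}.category
KV_MODE = json
```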

After automatic extraction, the sample data contains array fields such as items{}.category.

If you then create a data model and try to add "items{}.category" as an auto-extracted field, you get an error message.

There is a "new" option called
JSON_TRIM_BRACES_IN_ARRAY_NAMES
https://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf#Structured_Data_Header_Extractio...

Unfortunately this option is index time only, so again it does not work with data stored in HDFS.
(But you can try to ingest the example data with Splunk Enterprise, where it should work.)
In addition, this feature has some issues with SPATH compatibility:
"Note that enabling this will make json indextime extracted array field names
inconsistent with spath search processor's naming convention."

Long story short:
Use FIELDALIAS to rename the fields with curly brackets.
This is a search-time option. When you click "Add Field: Auto-Extracted" in the data model editor, the "working" field name is offered in addition to the "non-working" one with the brackets.
Example:

[json:hunkorders]
    FIELDALIAS-items=items{}.category AS items.category,items{}.description AS items.description,items{}.itemid AS items.itemid,items{}.price AS items.price

(Yes, you can define multiple rename statements in one line).
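
If the single long line is hard to read, the aliases can probably also be split into one FIELDALIAS class per field. This is only a sketch: the class names (items_category and so on) are made up for illustration, and the result should be equivalent to the one-line version above.

```
[json:hunkorders]
# One alias per class; class names are arbitrary but must be unique within the stanza
FIELDALIAS-items_category    = items{}.category AS items.category
FIELDALIAS-items_description = items{}.description AS items.description
FIELDALIAS-items_itemid      = items{}.itemid AS items.itemid
FIELDALIAS-items_price       = items{}.price AS items.price
```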

You don't have to do it on the command line. Select "Settings/Source Types" and you are good to go.

Feel free to comment on or answer this post if you have other or better ideas.

Greetings,

Holger

manikanta461
Explorer

What if the items are sometimes not an array? How can we make this more generic?
E.g. in one event items is an array and in another event it is not. How can we create an alias that works for both cases?
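
One way this might be handled, assuming a Splunk version whose props.conf supports the ASNEW keyword for field aliases: in an event where items is a plain object, KV_MODE=json should already extract items.category directly, so the alias only needs to supply that name when it does not exist yet. ASNEW leaves an existing items.category untouched, whereas AS would replace it. An untested sketch, with hypothetical class names:

```
[json:hunkorders]
# ASNEW: create the alias only if the target field is not already present
FIELDALIAS-items_category = items{}.category ASNEW items.category
FIELDALIAS-items_price    = items{}.price ASNEW items.price
```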

hsesterhenn_spl
Splunk Employee

answered 🙂
