Getting Data In

Splitting header and each entry from a nested JSON array into separate events at index time

madstop99
Explorer

I have a JSON (all in one line when fed into Splunk):

{
    "customerName": "Patrick",
    "customerId": "123456",
    "customerCity": "New York",
    "host": "host1",
    "path": "/store/key",
    "sourceType": "purchase",
    "sourceName": "Store",
    "data": [{
        "store": "Store 23",
        "time": "2016/05/06 10:20:20",
        "spending": "$100-$200",
        "category": ["Grocery", "Toys"]
    }, {
        "store": "Store 40",
        "time": "2016/05/20 12:20:30",
        "spending": "$25-$50",
        "category": ["Cloths"]
    }]
}

I want to generate two events at index time, with a result like this:

Event 1: 
{
    "customerName": "Patrick",
    "customerId": "123456",
    "customerCity": "New York",
    "host": "host1",
    "path": "/store/key",
    "sourceType": "purchase",
    "sourceName": "Store",
        "store": "Store 23",
        "time": "2016/05/06 10:20:20",
        "spending": "$100-$200",
        "category": ["Grocery", "Toys"]
}

Event 2:
{
    "customerName": "Patrick",
    "customerId": "123456",
    "customerCity": "New York",
    "host": "host1",
    "path": "/store/key",
    "sourceType": "purchase",
    "sourceName": "Store",
        "store": "Store 40",
        "time": "2016/05/20 12:20:30",
        "spending": "$25-$50",
        "category": ["Cloths"]
}

Question 1:
I tried to do this with transforms.conf and props.conf, and couldn't get it to work. Any thought or suggestion?

Question 2:
I am expecting up to a few thousand entries in "data". Given that the timestamp is within each entry of JSON array, is this something I should do at Index time or search time?

Tags (1)

wjk5828
New Member

Did you find a solution on this problem? I have a similar non-JSON data set where I would just prefer to "flatten" the tree while indexing.

0 Karma

rshoward
Path Finder

This can be done at search time with some spath after mv expansion on data , but you may consider writing function to unroll the "data" array on ingest if you need the individual events broken out. It really depends on how the often the data will be ad-hoc searched.

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Best Practices: Splunk auto adjust pipeline queue

When you enable autoAdjustQueue in Splunk, maxSize should be understood as the queue size Splunk starts with ...

Request for Professional Development: Attending .conf26

Winning Over the Boss: Your Pass to .conf26 conf26 is going to be here before you know it. If don't already ...

Casting Call: Compete in Cyber Games

Lights, Camera, SecOps: Apply to Compete in Cyber Games     Think you have what it takes to beat the clock? ...