Splunk Search

Is there a way to have Splunk recognize the nested JSON at index time?

aliquori
New Member

I have the following nested JSON logs:

{"statementData": {"overview": [{"value": 19.7780744265071, "dataCode": "rps"}, {"value": 2.82434121706085, "dataCode": "longTermDebtEquity"}, {"value": 0.450856524955893, "dataCode": "grossMargin"}, {"value": -0.262569832402235, "dataCode": "epsQoQ"}, {"value": 22.656256448508, "dataCode": "bvps"}, {"value": 0.471736371842336, "dataCode": "roe"}, {"value": 3.0, "dataCode": "piotroskiFScore"}, {"value": 0.450856524955893, "dataCode": "profitMargin"}, {"value": 0.0591282275797272, "dataCode": "roa"}, {"value": -0.0336046639533605, "dataCode": "revenueQoQ"}, {"value": 0.957170604577976, "dataCode": "currentRatio"}], "cashFlow": [{"value": 1634000000.0, "dataCode": "depamor"}, {"value": -403000000.0, "dataCode": "ncfx"}, {"value": -115000000.0, "dataCode": "ncff"}, {"value": 4476000000.0, "dataCode": "ncfo"}, {"value": 3057000000.0, "dataCode": "ncf"}, {"value": -178000000.0, "dataCode": "investmentsAcqDisposals"}, {"value": -902000000.0, "dataCode": "ncfi"}, {"value": -1440000000.0, "dataCode": "payDiv"}, {"value": 189000000.0, "dataCode": "sbcomp"}, {"value": 0.0, "dataCode": "issrepayEquity"}, {"value": 13000000.0, "dataCode": "businessAcqDisposals"}, {"value": -737000000.0, "dataCode": "capex"}, {"value": 1356000000.0, "dataCode": "issrepayDebt"}, {"value": 3739000000.0, "dataCode": "freeCashFlow"}], "incomeStatement": [{"value": 276000000.0, "dataCode": "opinc"}, {"value": 888000000.0, "dataCode": "shareswa"}, {"value": 1175000000.0, "dataCode": "netinc"}, {"value": 1.32, "dataCode": "eps"}, {"value": 895000000.0, "dataCode": "shareswaDil"}, {"value": 7646000000.0, "dataCode": "opex"}, {"value": 326000000.0, "dataCode": "intexp"}, {"value": 9649000000.0, "dataCode": "costRev"}, {"value": 1.31, "dataCode": "epsDil"}, {"value": 0.0, "dataCode": "prefDVDs"}, {"value": 1000000.0, "dataCode": "netIncDiscOps"}, {"value": 17571000000.0, "dataCode": "revenue"}, {"value": 5955000000.0, "dataCode": "sga"}, {"value": 1625000000.0, "dataCode": "rnd"}, {"value": -1226000000.0, "dataCode": "taxExp"}, {"value": 0.0, "dataCode": "nonControllingInterests"}, {"value": 1175000000.0, "dataCode": "consolidatedIncome"}, {"value": 275000000.0, "dataCode": "ebit"}, {"value": -51000000.0, "dataCode": "ebt"}, {"value": 1909000000.0, "dataCode": "ebitda"}, {"value": 7922000000.0, "dataCode": "grossProfit"}, {"value": 1175000000.0, "dataCode": "netIncComStock"}], "balanceSheet": [{"value": 14497000000.0, "dataCode": "ppeq"}, {"value": 72183000000.0, "dataCode": "intangibles"}, {"value": 19999000000.0, "dataCode": "equity"}, {"value": 38931000000.0, "dataCode": "assetsCurrent"}, {"value": 153403000000.0, "dataCode": "totalAssets"}, {"value": 12969000000.0, "dataCode": "debtCurrent"}, {"value": 114472000000.0, "dataCode": "assetsNonCurrent"}, {"value": 8782000000.0, "dataCode": "taxAssets"}, {"value": 4172000000.0, "dataCode": "acctPay"}, {"value": 133275000000.0, "dataCode": "totalLiabilities"}, {"value": 162626000000.0, "dataCode": "retainedEarnings"}, {"value": 2348000000.0, "dataCode": "taxLiabilities"}, {"value": 92602000000.0, "dataCode": "liabilitiesNonCurrent"}, {"value": 1786000000.0, "dataCode": "inventory"}, {"value": 69453000000.0, "dataCode": "debt"}, {"value": 647000000.0, "dataCode": "investmentsCurrent"}, {"value": 0.0, "dataCode": "deposits"}, {"value": 56484000000.0, "dataCode": "debtNonCurrent"}, {"value": 40673000000.0, "dataCode": "liabilitiesCurrent"}, {"value": 28377000000.0, "dataCode": "acctRec"}, {"value": 2558000000.0, "dataCode": "investments"}, {"value": 1911000000.0, "dataCode": "investmentsNonCurrent"}, {"value": 11370000000.0, "dataCode": "cashAndEq"}, {"value": -29283000000.0, "dataCode": "accoci"}, {"value": 888408023.0, "dataCode": "sharesBasic"}, {"value": 17146000000.0, "dataCode": "deferredRev"}]}, "quarter": 1, "year": 2020, "date": "2020-03-31"}, {"statementData" etc... 

My props.conf is as follows:

[testing]
LINE_BREAKER = (\{\"statementData\":\s+)
SHOULD_LINEMERGE = false
TIME_PREFIX = \{"date":"
TIME_FORMAT = %Y-%m-%d
TRUNCATE = 80000
INDEXED_EXTRACTIONS = JSON
JSON_TRIM_BRACES_IN_ARRAY_NAMES = true
KV_MODE = none

This allows for a json format that splunk can break down but the fields become a bit mangled. To get down to the dataCode and values, I used the following SPL:

index=dev  | spath |  rename statementData.balanceSheet.dataCode as Data, statementData.balanceSheet.value as Value | eval x=mvzip(Data, Value) | mvexpand x | eval x = split(x,",") |eval Data=mvindex(x,0) | eval Value=mvindex(x,1) | table source, Data, Value

This splits the data into two columns but I am having difficulty associating the columns ie. I want rps to equal 19.7780744265071. Is there an easier way to do this or have Splunk recognize the nested json at index time? I could see INDEXED_EXTRACTIONS=json working, but I would need to remove the "value" and "dataCode" fields first which would be far more work.

0 Karma
1 Solution

to4kawa
Ultra Champion
[testing]
SHOULD_LINEMERGE=false
LINE_BREAKER=(,\s*){\"statementData
NO_BINARY_CHECK=true
TRUNCATE=0
TIME_PREFIX=date\":\s*\"
TIME_FORMAT=%F
INDEXED_EXTRACTIONS = none
KV_MODE = none
TRANSFORMS-kv = json_kv

LINE_BREAKER's capture is better this way.

transforms.conf

[json_kv]
REGEX = {\"value\": (\S+), \"dataCode\": \"(\w+)\"}
FORMAT = $2::$1
REPEAT_MATCH = true
WRITE_META = true

I'm sure you can handle it.
thank you @xpac and your method.

View solution in original post

0 Karma

to4kawa
Ultra Champion
[testing]
SHOULD_LINEMERGE=false
LINE_BREAKER=(,\s*){\"statementData
NO_BINARY_CHECK=true
TRUNCATE=0
TIME_PREFIX=date\":\s*\"
TIME_FORMAT=%F
INDEXED_EXTRACTIONS = none
KV_MODE = none
TRANSFORMS-kv = json_kv

LINE_BREAKER's capture is better this way.

transforms.conf

[json_kv]
REGEX = {\"value\": (\S+), \"dataCode\": \"(\w+)\"}
FORMAT = $2::$1
REPEAT_MATCH = true
WRITE_META = true

I'm sure you can handle it.
thank you @xpac and your method.

0 Karma

aliquori
New Member

Thanks, cleaned it up a little bit but the issue still remains where the dataCode and value have no association.

0 Karma

to4kawa
Ultra Champion

@aliquori
my answer is updated.

0 Karma

aliquori
New Member

This worked! Thanks for the help, also worth nothing that when INDEXED_EXTRACTIONS is set to none it will ignore the monitored path. Once that was removed it was good to go.

0 Karma

to4kawa
Ultra Champion

I thought it was superfluous, so I set it with none.
You'll need to set it up if you need to.

0 Karma
Get Updates on the Splunk Community!

Splunk is Nurturing Tomorrow’s Cybersecurity Leaders Today

Meet Carol Wright. She leads the Splunk Academic Alliance program at Splunk. The Splunk Academic Alliance ...

Part 2: A Guide to Maximizing Splunk IT Service Intelligence

Welcome to the second segment of our guide. In Part 1, we covered the essentials of getting started with ITSI ...

Part 1: A Guide to Maximizing Splunk IT Service Intelligence

As modern IT environments continue to grow in complexity and speed, the ability to efficiently manage and ...