Splunk Search

Is there a way to have Splunk recognize the nested JSON at index time?

aliquori
New Member

I have the following nested JSON logs:

{"statementData": {"overview": [{"value": 19.7780744265071, "dataCode": "rps"}, {"value": 2.82434121706085, "dataCode": "longTermDebtEquity"}, {"value": 0.450856524955893, "dataCode": "grossMargin"}, {"value": -0.262569832402235, "dataCode": "epsQoQ"}, {"value": 22.656256448508, "dataCode": "bvps"}, {"value": 0.471736371842336, "dataCode": "roe"}, {"value": 3.0, "dataCode": "piotroskiFScore"}, {"value": 0.450856524955893, "dataCode": "profitMargin"}, {"value": 0.0591282275797272, "dataCode": "roa"}, {"value": -0.0336046639533605, "dataCode": "revenueQoQ"}, {"value": 0.957170604577976, "dataCode": "currentRatio"}], "cashFlow": [{"value": 1634000000.0, "dataCode": "depamor"}, {"value": -403000000.0, "dataCode": "ncfx"}, {"value": -115000000.0, "dataCode": "ncff"}, {"value": 4476000000.0, "dataCode": "ncfo"}, {"value": 3057000000.0, "dataCode": "ncf"}, {"value": -178000000.0, "dataCode": "investmentsAcqDisposals"}, {"value": -902000000.0, "dataCode": "ncfi"}, {"value": -1440000000.0, "dataCode": "payDiv"}, {"value": 189000000.0, "dataCode": "sbcomp"}, {"value": 0.0, "dataCode": "issrepayEquity"}, {"value": 13000000.0, "dataCode": "businessAcqDisposals"}, {"value": -737000000.0, "dataCode": "capex"}, {"value": 1356000000.0, "dataCode": "issrepayDebt"}, {"value": 3739000000.0, "dataCode": "freeCashFlow"}], "incomeStatement": [{"value": 276000000.0, "dataCode": "opinc"}, {"value": 888000000.0, "dataCode": "shareswa"}, {"value": 1175000000.0, "dataCode": "netinc"}, {"value": 1.32, "dataCode": "eps"}, {"value": 895000000.0, "dataCode": "shareswaDil"}, {"value": 7646000000.0, "dataCode": "opex"}, {"value": 326000000.0, "dataCode": "intexp"}, {"value": 9649000000.0, "dataCode": "costRev"}, {"value": 1.31, "dataCode": "epsDil"}, {"value": 0.0, "dataCode": "prefDVDs"}, {"value": 1000000.0, "dataCode": "netIncDiscOps"}, {"value": 17571000000.0, "dataCode": "revenue"}, {"value": 5955000000.0, "dataCode": "sga"}, {"value": 1625000000.0, "dataCode": "rnd"}, {"value": -1226000000.0, "dataCode": "taxExp"}, {"value": 0.0, "dataCode": "nonControllingInterests"}, {"value": 1175000000.0, "dataCode": "consolidatedIncome"}, {"value": 275000000.0, "dataCode": "ebit"}, {"value": -51000000.0, "dataCode": "ebt"}, {"value": 1909000000.0, "dataCode": "ebitda"}, {"value": 7922000000.0, "dataCode": "grossProfit"}, {"value": 1175000000.0, "dataCode": "netIncComStock"}], "balanceSheet": [{"value": 14497000000.0, "dataCode": "ppeq"}, {"value": 72183000000.0, "dataCode": "intangibles"}, {"value": 19999000000.0, "dataCode": "equity"}, {"value": 38931000000.0, "dataCode": "assetsCurrent"}, {"value": 153403000000.0, "dataCode": "totalAssets"}, {"value": 12969000000.0, "dataCode": "debtCurrent"}, {"value": 114472000000.0, "dataCode": "assetsNonCurrent"}, {"value": 8782000000.0, "dataCode": "taxAssets"}, {"value": 4172000000.0, "dataCode": "acctPay"}, {"value": 133275000000.0, "dataCode": "totalLiabilities"}, {"value": 162626000000.0, "dataCode": "retainedEarnings"}, {"value": 2348000000.0, "dataCode": "taxLiabilities"}, {"value": 92602000000.0, "dataCode": "liabilitiesNonCurrent"}, {"value": 1786000000.0, "dataCode": "inventory"}, {"value": 69453000000.0, "dataCode": "debt"}, {"value": 647000000.0, "dataCode": "investmentsCurrent"}, {"value": 0.0, "dataCode": "deposits"}, {"value": 56484000000.0, "dataCode": "debtNonCurrent"}, {"value": 40673000000.0, "dataCode": "liabilitiesCurrent"}, {"value": 28377000000.0, "dataCode": "acctRec"}, {"value": 2558000000.0, "dataCode": "investments"}, {"value": 1911000000.0, "dataCode": "investmentsNonCurrent"}, {"value": 11370000000.0, "dataCode": "cashAndEq"}, {"value": -29283000000.0, "dataCode": "accoci"}, {"value": 888408023.0, "dataCode": "sharesBasic"}, {"value": 17146000000.0, "dataCode": "deferredRev"}]}, "quarter": 1, "year": 2020, "date": "2020-03-31"}, {"statementData" etc... 

My props.conf is as follows:

[testing]
LINE_BREAKER = (\{\"statementData\":\s+)
SHOULD_LINEMERGE = false
TIME_PREFIX = \{"date":"
TIME_FORMAT = %Y-%m-%d
TRUNCATE = 80000
INDEXED_EXTRACTIONS = JSON
JSON_TRIM_BRACES_IN_ARRAY_NAMES = true
KV_MODE = none

This allows for a json format that splunk can break down but the fields become a bit mangled. To get down to the dataCode and values, I used the following SPL:

index=dev  | spath |  rename statementData.balanceSheet.dataCode as Data, statementData.balanceSheet.value as Value | eval x=mvzip(Data, Value) | mvexpand x | eval x = split(x,",") |eval Data=mvindex(x,0) | eval Value=mvindex(x,1) | table source, Data, Value

This splits the data into two columns but I am having difficulty associating the columns ie. I want rps to equal 19.7780744265071. Is there an easier way to do this or have Splunk recognize the nested json at index time? I could see INDEXED_EXTRACTIONS=json working, but I would need to remove the "value" and "dataCode" fields first which would be far more work.

0 Karma
1 Solution

to4kawa
Ultra Champion
[testing]
SHOULD_LINEMERGE=false
LINE_BREAKER=(,\s*){\"statementData
NO_BINARY_CHECK=true
TRUNCATE=0
TIME_PREFIX=date\":\s*\"
TIME_FORMAT=%F
INDEXED_EXTRACTIONS = none
KV_MODE = none
TRANSFORMS-kv = json_kv

LINE_BREAKER's capture is better this way.

transforms.conf

[json_kv]
REGEX = {\"value\": (\S+), \"dataCode\": \"(\w+)\"}
FORMAT = $2::$1
REPEAT_MATCH = true
WRITE_META = true

I'm sure you can handle it.
thank you @xpac and your method.

View solution in original post

0 Karma

to4kawa
Ultra Champion
[testing]
SHOULD_LINEMERGE=false
LINE_BREAKER=(,\s*){\"statementData
NO_BINARY_CHECK=true
TRUNCATE=0
TIME_PREFIX=date\":\s*\"
TIME_FORMAT=%F
INDEXED_EXTRACTIONS = none
KV_MODE = none
TRANSFORMS-kv = json_kv

LINE_BREAKER's capture is better this way.

transforms.conf

[json_kv]
REGEX = {\"value\": (\S+), \"dataCode\": \"(\w+)\"}
FORMAT = $2::$1
REPEAT_MATCH = true
WRITE_META = true

I'm sure you can handle it.
thank you @xpac and your method.

0 Karma

aliquori
New Member

Thanks, cleaned it up a little bit but the issue still remains where the dataCode and value have no association.

0 Karma

to4kawa
Ultra Champion

@aliquori
my answer is updated.

0 Karma

aliquori
New Member

This worked! Thanks for the help, also worth nothing that when INDEXED_EXTRACTIONS is set to none it will ignore the monitored path. Once that was removed it was good to go.

0 Karma

to4kawa
Ultra Champion

I thought it was superfluous, so I set it with none.
You'll need to set it up if you need to.

0 Karma
Get Updates on the Splunk Community!

New Dates, New City: Save the Date for .conf25!

Wake up, babe! New .conf25 dates AND location just dropped!! That's right, this year, .conf25 is taking place ...

Introduction to Splunk Observability Cloud - Building a Resilient Hybrid Cloud

Introduction to Splunk Observability Cloud - Building a Resilient Hybrid Cloud  In today’s fast-paced digital ...

Observability protocols to know about

Observability protocols define the specifications or formats for collecting, encoding, transporting, and ...