Splunk Search

Is there a way to have Splunk recognize the nested JSON at index time?

aliquori
New Member

I have the following nested JSON logs:

{"statementData": {"overview": [{"value": 19.7780744265071, "dataCode": "rps"}, {"value": 2.82434121706085, "dataCode": "longTermDebtEquity"}, {"value": 0.450856524955893, "dataCode": "grossMargin"}, {"value": -0.262569832402235, "dataCode": "epsQoQ"}, {"value": 22.656256448508, "dataCode": "bvps"}, {"value": 0.471736371842336, "dataCode": "roe"}, {"value": 3.0, "dataCode": "piotroskiFScore"}, {"value": 0.450856524955893, "dataCode": "profitMargin"}, {"value": 0.0591282275797272, "dataCode": "roa"}, {"value": -0.0336046639533605, "dataCode": "revenueQoQ"}, {"value": 0.957170604577976, "dataCode": "currentRatio"}], "cashFlow": [{"value": 1634000000.0, "dataCode": "depamor"}, {"value": -403000000.0, "dataCode": "ncfx"}, {"value": -115000000.0, "dataCode": "ncff"}, {"value": 4476000000.0, "dataCode": "ncfo"}, {"value": 3057000000.0, "dataCode": "ncf"}, {"value": -178000000.0, "dataCode": "investmentsAcqDisposals"}, {"value": -902000000.0, "dataCode": "ncfi"}, {"value": -1440000000.0, "dataCode": "payDiv"}, {"value": 189000000.0, "dataCode": "sbcomp"}, {"value": 0.0, "dataCode": "issrepayEquity"}, {"value": 13000000.0, "dataCode": "businessAcqDisposals"}, {"value": -737000000.0, "dataCode": "capex"}, {"value": 1356000000.0, "dataCode": "issrepayDebt"}, {"value": 3739000000.0, "dataCode": "freeCashFlow"}], "incomeStatement": [{"value": 276000000.0, "dataCode": "opinc"}, {"value": 888000000.0, "dataCode": "shareswa"}, {"value": 1175000000.0, "dataCode": "netinc"}, {"value": 1.32, "dataCode": "eps"}, {"value": 895000000.0, "dataCode": "shareswaDil"}, {"value": 7646000000.0, "dataCode": "opex"}, {"value": 326000000.0, "dataCode": "intexp"}, {"value": 9649000000.0, "dataCode": "costRev"}, {"value": 1.31, "dataCode": "epsDil"}, {"value": 0.0, "dataCode": "prefDVDs"}, {"value": 1000000.0, "dataCode": "netIncDiscOps"}, {"value": 17571000000.0, "dataCode": "revenue"}, {"value": 5955000000.0, "dataCode": "sga"}, {"value": 1625000000.0, "dataCode": "rnd"}, {"value": -1226000000.0, "dataCode": "taxExp"}, {"value": 0.0, "dataCode": "nonControllingInterests"}, {"value": 1175000000.0, "dataCode": "consolidatedIncome"}, {"value": 275000000.0, "dataCode": "ebit"}, {"value": -51000000.0, "dataCode": "ebt"}, {"value": 1909000000.0, "dataCode": "ebitda"}, {"value": 7922000000.0, "dataCode": "grossProfit"}, {"value": 1175000000.0, "dataCode": "netIncComStock"}], "balanceSheet": [{"value": 14497000000.0, "dataCode": "ppeq"}, {"value": 72183000000.0, "dataCode": "intangibles"}, {"value": 19999000000.0, "dataCode": "equity"}, {"value": 38931000000.0, "dataCode": "assetsCurrent"}, {"value": 153403000000.0, "dataCode": "totalAssets"}, {"value": 12969000000.0, "dataCode": "debtCurrent"}, {"value": 114472000000.0, "dataCode": "assetsNonCurrent"}, {"value": 8782000000.0, "dataCode": "taxAssets"}, {"value": 4172000000.0, "dataCode": "acctPay"}, {"value": 133275000000.0, "dataCode": "totalLiabilities"}, {"value": 162626000000.0, "dataCode": "retainedEarnings"}, {"value": 2348000000.0, "dataCode": "taxLiabilities"}, {"value": 92602000000.0, "dataCode": "liabilitiesNonCurrent"}, {"value": 1786000000.0, "dataCode": "inventory"}, {"value": 69453000000.0, "dataCode": "debt"}, {"value": 647000000.0, "dataCode": "investmentsCurrent"}, {"value": 0.0, "dataCode": "deposits"}, {"value": 56484000000.0, "dataCode": "debtNonCurrent"}, {"value": 40673000000.0, "dataCode": "liabilitiesCurrent"}, {"value": 28377000000.0, "dataCode": "acctRec"}, {"value": 2558000000.0, "dataCode": "investments"}, {"value": 1911000000.0, "dataCode": "investmentsNonCurrent"}, {"value": 11370000000.0, "dataCode": "cashAndEq"}, {"value": -29283000000.0, "dataCode": "accoci"}, {"value": 888408023.0, "dataCode": "sharesBasic"}, {"value": 17146000000.0, "dataCode": "deferredRev"}]}, "quarter": 1, "year": 2020, "date": "2020-03-31"}, {"statementData" etc... 

My props.conf is as follows:

[testing]
LINE_BREAKER = (\{\"statementData\":\s+)
SHOULD_LINEMERGE = false
TIME_PREFIX = \{"date":"
TIME_FORMAT = %Y-%m-%d
TRUNCATE = 80000
INDEXED_EXTRACTIONS = JSON
JSON_TRIM_BRACES_IN_ARRAY_NAMES = true
KV_MODE = none

This allows for a json format that splunk can break down but the fields become a bit mangled. To get down to the dataCode and values, I used the following SPL:

index=dev  | spath |  rename statementData.balanceSheet.dataCode as Data, statementData.balanceSheet.value as Value | eval x=mvzip(Data, Value) | mvexpand x | eval x = split(x,",") |eval Data=mvindex(x,0) | eval Value=mvindex(x,1) | table source, Data, Value

This splits the data into two columns but I am having difficulty associating the columns ie. I want rps to equal 19.7780744265071. Is there an easier way to do this or have Splunk recognize the nested json at index time? I could see INDEXED_EXTRACTIONS=json working, but I would need to remove the "value" and "dataCode" fields first which would be far more work.

0 Karma
1 Solution

to4kawa
Ultra Champion
[testing]
SHOULD_LINEMERGE=false
LINE_BREAKER=(,\s*){\"statementData
NO_BINARY_CHECK=true
TRUNCATE=0
TIME_PREFIX=date\":\s*\"
TIME_FORMAT=%F
INDEXED_EXTRACTIONS = none
KV_MODE = none
TRANSFORMS-kv = json_kv

LINE_BREAKER's capture is better this way.

transforms.conf

[json_kv]
REGEX = {\"value\": (\S+), \"dataCode\": \"(\w+)\"}
FORMAT = $2::$1
REPEAT_MATCH = true
WRITE_META = true

I'm sure you can handle it.
thank you @xpac and your method.

View solution in original post

0 Karma

to4kawa
Ultra Champion
[testing]
SHOULD_LINEMERGE=false
LINE_BREAKER=(,\s*){\"statementData
NO_BINARY_CHECK=true
TRUNCATE=0
TIME_PREFIX=date\":\s*\"
TIME_FORMAT=%F
INDEXED_EXTRACTIONS = none
KV_MODE = none
TRANSFORMS-kv = json_kv

LINE_BREAKER's capture is better this way.

transforms.conf

[json_kv]
REGEX = {\"value\": (\S+), \"dataCode\": \"(\w+)\"}
FORMAT = $2::$1
REPEAT_MATCH = true
WRITE_META = true

I'm sure you can handle it.
thank you @xpac and your method.

0 Karma

aliquori
New Member

Thanks, cleaned it up a little bit but the issue still remains where the dataCode and value have no association.

0 Karma

to4kawa
Ultra Champion

@aliquori
my answer is updated.

0 Karma

aliquori
New Member

This worked! Thanks for the help, also worth nothing that when INDEXED_EXTRACTIONS is set to none it will ignore the monitored path. Once that was removed it was good to go.

0 Karma

to4kawa
Ultra Champion

I thought it was superfluous, so I set it with none.
You'll need to set it up if you need to.

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...