Splunk Search

JSON - an array, many fields, mvzip and mvexpand issue

JimboSlice
Path Finder

Hi, am I doing this correct or is there another way to tabulate this JSON?

I've seen many examples on the forums of people using mvexpand and mvzip to tabulate their JSON but this is working with just a few fields rather than a handful and not to any depths.

I have events pulled from the splunk add-on for microsoft cloud services.

As you may already know, this pulls body.records{}->[Many subitems in an array]... for individual events:

One (redacted) event in splunk may look like so:

{
body: {
records: [
{
backendPoolName: XXXX-pool
backendSettingName: XXXX-backend-setting
category: ApplicationGatewayAccessLog
listenerName: XXXX-ssl-listener
operationName: ApplicationGatewayAccess
properties: {
clientIP: 4.5.6.7
clientPort: 56522
host: xxxx
httpMethod: GET
httpStatus: 302
httpVersion: HTTP/1.1
instanceId: appgw_1
originalHost: XXXX
originalRequestUriWithArgs: /XXXX
receivedBytes: 386
requestQuery: client_id=XXXX
requestUri: /XXXX
sentBytes: 1404
serverResponseLatency: 0.012
serverRouted: XXXX
serverStatus: 302
sslCipher: ECDHE-RSA-AES256-GCM-SHA384
sslClientCertificateFingerprint:
sslClientCertificateIssuerName:
sslClientVerify: NONE
sslEnabled: on
sslProtocol: TLSv1.2
timeTaken: 0.013
transactionId: XXXX
userAgent: Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)
}
resourceId: /SUBSCRIPTIONS/XXXX/RESOURCEGROUPS/XXXX/PROVIDERS/MICROSOFT.NETWORK/APPLICATIONGATEWAYS/XXXX
ruleName: XXXX-ruleset
timeStamp: 2021-03-05T23:53:01+00:00
}
{
backendPoolName: XXXX-pool
backendSettingName: XXXX-backend-setting
category: ApplicationGatewayAccessLog
listenerName: XXXX-ssl-listener
operationName: ApplicationGatewayAccess
properties: {
clientIP: 1.2.3.4
clientPort: 52478
host: XXXX
httpMethod: GET
httpStatus: 200
httpVersion: HTTP/1.1
instanceId: appgw_1
originalHost: XXXX
originalRequestUriWithArgs: /session/XXXX
receivedBytes: 2087
requestQuery: XXXX
requestUri: /session/XXXX
sentBytes: 6081
serverResponseLatency: 0.024
serverRouted: XXXX
serverStatus: 200
sslCipher: ECDHE-RSA-AES256-GCM-SHA384
sslClientCertificateFingerprint:
sslClientCertificateIssuerName:
sslClientVerify: NONE
sslEnabled: on
sslProtocol: TLSv1.2
timeTaken: 0.028
transactionId: XXXX
userAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/87.0.4280.20 Safari/537.36
}
resourceId: /SUBSCRIPTIONS/XXXX/RESOURCEGROUPS/XXXX-PRGATEWAY/PROVIDERS/MICROSOFT.NETWORK/APPLICATIONGATEWAYS/GATEWAY-PR-AG
ruleName: XXXX-ruleset
timeStamp: 2021-03-05T23:53:03+00:00
}
]
}
x-opt-enqueued-time: 1614988459311
x-opt-offset: 29218690517200
x-opt-sequence-number: 19033991
}

First I had issues with empty values, so i have resolved this.

Some of these events are so big they wont render in the web gui, so its hard to tell what's going on, we have a plan to break them down using props/SEDCMD.

This is my query:

index=eventhub AND source="*C113*" AND body.records{}.resourceId="/subscriptions/XXXXX/resourcegroups/XXXXX-prgateway/providers/microsoft.network/applicationgateways/gateway-pr-ag"
| eval _raw = replace (_raw, "\":\"\",","\":\"N/A\",")

| spath

| rename body.records{}.timeStamp as timeStamp_
| rename body.records{}.properties.receivedBytes as receivedBytes_
| rename body.records{}.properties.clientIP as clientIP_
| rename body.records{}.properties.clientPort as clientPort_
| rename body.records{}.properties.host as host_
| rename body.records{}.properties.httpMethod as httpMethod_
| rename body.records{}.properties.httpStatus as httpStatus_
| rename body.records{}.properties.originalHost as originalHost_
| rename body.records{}.properties.originalRequestUriWithArgs as originalRequestUriWithArgs_
| rename body.records{}.properties.timeTaken as timeTaken_
| rename body.records{}.properties.userAgent as userAgent_

| eval x = mvzip(timeStamp_, clientIP_, ";;")
| mvexpand x
| eval x = split(x, ";;")
| eval timeStamp2 = mvindex(x, 0)
| eval clientIP2 = mvindex(x, 1)

| eval x = mvzip(clientIP_, receivedBytes_, ";;")
| mvexpand x
| eval x = split(x, ";;")
| eval clientIP2 = mvindex(x, 0)
| eval receivedBytes2 = mvindex(x, 1)

| eval x = mvzip(clientIP_, clientPort_, ";;")
| mvexpand x
| eval x = split(x, ";;")
| eval clientIP2 = mvindex(x, 0)
| eval clientPort2 = mvindex(x, 1)

| eval x = mvzip(clientIP_, host_, ";;")
| mvexpand x
| eval x = split(x, ";;")
| eval clientIP2 = mvindex(x, 0)
| eval host2 = mvindex(x, 1)

| eval x = mvzip(clientIP_, httpMethod_, ";;")
| mvexpand x
| eval x = split(x, ";;")
| eval clientIP2 = mvindex(x, 0)
| eval httpMethod2 = mvindex(x, 1)

| eval x = mvzip(clientIP_, httpStatus_, ";;")
| mvexpand x
| eval x = split(x, ";;")
| eval clientIP2 = mvindex(x, 0)
| eval httpStatus2 = mvindex(x, 1)

| eval x = mvzip(clientIP_, originalHost_, ";;")
| mvexpand x
| eval x = split(x, ";;")
| eval clientIP2 = mvindex(x, 0)
| eval originalHost2 = mvindex(x, 1)

| eval x = mvzip(clientIP_, originalRequestUriWithArgs_, ";;")
| mvexpand x
| eval x = split(x, ";;")
| eval clientIP2 = mvindex(x, 0)
| eval originalRequestUriWithArgs2 = mvindex(x, 1)

| eval x = mvzip(clientIP_, timeTaken_, ";;")
| mvexpand x
| eval x = split(x, ";;")
| eval clientIP2 = mvindex(x, 0)
| eval timeTaken2 = mvindex(x, 1)

| eval x = mvzip(clientIP_, userAgent_, ";;")
| mvexpand x
| eval x = split(x, ";;")
| eval clientIP2 = mvindex(x, 0)
| eval userAgent2 = mvindex(x, 1)


| table timeStamp2, receivedBytes2, clientIP2, clientPort2, host2, httpMethod2, httpStatus2, originalHost2, originalRequestUriWithArgs2, timeTaken2, userAgent2
| dedup timeStamp2, receivedBytes2, clientIP2, clientPort2, host2, httpMethod2, httpStatus2, originalHost2, originalRequestUriWithArgs2, timeTaken2, userAgent2

 

I'm seeing exponentially more results in my table, the more fields I add.

limits.conf is being mentioned for result truncation with mvexpand being used.

I'm seeing user agents which, when I check an individual record manually, does not map, where an event has just one entry for a clientIP it's spawning many lines in my resulting table which appear to cover all combinations.

Any help on how to query this slightly advanced JSON dataset would be good, or perhaps point out where im going wrong? p.s. we have historical data we need to process, event breaking is on the horizon but we cant really get out of this one that easily...

 

 

Labels (1)
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust
| makeresults | eval _raw="{
  \"body\": {
    \"records\": [
      {
        \"properties\": {
          \"clientIP\": \"4.5.6.7\",
          \"clientPort\": 56522
        },
        \"timeStamp\": \"2021-03-05T23:53:01+00:00\"
      }
    ]
  }
}|{
  \"body\": {
    \"records\": [
      {
        \"properties\": {
          \"clientIP\": \"7.6.5.4\",
          \"clientPort\": 56522
        },
        \"timeStamp\": \"2021-03-04T23:53:01+00:00\"
      },
      {
        \"properties\": {
          \"clientIP\": \"7.6.5.3\",
          \"clientPort\": 56522
        },
        \"timeStamp\": \"2021-03-04T22:53:01+00:00\"
      }
    ]
  }
}|{
  \"body\": {
    \"records\": [
      {
        \"properties\": {
          \"clientIP\": \"6.7.8.9\",
          \"clientPort\": 56522
        },
        \"timeStamp\": \"2021-03-03T23:53:01+00:00\"
      },
      {
        \"properties\": {
          \"clientIP\": \"6.7.8.10\",
          \"clientPort\": 56522
        },
        \"timeStamp\": \"2021-03-03T22:53:01+00:00\"
      },
      {
        \"properties\": {
          \"clientIP\": \"6.7.8.11\",
          \"clientPort\": 56522
        },
        \"timeStamp\": \"2021-03-03T21:53:01+00:00\"
      }
    ]
  }
}"
| eval events=split(_raw,"|")
| mvexpand events
| eval _raw=events
| fields - _time events


| spath body.records{} output=records
| mvexpand records
| spath input=records timeStamp output=timeStamp
| spath input=records properties output=properties
| mvexpand properties
| spath input=properties
| fields - _raw properties records

View solution in original post

ITWhisperer
SplunkTrust
SplunkTrust

Rather than extracting everything at once, have you tried extracting  level by level e.g. extract the records array, mvexpand that, then extract the properties array from the record field and mvexpand that, etc.?

JimboSlice
Path Finder

Hi, in response to this and what i previously said about what i was mulling over, the below seems to work, but doesnt use spath, in any event here it is and im going to look at your solution which does use spath:

index=eventhub AND source="*C113*" AND body.records{}.resourceId="/subscriptions/xxxxx/resourcegroups/s141p01-prgateway/providers/microsoft.network/applicationgateways/gateway-pr-ag"
| eval _raw = replace (_raw, "\":\"\",","\":\"N/A\",")
| rex max_match=100 (?sU)(?<records>\{\s\"timeStamp\"\:.+\}\})
| mvexpand records
| rename _raw as temp
| rename records as _raw
| extract
| rename temp as _raw
| table timeStamp, backendPoolName, backendSettingName, ruleName, properties.receivedBytes, properties.clientIP, properties.clientPort, properties.host, properties.httpMethod, properties.httpStatus, properties.originalHost, properties.originalRequestUriWithArgs, properties.timeTaken, properties.userAgent

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

You might want to consider max_match=0 in case there are more than 100 records in an event

JimboSlice
Path Finder

Hi, how do i go about that?  I have not tried that yet, i'm not experienced enough to handle this sort of JSON.

I'm currently looking in to somehow creating a mvfield from the records array and handling the elements of that field individually using spath (e.g. effectively break an event to many events through search), for the outer items like timeStamp i will probably devise some method to append the value, its very difficult is this.

The solution link I posted earlier (https://community.splunk.com/t5/Splunk-Search/How-to-parse-my-JSON-data-with-spath-and-table-the-dat...) doesn't work properly in 7.3.3 or i would try use that approach, many mvzips and one or two expands followed by tabulation, i've done this before for AWS logs some years ago with success, but i don't recall them being in such a state as this.

Tags (2)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust
| makeresults | eval _raw="{
  \"body\": {
    \"records\": [
      {
        \"properties\": {
          \"clientIP\": \"4.5.6.7\",
          \"clientPort\": 56522
        },
        \"timeStamp\": \"2021-03-05T23:53:01+00:00\"
      }
    ]
  }
}|{
  \"body\": {
    \"records\": [
      {
        \"properties\": {
          \"clientIP\": \"7.6.5.4\",
          \"clientPort\": 56522
        },
        \"timeStamp\": \"2021-03-04T23:53:01+00:00\"
      },
      {
        \"properties\": {
          \"clientIP\": \"7.6.5.3\",
          \"clientPort\": 56522
        },
        \"timeStamp\": \"2021-03-04T22:53:01+00:00\"
      }
    ]
  }
}|{
  \"body\": {
    \"records\": [
      {
        \"properties\": {
          \"clientIP\": \"6.7.8.9\",
          \"clientPort\": 56522
        },
        \"timeStamp\": \"2021-03-03T23:53:01+00:00\"
      },
      {
        \"properties\": {
          \"clientIP\": \"6.7.8.10\",
          \"clientPort\": 56522
        },
        \"timeStamp\": \"2021-03-03T22:53:01+00:00\"
      },
      {
        \"properties\": {
          \"clientIP\": \"6.7.8.11\",
          \"clientPort\": 56522
        },
        \"timeStamp\": \"2021-03-03T21:53:01+00:00\"
      }
    ]
  }
}"
| eval events=split(_raw,"|")
| mvexpand events
| eval _raw=events
| fields - _time events


| spath body.records{} output=records
| mvexpand records
| spath input=records timeStamp output=timeStamp
| spath input=records properties output=properties
| mvexpand properties
| spath input=properties
| fields - _raw properties records

View solution in original post

JimboSlice
Path Finder

It was easy to just add the table command underneath after all the spath stuff, tried for a single item in splunk and it broke it down correctly in to the respectable lines.

I think this is the best and only mvexand and spath example on the forums that is truly end to end and works.

Thanks!

Tags (1)

JimboSlice
Path Finder

Wow, that works nice.  Let me adapt it for this use case and re-post the complete search that works and for everyone to see!

I was just extracting the JSON using rex to then come up with some way to join the two and query as single item..

index=eventhub AND source="*C113*" AND body.records{}.resourceId="/subscriptions/a8736e42-ebd4-4459-a73c-54b2137562ba/resourcegroups/s141p01-prgateway/providers/microsoft.network/applicationgateways/gateway-pr-ag"
| eval _raw = replace (_raw, "\":\"\",","\":\"N/A\",")

| rex max_match=100 (?sU)(?<records>\{\s\"timeStamp\"\:.+\}\})
| rex max_match=100 field=records (?sU)\"properties\"\:\s(?<properties>\{.+\"\})\}
| table records, properties

 

Thanks.

JimboSlice
Path Finder
Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!